Far-Butterscotch-436 t1_j0a9083 wrote

A 5% imbalance isn't bad. Just use a cost function that accounts for the imbalance, e.g., the weighted average binomial deviance, and you'll be fine.
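
For illustration only, here is a minimal sketch of what a weighted average binomial deviance could look like. The function name, the `w_pos`/`w_neg` parameters, and the toy data are assumptions made for this example, not something from the comment above or from a particular library.

```python
import math

def weighted_binomial_deviance(y_true, p_pred, w_pos=1.0, w_neg=1.0, eps=1e-12):
    """Weighted average binomial deviance: -2 * weighted mean log-likelihood.

    y_true: iterable of 0/1 labels; p_pred: predicted P(y = 1).
    w_pos / w_neg are hypothetical per-class weights.
    """
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        w = w_pos if y == 1 else w_neg
        log_lik = math.log(p) if y == 1 else math.log(1.0 - p)
        total += w * log_lik
        weight_sum += w
    return -2.0 * total / weight_sum

# Toy data with a 5% positive rate and a constant prediction of 0.05.
y = [1] + [0] * 19
p = [0.05] * 20
print(weighted_binomial_deviance(y, p))               # unweighted
print(weighted_binomial_deviance(y, p, w_pos=19.0))   # positives up-weighted
```

With `w_pos=19.0` the single positive carries as much total weight as the 19 negatives, so a model that ignores the minority class is penalized more heavily.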

You can also build a downsampling ensemble and compare its performance. Don't downsample to 50/50; try for at least 10%.
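
A rough sketch of how the training subsets for such an ensemble might be drawn. It assumes "10%" refers to the minority-class share after downsampling, which is one reading of the comment; the function name and parameters are hypothetical.

```python
import random

def downsampled_index_sets(labels, target_pos_frac=0.10, n_members=5, seed=0):
    """Return one index list per ensemble member.

    Each member keeps every positive and a fresh random sample of negatives,
    sized so positives make up roughly target_pos_frac of the subset.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Negatives needed so that len(pos) / (len(pos) + n_neg_keep) ~= target_pos_frac.
    n_neg_keep = round(len(pos) * (1.0 - target_pos_frac) / target_pos_frac)
    n_neg_keep = min(n_neg_keep, len(neg))
    return [pos + rng.sample(neg, n_neg_keep) for _ in range(n_members)]

# Toy labels with a 5% positive rate.
labels = [1] * 50 + [0] * 950
index_sets = downsampled_index_sets(labels)
print(len(index_sets), len(index_sets[0]))
```

One model would then be trained per index set and the predictions averaged. Because each member sees a different sample of negatives, the ensemble as a whole uses more of the majority class than any single downsampled model.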

You've got a good problem, lots of observations with few features

20

trendymoniker t1_j0acn6e wrote

👆

1e6:1 is extreme. 1e3:1 is often realistic (think views to shares on social media). 18:1 is actually a pretty good real-world ratio.

If it were me, I’d just change the weights for each class in the loss function to get them more or less equal.
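
As a sketch of one common way to pick such weights (inverse class frequency, the same formula scikit-learn uses for `class_weight="balanced"`), assuming a toy 18:1 label list; the helper name is made up for this example.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Toy labels at an 18:1 negative-to-positive ratio.
toy_labels = [0] * 18 + [1]
weights = balanced_class_weights(toy_labels)
print(weights)
```

With these weights each class contributes the same total weight to the loss (here 18 negatives times 19/36 and 1 positive times 19/2 both come to 9.5), which is what "more or less equal" amounts to.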

190m examples isn’t that many either — don’t worry about it. Compute is cheap — it’s ok if it takes more than one machine and/or more time.

9

hopedallas OP t1_j0ailbn wrote

Thanks for the hint. Sorry, I'm not sure what you mean by “try for 10%”?

2