
skelly0311 t1_j0adie9 wrote

What algorithm are you using? If it learns iteratively, e.g. via gradient descent, you can draw a different random subsample of the majority class on every epoch of forward/backprop, so over the course of training you don't lose any information from the class that has more data.
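A minimal sketch of what that per-epoch downsampling can look like, assuming a binary problem with numpy arrays and a model exposing a `partial_fit`-style training step (the model and training loop here are placeholders):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def downsample_majority(X, y, majority_label=0):
    """Return X, y with a fresh random subset of the majority class,
    matched in size to the minority class."""
    maj_idx = np.where(y == majority_label)[0]
    min_idx = np.where(y != majority_label)[0]
    keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
    idx = np.concatenate([keep, min_idx])
    rng.shuffle(idx)  # mix the two classes back together
    return X[idx], y[idx]

# for epoch in range(n_epochs):
#     X_ep, y_ep = downsample_majority(X_train, y_train)
#     model.partial_fit(X_ep, y_ep)  # one pass of forward/backprop
```

Because a new majority-class subset is drawn each epoch, every majority example eventually gets seen, even though each individual epoch is balanced.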

I currently do this with multi-label classification problems in NLP, where the classes are far more skewed than in your use case.

3

hopedallas OP t1_j0c0eui wrote

I'm using both random forest and XGBoost. For your NLP problem, do you give higher weights to the sparse classes each epoch?
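(Since tree ensembles like these don't train in epochs, per-epoch downsampling doesn't apply directly; the usual analogue is class weighting. A minimal sketch using sklearn's `class_weight` option and XGBoost's `scale_pos_weight`, with a placeholder imbalance ratio:)

```python
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Reweight classes inversely proportional to their frequency
rf = RandomForestClassifier(class_weight="balanced")

# For binary XGBoost, scale_pos_weight is typically set to
# n_negative / n_positive; 10.0 is a placeholder ratio
xgb = XGBClassifier(scale_pos_weight=10.0)
```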

1