bimtuckboo t1_j0b1lti wrote

The issue described in the article you linked only becomes relevant when you are throwing away data (that you otherwise would have trained on) purely to rectify class imbalance. If you can't train on it anyway due to computational limitations, even if the classes were 50/50 balanced, then there is nothing else to be done.
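To make that concrete, here is a rough sketch, assuming a hypothetical compute budget of 1,000 rows and a made-up 90/10 label list. The helper name `subsample_to_budget` is illustrative only and does not come from the linked article. Whether you draw a balanced or a plain random sample, you end up training on the same number of rows, so balancing costs you nothing extra once the budget is the binding constraint.

```python
import random
from collections import defaultdict

def subsample_to_budget(labels, budget, balanced=True, seed=0):
    """Return sorted indices of at most `budget` rows to train on (hypothetical helper)."""
    rng = random.Random(seed)
    if not balanced:
        # Plain random sample: keeps the original class ratio on average.
        return sorted(rng.sample(range(len(labels)), min(budget, len(labels))))
    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    # Balanced sample: split the budget evenly across classes,
    # capped by how many rows each class actually has.
    per_class = budget // len(by_class)
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, min(per_class, len(idx))))
    return sorted(chosen)

# Made-up data: 9,000 negatives and 1,000 positives (assumption for illustration).
labels = [0] * 9000 + [1] * 1000
budget = 1000  # assumed maximum number of rows we can afford to train on

balanced_idx = subsample_to_budget(labels, budget, balanced=True)
random_idx = subsample_to_budget(labels, budget, balanced=False)
print(len(balanced_idx), len(random_idx))
```

Both samples use the full budget, so the "throwing away data" concern only kicks in if the budget grows beyond what a balanced sample can fill.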

Of course, more data can often lead to better performance, and if you find your model to be below par, you may want to explore ways to engineer around whatever computational limitations you are encountering so that you can train on more data. In that case you may want to revisit your approach to rectifying the class imbalance, but don't do it if you don't need to.

Ultimately, anytime you are developing a model and you don't know what to do next, check if the model's performance is acceptable as is. You might not need to do anything.
