JackandFred t1_j0a1gv2 wrote on December 15, 2022 at 3:26 AM

If you think the real-world data will be similar to your samples, it's fine. But that's unlikely if the dataset you got is this skewed. Look up alternative metrics like F-score so that you can weight what matters most for your problem when training (false positives vs. false negatives, etc.).
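
To make the metric point concrete, here's a rough sketch of an F-beta score in plain Python. The function name `f_beta` and the toy labels are made up for illustration, and it assumes binary labels where 1 is the rare class. A `beta` above 1 penalizes false negatives more, below 1 penalizes false positives more.

```python
def f_beta(y_true, y_pred, beta=1.0):
    """F-beta score for binary labels (1 = positive / minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical skewed dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5

# A model that always predicts the majority class looks great on accuracy...
always_negative = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)

# ...but scores zero on F1, because it never finds a positive.
f1_always_negative = f_beta(y_true, always_negative)
```

Here `accuracy` comes out at 0.95 while `f1_always_negative` is 0.0, which is why accuracy alone is misleading on skewed data.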

What you linked there are algorithms for imbalanced classification. Usually the same algorithm is fine, but you want a different loss or metric.
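
As one example of "same algorithm, different loss", here's a minimal sketch of a class-weighted log loss in plain Python. The name `weighted_log_loss` and the `pos_weight` parameter are my own for illustration, not from any particular library, and it assumes binary labels with predicted probabilities for the positive class.

```python
import math


def weighted_log_loss(y_true, p_pred, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy where errors on positives count pos_weight times."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        if t == 1:
            total += -pos_weight * math.log(p)  # missed positives cost more
        else:
            total += -math.log(1 - p)
    return total / len(y_true)


# Hypothetical predictions: the model is unsure about the one positive.
y_true = [0, 0, 0, 1]
p_pred = [0.1, 0.2, 0.1, 0.3]

plain = weighted_log_loss(y_true, p_pred)                     # ordinary log loss
weighted = weighted_log_loss(y_true, p_pred, pos_weight=3.0)  # positives weighted 3x
```

With `pos_weight` above 1, `weighted` is larger than `plain`, so an optimizer minimizing it gets pushed harder to fix the missed positive. Many libraries expose the same idea directly, for instance the `class_weight` option on scikit-learn classifiers.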