
biophysninja t1_j0a4nc2 wrote

There are a few ways to approach this depending on the nature of the data, complexity, and compute available.

1- Using SMOTE to oversample the minority class: https://towardsdatascience.com/stop-using-smote-to-handle-all-your-imbalanced-data-34403399d3be

2- If your data is sparse, you can use PCA or autoencoders to reduce the dimensionality, then follow up with SMOTE (rough sketch below).

3- Using GANs to generate negative samples is another alternative.
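
A minimal sketch of points 1 and 2, assuming a NumPy feature matrix `X` and binary labels `y` (faked here with `make_classification`), using scikit-learn for PCA and imbalanced-learn for SMOTE; the dimensions and class ratio are placeholders:

```python
# Sketch of points 1 and 2: optional PCA before SMOTE oversampling.
# Assumes scikit-learn + imbalanced-learn; the dataset here is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from imblearn.over_sampling import SMOTE

# Stand-in for an imbalanced problem: 500 features, ~5% positives.
X, y = make_classification(n_samples=10_000, n_features=500,
                           weights=[0.95, 0.05], random_state=0)

# 2) Reduce dimensionality first if the data is sparse/high-dimensional.
X_reduced = PCA(n_components=50, random_state=0).fit_transform(X)

# 1) Oversample the minority class with SMOTE in the reduced space.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_reduced, y)

print(np.bincount(y), "->", np.bincount(y_res))  # roughly [9500 500] -> [9500 9500]
```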

−1

Far-Butterscotch-436 t1_j0a8ny2 wrote

Regarding 2: with only 500 features, dimensionality reduction isn't needed.

Options 1 and 3 are last resorts.

1

shaner92 t1_j0amnbc wrote

  1. Has anyone ever seen SMOTE give good results on real-world data?
  2. It depends on what the 500 features are; you could very well benefit from dimensionality reduction, or at least from pruning some features if they are not all equally useful (see the sketch after this list). That is a separate topic, though.
  3. It's a lot of work to create fake data when he already has that much.
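
For the pruning point, a rough sketch using a model-based selector (scikit-learn's `SelectFromModel` with a random forest); the synthetic data and the median threshold are illustrative choices, not recommendations:

```python
# Sketch of point 2: prune weak features with a model-based selector
# instead of (or before) full dimensionality reduction.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the 500-feature imbalanced dataset.
X, y = make_classification(n_samples=10_000, n_features=500,
                           weights=[0.95, 0.05], random_state=0)

selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, class_weight="balanced",
                           random_state=0, n_jobs=-1),
    threshold="median",   # keep the more important half of the 500 features
)
X_pruned = selector.fit_transform(X, y)
print(X.shape, "->", X_pruned.shape)  # (10000, 500) -> roughly (10000, 250)
```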

Playing with the loss functions/metrics is probably the best way to go, as you (u/Far-Butterscotch-436) pointed out.
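
For instance (a sketch under the same synthetic-data assumption, not a prescription): weight the loss by class frequency via `class_weight="balanced"` and evaluate with precision/recall-based metrics rather than accuracy:

```python
# Sketch of reweighting the loss instead of resampling: class_weight="balanced"
# scales each class inversely to its frequency; evaluate with PR-AUC / F1.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=500,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(class_weight="balanced", max_iter=2000)
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]
print("PR-AUC:", average_precision_score(y_te, proba))
print("F1:    ", f1_score(y_te, clf.predict(X_te)))
```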

3

daavidreddit69 t1_j0b5292 wrote

  1. I believe not; to me it's just a concept, not a practical solution in general.
2