
schludy t1_j7ula73 wrote

How do you plot the embeddings in 2D exactly? What is the size of the embeddings that you're trying to visualize?

1

zanzagaes2 OP t1_j7unjuw wrote

I have not found a very convincing embedding yet. I have tried several, ranging from ~500 features (class activation map) to ~20,000 features (output of the last convolutional layer before pooling), all generated from the full training set (~30,000 samples).

In all cases I do the same: I use PCA to reduce the vectors to 1,000 features and then UMAP or t-SNE (I usually try both) to get 2D points I can scatter plot. I have tried using UMAP for the full process, but it doesn't scale well enough. Is this a good approach?
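For reference, a minimal sketch of that kind of pipeline (PCA to an intermediate dimensionality, then UMAP or t-SNE down to 2D). The `features` and `labels` arrays here are random placeholders rather than the actual network outputs, and the parameter values are only illustrative:

```python
# Sketch of the PCA -> UMAP/t-SNE pipeline with placeholder data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap  # umap-learn

rng = np.random.default_rng(0)
features = rng.normal(size=(3000, 2000))  # stand-in for the real embeddings
labels = rng.integers(0, 5, size=3000)    # stand-in class labels

# Step 1: linear reduction to an intermediate dimensionality with PCA.
reduced = PCA(n_components=1000).fit_transform(features)

# Step 2: non-linear projection to 2D (UMAP shown; t-SNE works the same way).
coords = umap.UMAP(n_components=2, random_state=0).fit_transform(reduced)
# coords = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(reduced)

# Step 3: scatter plot coloured by class label.
plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=3, cmap="tab10")
plt.title("2D projection of PCA-reduced features")
plt.show()
```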

Edit: I have added an image of the best embedding I have found so far as a reference.

1

schludy t1_j7v11vj wrote

The individual steps sound OK; in fact, considering you are projecting 20,000 dimensions down to 2D, the results you got look very reasonable. I'm not sure about UMAP, but for t-SNE it's recommended to reduce to a low dimensionality first, something on the order of 32 features. I would probably try to adjust the architecture, as other comments have suggested.
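The scikit-learn t-SNE documentation does suggest using PCA to bring dense data down to roughly 50 dimensions before running t-SNE. A short sketch of that variant, again with a placeholder `features` array standing in for the real embeddings:

```python
# Sketch of the recommended preprocessing: PCA to ~50 features before t-SNE.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(3000, 2000))  # placeholder for the real embeddings

reduced = PCA(n_components=50).fit_transform(features)   # ~30-50 dims, not ~1,000
coords = TSNE(n_components=2, perplexity=30).fit_transform(reduced)  # (3000, 2)
```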

1

zanzagaes2 OP t1_j7vpols wrote

You are right, the documentation for both t-SNE and UMAP recommends going down to 30-50 features before using them. In this case, though, the result is quite similar to the one I had already found.

1