Submitted by zanzagaes2 t3_10xt36j in MachineLearning

Hi all: I have trained a CNN (efficientnet-b3) to classify the degree of a disease on medical images. I would like to create an embedding, both to visualize relationships between images (after projecting to 2d or 3d space) and to find images similar to a given one.

I have tried using the output of the last convolution, both before and after pooling, for all training images (~30,000), but the result is mediocre: dissimilar images end up quite close in the embedding, and plotting it in 2d or 3d just shows a point cloud with no obvious pattern.

I have also tried using the class activation map (the output of the convolutional layer after pooling, multiplied by the classifier weights of the predicted class). This works somewhat better, but classes are still not separated very clearly in the scatter plot.

Is there any other sensible way to generate the embeddings? I have tried using the hidden representations of earlier convolutional layers, but some of them are so huge (~650,000 features per sample) that creating a reasonably sized embedding would require very aggressive PCA.
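For reference, a minimal sketch of the class-activation-map embedding described above, assuming PyTorch and a timm EfficientNet-B3 (num_classes and layer sizes are placeholders, not my exact setup):

```python
import timm
import torch

# Placeholder model: a timm EfficientNet-B3 with an assumed 5-class head.
model = timm.create_model("efficientnet_b3", pretrained=True, num_classes=5)
model.eval()

@torch.no_grad()
def cam_embedding(images):                       # images: (B, 3, H, W)
    feats = model.forward_features(images)       # last conv output, (B, C, h, w)
    pooled = feats.mean(dim=(2, 3))              # global average pooling -> (B, C)
    logits = model.get_classifier()(pooled)      # (B, num_classes)
    pred = logits.argmax(dim=1)                  # predicted class per image
    w = model.get_classifier().weight[pred]      # predicted class's weights, (B, C)
    return pooled * w                            # CAM-weighted embedding, (B, C)
```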


Example scatter plot of the heatmap embedding. While it is okay-ish (classes are more or less spatially localized), it would be great to find an embedding that creates more visible clusters for each class.

https://preview.redd.it/l7smdyuml6ha1.png?width=543&format=png&auto=webp&v=enabled&s=1c9a872ff73eea199e4977a1375303bcffe00158


1

Comments


Tober447 t1_j7u90qp wrote

You could try an autoencoder with CNN layers and a bottleneck of 2 or 3 neurons to be able to visualize these embeddings. The autoencoder can be interpreted as non-linear PCA.


Also, similarity in this embedding space should correlate with similarity of the real images/whatever your CNN extracts from the real images.
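For illustration, a minimal sketch of such an autoencoder, assuming 128x128 single-channel inputs (all layer sizes are placeholders, not from the thread):

```python
import torch
import torch.nn as nn

# Illustrative convolutional autoencoder with a 2-neuron bottleneck.
class ConvAE(nn.Module):
    def __init__(self, bottleneck=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # -> 64x64
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # -> 32x32
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # -> 16x16
        )
        self.to_latent = nn.Linear(64 * 16 * 16, bottleneck)
        self.from_latent = nn.Linear(bottleneck, 64 * 16 * 16)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2),
        )

    def forward(self, x):
        z = self.to_latent(self.conv(x).flatten(1))    # (B, 2) embedding to plot
        h = self.from_latent(z).view(-1, 64, 16, 16)
        return self.deconv(h), z
```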

5

zanzagaes2 OP t1_j7unuq2 wrote

May I reuse some part of the trained model to avoid retraining from scratch? The current model has very decent precision, and I have already generated other visualizations for it (like heatmaps), so building on this model would be very convenient.

Edit: I have added an image of the best embedding I have found until now as a reference

1

Tober447 t1_j7uq41s wrote

You would take the output of a layer of your choice from the trained cnn (as you do now) and feed it into a new model, the autoencoder. So yes, the weights from your model are kept, but you will have to train the autoencoder from scratch. Something like CNN (only inference, no backprop) --> Encoder --> Latent Space --> Decoder for training, and during inference you take the output of the encoder (the latent representation) and use it for visualization or similarity.
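A sketch of that setup, assuming the frozen-CNN features (here 512-dim, a placeholder) have already been precomputed once under torch.no_grad():

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# MLP autoencoder over precomputed CNN features; sizes are illustrative.
class FeatureAE(nn.Module):
    def __init__(self, in_dim=512, bottleneck=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, bottleneck))
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)                  # latent used for visualization
        return self.decoder(z), z

# feats: (N, 512) tensor of features from the frozen CNN.
def train(feats, epochs=50):
    ae = FeatureAE(in_dim=feats.shape[1])
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    for _ in range(epochs):
        for (x,) in DataLoader(TensorDataset(feats), batch_size=256, shuffle=True):
            recon, _ = ae(x)
            loss = nn.functional.mse_loss(recon, x)   # reconstruction objective
            opt.zero_grad()
            loss.backward()
            opt.step()
    return ae
```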

4

zanzagaes2 OP t1_j7uual3 wrote

Yes, that's a great idea. I guess I can use the encoder-decoder to create a very low-dimensional embedding for visualization and use the current one (~500 features) to find images similar to a given one, right?

Your perspective has been really helpful, thank you

2

schludy t1_j7v9pkm wrote

I think you're underestimating the curse of dimensionality. In 500 dimensions, most vectors will be far away from each other; you can't just use the L2 norm to compare vectors in such a high-dimensional space.
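A quick illustration of that concentration effect with random data:

```python
import numpy as np

# As dimensionality grows, nearest and farthest neighbours become nearly
# equidistant, so L2 rankings lose contrast.
rng = np.random.default_rng(0)
for d in (2, 50, 500):
    x = rng.standard_normal((1000, d))
    dists = np.linalg.norm(x - x[0], axis=1)[1:]   # distances from one point
    print(d, dists.min() / dists.max())            # ratio approaches 1 as d grows
```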

2

zanzagaes2 OP t1_j7vpd89 wrote

Yes, I think that's the case: I get far more reasonable values comparing the 2d/3d projections of the embedding than comparing the full 500-feature vectors.

Is there a better way to do this than projecting into a smaller space (using dimensionality reduction techniques or an encoder-decoder) and using L2 there?
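For what it's worth, a hedged sketch of that reduce-then-search approach with scikit-learn (the 32-dim embedding below is an illustrative stand-in; cosine is a common alternative to L2 here):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative stand-in for a reduced (N, 32) embedding matrix.
emb = np.random.default_rng(0).standard_normal((3000, 32)).astype("float32")

# Neighbour search in the small space instead of the raw 500-dim vectors.
nn = NearestNeighbors(n_neighbors=6, metric="cosine").fit(emb)
dist, idx = nn.kneighbors(emb[:1])   # idx[0][1:] = the 5 most similar items
```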

1

Tober447 t1_j7uyy1n wrote

>I guess I can use the encoder-decoder to create a very low-dimensional embedding for visualization and use the current one (~500 features) to find images similar to a given one, right?

Exactly. :-)

1

mrtransisteur t1_j7xt1e5 wrote

You want to model:

p(cluster = c | img)

p(c1 == c2 | dist(c1, c2) = d, img1 in c1, img2 in c2)

You could try a couple things:

  • Fréchet Inception Distance, but using your medical CNN's activations instead of the Inception model

  • distance metric learning

  • hdbscan/umap/etc. for clustering (see the sketch after this list)

  • persistent homology based topological data analysis methods for finding clusters

  • masked autoencoders for good feature extraction

  • JEPA style architecture
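For the clustering item above, a minimal UMAP + HDBSCAN sketch (the feature matrix and all parameters are illustrative stand-ins, not values from the thread):

```python
import hdbscan
import numpy as np
import umap

# Illustrative stand-in for the real (N, D) CNN feature matrix.
feats = np.random.default_rng(0).standard_normal((3000, 512)).astype("float32")

# Non-linear 2d projection; cosine often behaves better than L2 on
# high-dimensional CNN activations (an assumption worth validating).
emb2d = umap.UMAP(n_components=2, n_neighbors=30, min_dist=0.1,
                  metric="cosine").fit_transform(feats)

# Density-based clustering on the projection; label -1 marks noise points.
labels = hdbscan.HDBSCAN(min_cluster_size=50).fit_predict(emb2d)
```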

2

schludy t1_j7ula73 wrote

How do you plot the embeddings in 2D exactly? What is the size of the embeddings that you're trying to visualize?

1

zanzagaes2 OP t1_j7unjuw wrote

I have not found a very convincing embedding yet. I have tried several, ranging from ~500 features (class activation map) to ~20,000 features (output of the last convolutional layer before pooling), all generated from the full training set (~30,000 samples).

In all cases I do the same: I use PCA to reduce the vectors to 1,000 features and then UMAP or t-SNE (usually both) to get a 2d vector I can scatter-plot. I have tried to use UMAP for the full process but it doesn't scale well enough. Is this a good approach?
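For concreteness, a sketch of this pipeline with scikit-learn (the 50-dimensional intermediate follows the recommendation discussed in the replies below, rather than the 1,000 I used; data shapes are illustrative stand-ins):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Illustrative stand-in for the real (~30,000 x ~20,000) feature matrix.
feats = np.random.default_rng(0).standard_normal((3000, 2000)).astype("float32")

feats50 = PCA(n_components=50).fit_transform(feats)   # linear pre-reduction
emb2d = TSNE(n_components=2, perplexity=30).fit_transform(feats50)  # scatter-plot this
```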

Edit: I have added an image of the best embedding I have found until now as a reference

1

schludy t1_j7v11vj wrote

The individual steps sound OK; however, if you project 20,000 features down to 2D, the results you got look very reasonable. I'm not sure about UMAP, but for t-SNE it's recommended to start from low dimensionality, something on the order of 32 features. I would probably try to adjust the architecture, as other comments have suggested.

1

zanzagaes2 OP t1_j7vpols wrote

You are right, both the t-SNE and UMAP documentation recommend reducing to 30-50 features before applying them. In this case the result is quite similar to the one I found, though.

1

lonelyrascal t1_j7vwy0c wrote

PCA has cubic time complexity in the feature dimension. Instead of doing that, why don't you pass the embedding through an autoencoder?

1

zanzagaes2 OP t1_j7w5sr1 wrote

I will try the encoder-decoder architecture, mainly to improve the embedding. So far the asymptotics of PCA haven't been a problem: scikit-learn's implementation performs PCA on ~1,000-feature vectors almost instantly.

Do you have a reference for an encoder-decoder architecture I can use?

1

lonelyrascal t1_j7wp5yv wrote

OK, cool. Keras has a basic encoder-decoder architecture in its documentation. If that's not something you like, you can always ask ChatGPT ;)

1