Tober447 t1_j7u90qp wrote

You could try an autoencoder with CNN layers and a bottleneck of 2 or 3 neurons so you can visualize the embeddings. Such an autoencoder can be interpreted as a non-linear PCA.
Also, similarity in this embedding space should correlate with similarity of the real images/whatever your CNN extracts from the real images.
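A minimal PyTorch sketch of that idea (the 28x28 grayscale input size, layer widths, and 2-neuron bottleneck are illustrative assumptions, not the poster's actual setup):

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Convolutional autoencoder with a 2-neuron bottleneck,
    usable as a non-linear analogue of PCA for visualization."""
    def __init__(self, bottleneck_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, bottleneck_dim),      # 2-D code
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 32 * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (32, 7, 7)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)           # low-dim embedding to plot
        return self.decoder(z), z

model = ConvAutoencoder()
x = torch.randn(4, 1, 28, 28)         # dummy batch of images
recon, z = model(x)
```

After training with a reconstruction loss, plotting `z` directly gives the 2-D visualization.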

5

zanzagaes2 OP t1_j7unuq2 wrote

May I use some part of the trained model to avoid retraining from scratch? The current model has very decent precision and I have generated some other visualizations for it (like heatmaps) so doing work around this model would be very convenient.

Edit: I have added an image of the best embedding I have found until now as a reference

1

Tober447 t1_j7uq41s wrote

You would take the output of a layer of your choice from the trained CNN (as you do now) and feed it into a new model, the autoencoder. So yes, the weights of your model are kept, but you will have to train the autoencoder from scratch. Something like CNN (inference only, no backprop) --> Encoder --> Latent Space --> Decoder for training, and at inference time you take the output of the encoder and use it for visualization or similarity.
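That training setup could look roughly like the sketch below (the 500-feature dimension, hidden width, and random stand-in features are assumptions; in practice `feats` would be the frozen CNN's outputs, computed once with no gradient):

```python
import torch
import torch.nn as nn

FEAT_DIM = 500   # assumed size of the pretrained CNN's feature vector
LATENT = 2       # bottleneck for visualization

# Dense autoencoder trained on top of the frozen CNN features.
encoder = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM))

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-in for features extracted once from the frozen CNN,
# e.g. feats = cnn_backbone(images).detach()
feats = torch.randn(256, FEAT_DIM)

for _ in range(50):
    opt.zero_grad()
    z = encoder(feats)              # 2-D codes, the thing you plot
    recon = decoder(z)
    loss = loss_fn(recon, feats)    # reconstruct the CNN features
    loss.backward()
    opt.step()
```

Only the encoder is needed at inference; the decoder exists so the bottleneck is forced to preserve information.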

4

zanzagaes2 OP t1_j7uual3 wrote

Yes, that's a great idea. I guess I can use the encoder-decoder to create a very low-dimensional embedding and use the current one (~500 features) to find similar images to a given one, right?

Your perspective has been really helpful, thank you
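For the similarity-search half, one common alternative to raw L2 on the ~500-d features is cosine similarity; a hypothetical helper (`top_k_similar` and `bank` are made-up names, with `bank` holding one feature row per image):

```python
import numpy as np

def top_k_similar(query, bank, k=5):
    """Indices of the k rows of `bank` most similar to `query`
    by cosine similarity (direction only, magnitude ignored)."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q                     # cosine similarity per row
    return np.argsort(-sims)[:k]     # highest similarity first

# Demo on random stand-in features
bank = np.random.default_rng(1).standard_normal((10, 8))
query = 2.0 * bank[3]                # same direction as row 3
idx = top_k_similar(query, bank, k=3)
```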

2

schludy t1_j7v9pkm wrote

I think you're underestimating the curse of dimensionality. In 500d, most vectors will be far away from each other. You can't just use the L2 norm when comparing vectors in such a high-dimensional space.
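The concentration effect described here can be checked numerically; a small NumPy sketch on synthetic Gaussian data (the dimensions 2 and 500 are chosen to mirror the discussion):

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_spread(dim, n=1000):
    """Relative spread (std/mean) of L2 distances from one random
    point to n-1 others, in the given dimension."""
    x = rng.standard_normal((n, dim))
    d = np.linalg.norm(x[1:] - x[0], axis=1)
    return d.std() / d.mean()

# As dimension grows, nearest and farthest neighbours become almost
# equidistant, so raw L2 distances carry less contrast.
low, high = distance_spread(2), distance_spread(500)
```

Here `high` comes out far smaller than `low`: in 500d the distances all cluster around the same value.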

2

zanzagaes2 OP t1_j7vpd89 wrote

Yes, I think that's the case: I get far more reasonable values when comparing the 2d/3d projections of the embedding than when comparing the full 500-feature vectors.

Is there a better way to do this than projecting into a smaller space (using dimensionality reduction techniques or an encoder-decoder approach) and using L2 there?

1

Tober447 t1_j7uyy1n wrote

>I guess I can use the encoder-decoder to create a very low-dimensional embedding and use the current one (~500 features) to find similar images to a given one, right?

Exactly. :-)

1