
Tober447 t1_j7uq41s wrote

You would take the output of a layer of your choice from the trained CNN (as you do now) and feed it into a new model, the autoencoder. So yes, the weights from your model are kept, but you will have to train the autoencoder from scratch. Something like CNN (inference only, no backprop) --> Encoder --> Latent Space --> Decoder for training, and at inference time you take the output of the encoder (the latent code) and use it for visualization or similarity.
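
A minimal sketch of what I mean (PyTorch; the layer sizes, the 32-d latent code and the random stand-in features are just placeholders, not a recommendation):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 500-d CNN features compressed to a 32-d latent code.
FEATURE_DIM, LATENT_DIM = 500, 32

class FeatureAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(FEATURE_DIM, 128), nn.ReLU(),
            nn.Linear(128, LATENT_DIM),
        )
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 128), nn.ReLU(),
            nn.Linear(128, FEATURE_DIM),
        )

    def forward(self, x):
        z = self.encoder(x)           # latent code: used for visualization / similarity
        return self.decoder(z), z     # reconstruction: used only for the training loss

# Stand-in for features extracted once from the frozen CNN (inference only, no backprop).
features = torch.randn(1024, FEATURE_DIM)

ae = FeatureAutoencoder()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):
    recon, z = ae(features)
    loss = loss_fn(recon, features)   # reconstruct the CNN features themselves
    opt.zero_grad()
    loss.backward()
    opt.step()
```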

4

zanzagaes2 OP t1_j7uual3 wrote

Yes, that's a great idea. I guess I can use the encoder-decoder to create a very low-dimensional embedding, and keep the current one (~500 features) to find images similar to a given one, right?

Your perspective has been really helpful, thank you

2

schludy t1_j7v9pkm wrote

I think you're underestimating the curse of dimensionality. In 500d, most vectors will be far away from each other. You can't just use the L2 norm when comparing vectors in that high-dimensional space.
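
A quick sketch of that concentration effect with random Gaussian points (the dimensions and sample count are arbitrary, just for illustration):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

def relative_spread(dim, n=500):
    """Std/mean of pairwise L2 distances between n random points in `dim` dimensions."""
    points = rng.standard_normal((n, dim))
    d = pdist(points)                 # all pairwise Euclidean distances
    return d.std() / d.mean()

for dim in (2, 10, 100, 500):
    print(f"dim={dim:4d}  relative spread={relative_spread(dim):.3f}")

# The relative spread shrinks as the dimension grows: in 500d almost all pairs
# are roughly equally far apart, so nearest-neighbour rankings based on raw L2
# distances become noisy.
```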

2

zanzagaes2 OP t1_j7vpd89 wrote

Yes, I think that's the case: I am getting far more reasonable values when comparing the 2d/3d projection of the embedding rather than the full 500-feature vector.

Is there a better way to do this than projecting into a smaller space (using dimensionality reduction techniques or an encoder-decoder approach) and using L2 there?
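
For concreteness, the kind of pipeline I mean (a minimal sketch; PCA, the 32 components and the random stand-in embeddings are just placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

# Stand-in for the ~500-d embeddings of the image collection.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((2000, 500))

# Project to a small number of components, then do L2 nearest-neighbour search there.
pca = PCA(n_components=32)
reduced = pca.fit_transform(embeddings)

nn_index = NearestNeighbors(n_neighbors=5, metric="euclidean").fit(reduced)
query = reduced[:1]                       # reduced embedding of the query image
distances, indices = nn_index.kneighbors(query)
print(indices)                            # indices of the most similar images
```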

1

Tober447 t1_j7uyy1n wrote

>I guess I can use the encoder-decoder to create a very low-dimensional embedding and use the current one (~500 features) to find similar images to a given one, right?

Exactly. :-)

1