Submitted by Dear-Vehicle-3215 t3_yk17qn in MachineLearning

Hello Guys,

I am currently working on a dimensionality reduction task using a convolutional autoencoder. The dataset is 3D, so the model is a 3D AE (with attention).

Regarding the MSE metric, the model seems to work well (it can reconstruct the input quite well, even as I try to switch to a denoising task), but I would like to understand whether the extracted features are actually meaningful. I know that this depends on the downstream task, but I read in this paper on Contractive AEs (https://icml.cc/2011/papers/455_icmlpaper.pdf) that the Frobenius norm of the Jacobian matrix of the encoder is strongly correlated with the test error of the downstream task.

The problem is that I am having a hard time implementing this metric, since I am not using an MLP autoencoder and I am not using the sigmoid nonlinearity.
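For context, the brute-force version I have in mind looks roughly like this (a toy sketch; Encoder3D is a hypothetical stand-in for my actual model, and the full Jacobian only scales to small inputs):

```python
import torch
import torch.nn as nn

class Encoder3D(nn.Module):
    """Toy stand-in for the actual 3D conv encoder."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.conv = nn.Conv3d(1, 4, kernel_size=3, stride=2, padding=1)
        self.fc = nn.Linear(4 * 4 * 4 * 4, latent_dim)  # sized for 8^3 inputs

    def forward(self, x):
        h = torch.relu(self.conv(x))
        return self.fc(h.flatten(start_dim=1))

enc = Encoder3D()
x = torch.randn(1, 1, 8, 8, 8)  # one small 3D volume

# Full Jacobian dz/dx, shape (1, latent_dim, 1, 1, 8, 8, 8);
# flatten it and take the 2-norm to get the Frobenius norm.
J = torch.autograd.functional.jacobian(enc, x)
print("||J||_F =", J.flatten().norm().item())
```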

  • Is there a reason nobody talks about contractive autoencoders in the convolutional setting?
  • Do you have any general advice about my objective (evaluating the quality of my features)?

Thank you very much in advance

1

Comments


agent229 t1_iur2oqk wrote

You should be able to have autograd calculate the Jacobian for you in torch or tensorflow. Another thing I’ve done is a Monte Carlo version (sample near the encoding of a data point, propagate through the decoder, and inspect the changes to the output). Perhaps it would also be useful to use t-SNE to view the embeddings in 2D…
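A minimal sketch of that Monte Carlo probe (encoder and decoder are stand-ins for your trained modules):

```python
import torch

@torch.no_grad()
def latent_sensitivity(encoder, decoder, x, n_samples=64, eps=0.01):
    z = encoder(x)                        # (1, latent_dim)
    x_hat = decoder(z)                    # baseline reconstruction
    noise = eps * torch.randn(n_samples, *z.shape[1:])
    x_pert = decoder(z + noise)           # decode perturbed codes
    # Mean output displacement per unit of latent perturbation:
    # large values mean the decoder is very sensitive around z.
    dx = (x_pert - x_hat).flatten(1).norm(dim=1)
    dz = noise.flatten(1).norm(dim=1)
    return (dx / dz).mean().item()
```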

2

eyeswideshhh t1_iurafh8 wrote

A denoising/vanilla autoencoder does not impose any constraints on the encoder's latent representation and thus may produce highly entangled features; you can verify this with clustering.
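A minimal sketch of what I mean (encoder and loader are stand-ins, and the cluster count is arbitrary):

```python
import torch
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

@torch.no_grad()
def entanglement_check(encoder, loader, n_clusters=5):
    # Embed the whole dataset: Z has shape (N, latent_dim).
    Z = torch.cat([encoder(x) for x, *_ in loader]).cpu()

    # Feature-feature correlations: large off-diagonal entries
    # suggest entangled (redundant) latent dimensions.
    corr = torch.corrcoef(Z.T).fill_diagonal_(0)
    print("max |off-diag corr|:", corr.abs().max().item())

    # Cluster structure: a higher silhouette means better-separated
    # clusters in latent space.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z.numpy())
    print("silhouette:", silhouette_score(Z.numpy(), labels))
```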

2

Dear-Vehicle-3215 OP t1_iuriai1 wrote

I will try to check this; I hadn't thought about it.

The only use of clustering that came to mind was extracting clusters and then using them as labels to evaluate my features.

I am also trying to penalize the latent representation with the norm of the Jacobian matrix, but since I see no one using it, I assumed it was wrong for convolutional AEs (the paper also uses a different definition from the one I am using).
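Since the full Jacobian is too big for a conv encoder, one cheap alternative I am experimenting with is a Hutchinson-style stochastic estimate (my assumption, not what the paper does; `lam` and the training-loop wiring below are hypothetical):

```python
import torch

def contractive_penalty(encoder, x, n_probes=1):
    # Hutchinson-style estimate: for v ~ N(0, I) in latent space,
    # E[||v^T J||^2] = tr(J J^T) = ||J||_F^2, and each v^T J is a
    # single vector-Jacobian product, i.e. one backward pass.
    x = x.requires_grad_(True)
    z = encoder(x)
    est = 0.0
    for _ in range(n_probes):
        v = torch.randn_like(z)
        (g,) = torch.autograd.grad(z, x, grad_outputs=v, create_graph=True)
        est = est + g.pow(2).sum()
    return est / n_probes

# Hypothetical training-loop usage:
# loss = mse(decoder(encoder(x)), x) + lam * contractive_penalty(encoder, x)
```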

1

MLisdabomb t1_iurku96 wrote

For visualization in 2D, try UMAP; it is quite a bit faster than t-SNE.
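A minimal sketch, assuming your embeddings are already in an (N, latent_dim) numpy array Z (pip install umap-learn):

```python
import umap  # pip install umap-learn
import matplotlib.pyplot as plt

# Z: (N, latent_dim) numpy array of embeddings, assumed precomputed.
Z2 = umap.UMAP(n_components=2).fit_transform(Z)
plt.scatter(Z2[:, 0], Z2[:, 1], s=2)
plt.show()
```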

1

i-heart-turtles t1_iusf0zy wrote

Generally, I think there should be more efficient ways of doing what you want without having to compute the full Jacobian; people do similar things in adversarial robustness, so you can have a look:

https://arxiv.org/abs/1907.02610

https://arxiv.org/abs/1901.08573

I also think you should check the work on evaluating disentanglement. This paper could be useful for you: https://arxiv.org/abs/1812.06775. For VAE disentanglement, it is better for the Jacobian to be close to orthogonal than merely small in norm.
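A rough sketch of what checking that could look like for the decoder at a single latent code (decoder and z are stand-ins, and the full Jacobian only scales to small outputs):

```python
import torch

def decoder_orthogonality_gap(decoder, z):
    # z: a single latent code of shape (latent_dim,).
    f = lambda v: decoder(v.unsqueeze(0)).flatten()
    J = torch.autograd.functional.jacobian(f, z)  # (output_dim, latent_dim)
    G = J.T @ J                                   # Gram matrix of the columns
    off_diag = G - torch.diag(torch.diagonal(G))
    # 0 means perfectly orthogonal Jacobian columns; larger means more mixing.
    return (off_diag.norm() / G.norm()).item()
```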

1

Dear-Vehicle-3215 OP t1_iuvdyog wrote

Thank you very much. It seems that a VAE could be a good choice for me.

Anyway, plotting the cluster map shows several highly correlated features, but also a lot of features with low correlation.

1