LuckyLuke87b

LuckyLuke87b t1_it19qkj wrote

Have you tried generating samples by sampling from your latent-space prior and feeding them to the decoder? In my experience, it is often necessary to tune the weight of the KL loss so that the decoder is a proper generator. Once this is done, some of the encoder's latent dimensions stay very close to the prior distribution, while others carry the relevant information. The next step is to check whether these relevant latent dimensions are the same across various encoded samples. Finally, prune all dimensions that basically never differ from the prior, up to some tolerance.
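A minimal sketch of that last pruning step, assuming a diagonal Gaussian posterior and a standard normal prior (the usual VAE setup, but an assumption here). The function names, the tolerance value, and the synthetic encoder outputs are all made up for illustration; in practice `mu` and `logvar` would come from your encoder on a batch of real samples.

```python
import numpy as np

def kl_per_dimension(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, 1)) for every sample and latent dimension."""
    return 0.5 * (mu**2 + np.exp(logvar) - logvar - 1.0)

def active_dimensions(mu, logvar, tol=1e-2):
    """Indices of latent dimensions that differ from the prior on at least one sample."""
    kl = kl_per_dimension(mu, logvar)      # shape (n_samples, n_latent)
    return np.where(kl.max(axis=0) > tol)[0]

# Synthetic stand-in for encoder outputs (hypothetical data):
# dimensions 1 and 4 carry information, the rest collapse onto the prior.
rng = np.random.default_rng(0)
n_samples, n_latent = 1000, 8
mu = np.zeros((n_samples, n_latent))
logvar = np.zeros((n_samples, n_latent))
mu[:, [1, 4]] = rng.normal(0.0, 2.0, size=(n_samples, 2))
logvar[:, [1, 4]] = -3.0

keep = active_dimensions(mu, logvar)
print("dimensions to keep:", keep)
```

Taking the maximum over samples matches "never differs from the prior" literally. Using the mean KL per dimension instead would be more forgiving of a few outlier samples, and the tolerance needs tuning either way.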

1

LuckyLuke87b t1_iswyslx wrote

I fully agree with your idea and have observed similar behavior. I'm not aware of literature on this for VAEs, but I believe there was quite a bit of fundamental work before deep learning on pruning Bayesian neural network weights based on the posterior entropy or "information length". Similarly, I would consider this latent dimension selection a form of pruning, based on how much information each dimension represents.

1