jackilion

jackilion t1_j63e6ah wrote

There is no reason to assume your latent space will be smooth by itself. I remember a paper for image generation that had techniques for smoothing out the latent space that can be applied during training:

https://arxiv.org/abs/2106.09016

It's about GANs, not autoencoders, but maybe you can find some ideas in there.

1

jackilion t1_j5zk1sb wrote

I'm not working on NLP, but I have seen your idea in papers on diffusion models. You are basically interpolating linearly between points in your latent space. There are other interpolation techniques you could try, but your idea will definitely give you some insight into your latent space.
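To make the interpolation idea concrete, here is a minimal sketch in plain Python. Spherical interpolation (slerp) is one commonly used alternative to linear interpolation; the function names and the toy 2D vectors are just assumptions for illustration, and in practice you would decode each interpolated point with your own model. It assumes both vectors are nonzero.

```python
import math

def lerp(a, b, t):
    """Linear interpolation between two latent vectors (lists of floats)."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

def slerp(a, b, t):
    """Spherical interpolation: follows the arc between a and b."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding error
    omega = math.acos(cos_omega)
    s = math.sin(omega)
    if s < 1e-6:
        # Nearly parallel or antiparallel vectors: fall back to lerp.
        return lerp(a, b, t)
    wa = math.sin((1.0 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

# Hypothetical latent codes; replace with encodings from your own model.
z_start = [1.0, 0.0]
z_end = [0.0, 1.0]
steps = 5
path_linear = [lerp(z_start, z_end, i / (steps - 1)) for i in range(steps)]
path_spherical = [slerp(z_start, z_end, i / (steps - 1)) for i in range(steps)]
```

The difference shows up at the midpoint: lerp cuts through the inside of the sphere, so the vector norm shrinks, while slerp keeps the norm constant when both endpoints have the same norm.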

Another possibility would be some kind of grid search through the latent space, though depending on its dimensionality that could be infeasible, since the number of grid points grows exponentially with the number of dimensions.
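A rough sketch of what a grid over the latent space looks like, and why it stops scaling. The bounds, the number of points per axis, and the helper names are all made up for illustration; `points_per_dim` has to be at least 2.

```python
import itertools

def latent_grid(low, high, points_per_dim, dims):
    """Iterate over every point of a regular grid on [low, high]^dims."""
    step = (high - low) / (points_per_dim - 1)
    axis = [low + i * step for i in range(points_per_dim)]
    return itertools.product(axis, repeat=dims)

def grid_size(points_per_dim, dims):
    """Number of grid points: points_per_dim raised to the power dims."""
    return points_per_dim ** dims

# Fine for a 2D latent space: 5 * 5 = 25 points to decode.
grid_2d = list(latent_grid(-1.0, 1.0, 5, 2))

# Hopeless for a 64-dimensional one, even at only 5 points per axis.
too_many = grid_size(5, 64)
```

For anything beyond a handful of dimensions you would have to restrict the grid to a few chosen directions, or sample randomly instead.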

Lastly, you could visualize the latent space by projecting it into 2 or 3 dimensions via t-SNE or something similar.

3

jackilion t1_j2mhdf6 wrote

You can't teach language to AlphaGo.

AlphaX (AlphaGo, AlphaZero) is an architecture built to quickly search a huge space of possibilities. That's why it's good at games like chess and Go, where the AI has to think ahead about what the game state could be N moves down the line, with the number of possible game states growing exponentially in N. AlphaFold is similarly specialized, for protein structure prediction.

GPT is a transformer, which takes a sequence of input vectors (tokens), possibly but not necessarily representing language, and produces output vectors. Through self-attention it learns on its own how much weight to give each part of the input, similar to how humans weigh certain words in a sentence differently.
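The weighting step can be sketched as scaled dot-product attention in plain Python. This is a stripped-down illustration, not how GPT is actually implemented: a real transformer applies learned projection matrices to get queries, keys, and values, uses multiple heads, and GPT additionally masks out future tokens. Here the toy token vectors are used directly for all three roles.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """For each query, return a weighted average of the values."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical token embeddings of dimension 2.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Each row of `weights` says how strongly one token attends to every token in the sequence, which is the "weighing" described above.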

Stable Diffusion is a denoising diffusion model: it takes an image-shaped tensor as input (possibly representing an image) and produces a tensor of the same shape as output. It is trained to reverse a noising process that has been applied to the training data.
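As a toy illustration of "reversing a noising process", here is the standard DDPM-style forward step, x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, together with its inversion when the noise is known. In a real model a trained network predicts eps; in this sketch the true noise is passed in as a stand-in for that prediction, and the flattened four-value "image" and function names are assumptions for the example. Stable Diffusion also does this in a learned latent space rather than on raw pixels.

```python
import math
import random

def add_noise(x0, alpha_bar, rng):
    """Forward process: mix the clean data with Gaussian noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * x + b * e for x, e in zip(x0, eps)]
    return x_t, eps

def recover_x0(x_t, eps_pred, alpha_bar):
    """Invert the forward step given a prediction of the noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(x_t, eps_pred)]

rng = random.Random(0)
x0 = [0.2, -0.5, 0.9, 0.0]  # toy "image", flattened
x_t, eps = add_noise(x0, alpha_bar=0.5, rng=rng)

# Stand-in for the network: pretend it predicted the noise perfectly.
x0_hat = recover_x0(x_t, eps, alpha_bar=0.5)
```

With a perfect noise prediction the clean data comes back exactly (up to floating-point error); training is about getting the network's prediction close to that.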

You see, each of these architectures has a very specific form of input and output, and its structure enables it to perform a certain task very well. You can't "teach" ChatGPT to produce an image, because it doesn't have a way to process image data.

3