Submitted by These-Assignment-936 t3_10xjwac in MachineLearning
master3243 t1_j7tmpsz wrote
Reply to comment by DigThatData in [D] Are there emergent abilities of image models? by These-Assignment-936
Exactly, the beginning "Clip" part of the entire Dalle model is trained to take any english text and map it to an embedding space.
It's completely natural (and probably surprising if it doesn't happen) that Clip would map (some) gibberish words to a part of the embedding space that is sufficiently close in L2-distance to the projection of a real world.
In that case, the diffusion model would decode that gibberish word to a similar image generated by the real word.
Viewing a single comment thread. View all comments