
edjez t1_j7t9rp3 wrote

Another emergent capability (this depends on the model architecture; I don't think Stable Diffusion could have it, for example, but DALL-E does) is generating written letters / "captions" that look like gibberish to us but actually correspond to internal language embeddings for real-world clusters of concepts.

2

DigThatData t1_j7tb03a wrote

i'm not sure that's an emergent ability so much as it is explicitly what the model is being trained to learn. it's not surprising to me that there is a "painting signature" concept it has learned and samples from when it generates gibberish of a particular length and size in the bottom right corner (for example). that sounds like one of the easier "concepts" it would have learned.

11

master3243 t1_j7tmpsz wrote

Exactly. The CLIP component at the front of the DALL-E pipeline is trained to take any English text and map it to an embedding space.

It's completely natural (and it would probably be surprising if it didn't happen) that CLIP maps (some) gibberish words to a part of the embedding space that is sufficiently close in L2 distance to the projection of a real word.

In that case, the diffusion model would decode that gibberish word into an image similar to the one the real word would generate.
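The idea above can be sketched with a toy nearest-neighbor lookup. Everything here is invented for illustration (real CLIP text embeddings are hundreds of dimensions, not three, and the "words" and vectors below are made up); it only shows the mechanism being described: a gibberish token whose embedding lands close in L2 distance to a real word's embedding gets treated like that word downstream.

```python
import math

# Toy stand-in for a text embedding space (vectors invented for
# illustration; real CLIP embeddings are 512+ dimensional).
embeddings = {
    "dog":  [0.9, 0.1, 0.0],
    "car":  [0.0, 0.9, 0.1],
    "tree": [0.1, 0.0, 0.9],
}

def l2(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_word(vec):
    """Return the real word whose embedding is closest in L2 distance."""
    return min(embeddings, key=lambda w: l2(embeddings[w], vec))

# A "gibberish" token that happens to embed near the "dog" region;
# a decoder conditioned on this vector would behave as if it saw "dog".
gibberish_vec = [0.85, 0.15, 0.05]
print(nearest_word(gibberish_vec))  # -> dog
```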

2

amnezzia t1_j7tav4g wrote

You mean it takes a mean vector of a cluster and makes up a word for it?

1

Mescallan t1_j7tblf5 wrote

"word" might not be the right term, since it implies a consistent alphabet, but semantics aside, yes, I believe that's what is happening
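The "mean vector of a cluster" idea can be illustrated the same toy way (all words and vectors below are invented for illustration): average the embeddings of several related concepts, and the centroid still sits nearest to the cluster's members, so whatever token the model "makes up" for that centroid would decode to something in the cluster.

```python
# Toy cluster of related concepts vs. unrelated words
# (3-dim vectors invented for illustration).
cluster = {
    "cat":  [0.8, 0.2, 0.1],
    "dog":  [0.9, 0.1, 0.0],
    "wolf": [0.7, 0.1, 0.2],
}
other = {"car": [0.0, 0.9, 0.1], "tree": [0.1, 0.0, 0.9]}

# Mean vector (centroid) of the cluster.
centroid = [sum(v[i] for v in cluster.values()) / len(cluster)
            for i in range(3)]

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# The centroid's nearest real word is still a cluster member.
vocab = {**cluster, **other}
nearest = min(vocab, key=lambda w: l2(vocab[w], centroid))
print(nearest)  # -> cat
```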

1

xenophobe3691 t1_j7xm8wm wrote

Sounds like that story of the guy from 40k who pretty much looked for the underlying connections between all the different kinds of beauty and joy. He found “It” alright…

1