
quitenominal t1_jbtptri wrote

An embedding is a numerical representation of some data. In this case the data is text.

These representations (read: lists of numbers) can be learned with some goal in mind. Usually you want the embeddings of similar data to be close to one another, and the embeddings of disparate data to be far apart.
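A toy sketch of what "close" and "far" mean here, using made-up three-number vectors (real embeddings are much longer) and cosine similarity, which is one common way to compare them:

```python
import numpy as np

def cosine_similarity(a, b):
    # near 1.0 = pointing the same way ("close"), near 0 = unrelated ("far")
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# made-up vectors, just to illustrate the idea
cat    = np.array([0.9, 0.1, 0.2])
kitten = np.array([0.85, 0.15, 0.25])
car    = np.array([0.1, 0.9, 0.7])

print(cosine_similarity(cat, kitten))  # high -> similar
print(cosine_similarity(cat, car))     # low  -> dissimilar
```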

Often these lists of numbers representing the data are very long - I think the ones from the model above are 768 numbers long. So each piece of text is transformed into a list of 768 numbers, and similar text will get similar lists of numbers.
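For example (the post doesn't name the model, but 768 dimensions matches models like all-mpnet-base-v2 from the sentence-transformers library, so I'm using that as a stand-in):

```python
from sentence_transformers import SentenceTransformer

# all-mpnet-base-v2 is just one example of a model that outputs 768 numbers per text
model = SentenceTransformer("all-mpnet-base-v2")

embeddings = model.encode([
    "The cat sat on the mat.",
    "A kitten was resting on the rug.",
    "Quarterly revenue grew by 12%.",
])
print(embeddings.shape)  # (3, 768) -- one list of 768 numbers per piece of text
```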

What's being visualized above is a 2-number summary of those 768. This is referred to as a projection - like how a 3D wireframe casts a 2D shadow. This lets us visualize the embeddings and gives a qualitative sense of their 'goodness' - i.e. are they grouping things as I expect? (Similar texts close together, disparate texts far apart.)
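Something like this (PCA is one simple way to do the projection - the visualization above may well use something fancier like t-SNE or UMAP, and the random vectors here are just a stand-in for real embeddings):

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# stand-in embeddings: two loose groups of 768-number lists
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(20, 768))
group_b = rng.normal(loc=1.0, scale=0.1, size=(20, 768))
embeddings = np.vstack([group_a, group_b])

# squash each 768-number list down to 2 numbers so we can plot it
points_2d = PCA(n_components=2).fit_transform(embeddings)

plt.scatter(points_2d[:20, 0], points_2d[:20, 1], label="group A")
plt.scatter(points_2d[20:, 0], points_2d[20:, 1], label="group B")
plt.legend()
plt.show()
```

If the groups you expect to be similar land near each other in the 2D plot, that's the qualitative "goodness" check.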
