Submitted by Simusid t3_11okrni in MachineLearning
quitenominal t1_jbtptri wrote
Reply to comment by deliciously_methodic in [Discussion] Compare OpenAI and SentenceTransformer Sentence Embeddings by Simusid
An embedding is a numerical representation of some data. In this case the data is text.
These representations (read: lists of numbers) can be learned with some goal in mind. Usually you want the embeddings of similar data to be close to one another, and the embeddings of disparate data to be far apart.
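"Close" is usually measured with cosine similarity. Here's a minimal numpy sketch of that idea, using tiny made-up 4-dimensional vectors as stand-ins for real text embeddings (real ones are far longer, and the values here are invented for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    # Similar embeddings -> similarity near 1; unrelated -> near 0 (or negative)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" — invented numbers standing in for learned vectors
cat = np.array([0.9, 0.8, 0.1, 0.0])
kitten = np.array([0.85, 0.75, 0.2, 0.05])
finance = np.array([0.0, 0.1, 0.9, 0.8])

print(cosine_similarity(cat, kitten))   # high: similar concepts
print(cosine_similarity(cat, finance))  # low: unrelated concepts
```

The training goal is what makes this work: the model is pushed to place texts with similar meaning at vectors with high cosine similarity.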
Often these lists of numbers representing the data are very long - I think the ones from the model above are 768 numbers. So each piece of text is transformed into a list of 768 numbers, and similar text will get similar lists of numbers.
What's being visualized above is a two-number summary of those 768. This is referred to as a projection, like how a 3D wireframe casts a 2D shadow. This lets us visualize the embeddings and can give a qualitative assessment of their 'goodness' - i.e. are they grouping things as I expect? (Similar texts are close, disparate texts are far.)
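The visualization above may use a nonlinear method like UMAP or t-SNE, but the simplest projection of this kind is PCA. Here's a rough numpy-only sketch of the idea, using random synthetic vectors (not real model output) to mimic two clusters of 768-dimensional embeddings being squashed down to 2D:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for 768-dim text embeddings: two "topics",
# each a tight cluster of points around its own random center.
center_a = rng.normal(size=768)
center_b = rng.normal(size=768)
cluster_a = center_a + 0.1 * rng.normal(size=(5, 768))
cluster_b = center_b + 0.1 * rng.normal(size=(5, 768))
X = np.vstack([cluster_a, cluster_b])   # shape (10, 768)

# PCA via SVD: project onto the two directions of greatest variance
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_2d = X_centered @ Vt[:2].T            # shape (10, 2): one (x, y) per text

print(X_2d.shape)
```

Each 768-number list becomes a single (x, y) point you can scatter-plot, and if the embeddings are any good, the two topics land in visibly separate groups.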