Submitted by Simusid t3_11okrni in MachineLearning
Simusid OP t1_jbu0bkv wrote
Reply to comment by utopiah in [Discussion] Compare OpenAI and SentenceTransformer Sentence Embeddings by Simusid
>We do but is it what embedding actually provide or rather some kind of distance between items,
A single embedding is a single vector, encoding a single sentence. To identify a relationship between sentences, you need to compare vectors. Typically this is done with cosine distance between the vectors. The expectation is that if you have a collection of sentences that all talk about cats, the vectors that represent them will exist in a related neighborhood in the metric space.
utopiah t1_jbu0qpa wrote
Still says absolutely nothing if you don't know what a cat is.
Simusid OP t1_jbu2n5w wrote
That was not the point at all.
Continuing the cat analogy, I have two different cameras. I take 20,000 pictures of the same cats with both. I have two datasets of 20,000 cats. Is one dataset superior to the other? I will build a model that tries to predict cats and see if the "quality" of one dataset is better than the other.
In this case, the OpenAI dataset appears to be slightly better.
Viewing a single comment thread. View all comments