Viewing a single comment thread. View all comments

Simusid OP t1_jbu0bkv wrote

>We do but is it what embedding actually provide or rather some kind of distance between items,

A single embedding is a single vector, encoding a single sentence. To identify a relationship between sentences, you need to compare vectors. Typically this is done with cosine distance between the vectors. The expectation is that if you have a collection of sentences that all talk about cats, the vectors that represent them will exist in a related neighborhood in the metric space.

2

utopiah t1_jbu0qpa wrote

Still says absolutely nothing if you don't know what a cat is.

−2

Simusid OP t1_jbu2n5w wrote

That was not the point at all.

Continuing the cat analogy, I have two different cameras. I take 20,000 pictures of the same cats with both. I have two datasets of 20,000 cats. Is one dataset superior to the other? I will build a model that tries to predict cats and see if the "quality" of one dataset is better than the other.

In this case, the OpenAI dataset appears to be slightly better.

5