Viewing a single comment thread. View all comments

deliciously_methodic t1_jbtl53y wrote

What are embeddings? I watch videos, but still don’t fully understand them, then I see these pictures and I’m even more confused.

3

Simusid OP t1_jbtp8wr wrote

Given three sentences:

  • Tom went to the bank to make a payment on his mortgage.
  • Yesterday my wife went to the credit union and withdrew $500.
  • My friend was fishing along the river bank, slipped and fell in the water.

Reading those, you immediately know that the first two are related because they are both about banks/money/finance. You also know they are unrelated to the third sentence, even though the first and third share the word "bank". A naive, strictly word-based model might incorrectly associate the first and third sentences.

What we want is a model that can represent the "semantic content" or idea behind a sentence in a way that lets us make valid mathematical comparisons. We want to create a "metric space". In that space, each sentence is represented by a vector, and we can use standard math operations to compute the distances between the vectors. In other words, the first two sentences will have vectors that point in basically the same direction, while the third vector will point in a very different direction.
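As a sketch of that comparison (with made-up toy vectors, not real model output), cosine similarity scores how closely two vectors point in the same direction:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-D "embeddings" -- illustrative numbers only
mortgage = np.array([0.9, 0.1, 0.0])   # "Tom went to the bank ... mortgage"
withdraw = np.array([0.8, 0.3, 0.1])   # "my wife ... credit union ... $500"
river    = np.array([0.1, 0.2, 0.9])   # "fishing along the river bank"

print(cosine_similarity(mortgage, withdraw))  # high: related sentences
print(cosine_similarity(mortgage, river))     # low: unrelated sentences
```

Real embeddings have hundreds of dimensions, but the comparison works the same way.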

The job of the language models (BERT, RoBERTa, all-mpnet-base-v2, etc.) is to do the best job possible of turning sentences into vectors. The output of these models is very high dimensional, 768 dimensions and higher. We cannot visualize that, so we use tools like UMAP, t-SNE, PCA, and eigendecomposition to find the 2 or 3 most important components and then display them as pretty 2D or 3D point clouds.
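A minimal sketch of that projection step, using random vectors in place of real sentence embeddings and a bare-bones PCA via SVD (the same idea the libraries above implement with more care):

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 768))  # stand-in for 100 sentence embeddings

# PCA via SVD: project onto the 2 directions of greatest variance
centered = embeddings - embeddings.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
points_2d = centered @ Vt[:2].T

print(points_2d.shape)  # (100, 2) -- ready to scatter-plot
```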

In short, the embedding is the vector that represents the sentence in a (hopefully) valid metric space.

19

quitenominal t1_jbtqio0 wrote

Nice explainer! I think this is good for those with some linear algebra familiarity. I added a further explanation below that goes one level simpler again.

2

utopiah t1_jbtx8iv wrote

> What we want is a model that can represent the "semantic content" or idea behind a sentence

We do, but is that what embeddings actually provide? Or rather some kind of distance between items, how they might or might not relate to each other? I'm not sure that relatedness alone would be sufficient, for most people, to capture the "idea" behind a sentence. I'm not saying it's not useful, but I'm arguing against the semantic aspect here, at least from my understanding of that explanation.

2

Simusid OP t1_jbu0bkv wrote

>We do but is it what embedding actually provide or rather some kind of distance between items,

A single embedding is a single vector, encoding a single sentence. To identify a relationship between sentences, you need to compare vectors. Typically this is done with cosine distance between the vectors. The expectation is that if you have a collection of sentences that all talk about cats, the vectors that represent them will exist in a related neighborhood in the metric space.

2

utopiah t1_jbu0qpa wrote

Still says absolutely nothing if you don't know what a cat is.

−2

Simusid OP t1_jbu2n5w wrote

That was not the point at all.

Continuing the cat analogy, I have two different cameras. I take 20,000 pictures of the same cats with both. I have two datasets of 20,000 cats. Is one dataset superior to the other? I will build a model that tries to predict cats and see if the "quality" of one dataset is better than the other.

In this case, the OpenAI dataset appears to be slightly better.

5

deliciously_methodic t1_jcifdxa wrote

Thanks, very informative. Can we dumb this down further? What would a 3-dimensional embedding table look like for the following sentences? And how do we go from words to numbers? What is the algorithm?

  1. Bank deposit.
  2. Bank withdrawal.
  3. River bank.
2

Simusid OP t1_jciguq5 wrote

"Words to numbers" is the secret sauce of all the models, including the new GPT-4. Individual words are tokenized (sometimes into "word pieces") and mapped from tokens to numbers via a vocabulary. Then the model is trained on pairs of sentences A and B. Sometimes the model is shown a pair where B correctly follows A, and sometimes not. Eventually the model learns to predict what is most likely to come next.

"he went to the bank", "he made a deposit"

B probably follows A

"he went to the bank", "he bought a duck"

Does not.

That is one type of training to learn valid/invalid text. Another is "leave one out" training. In this case the input is a full sentence minus one word (typically).

"he went to the convenience store and bought a gallon of _____"

and the model should learn that the most common answer will probably be "milk".
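A toy sketch of the vocabulary step (a real tokenizer like BERT's WordPiece is far more involved; this hypothetical mini-vocabulary just shows the token-to-number mapping idea):

```python
# Hypothetical tiny vocabulary; real models have tens of thousands of entries
vocab = {"[UNK]": 0, "he": 1, "went": 2, "to": 3, "the": 4,
         "bank": 5, "made": 6, "a": 7, "deposit": 8}

def tokenize(sentence):
    """Map each word to its vocabulary id; unknown words get [UNK]."""
    return [vocab.get(word, vocab["[UNK]"]) for word in sentence.lower().split()]

print(tokenize("he went to the bank"))  # [1, 2, 3, 4, 5]
print(tokenize("he bought a duck"))     # [1, 0, 7, 0] -- unseen words map to [UNK]
```

The model then operates on these id sequences, never on the raw text.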

Back to your first question: in 3D, your first two embeddings should be closer together because they are similar, and both should be "far" from the third embedding.
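A hand-made illustration of what that might look like (these 3-D vectors are invented for the example, not produced by any real model):

```python
import numpy as np

# Hypothetical 3-D embeddings; axes loosely "finance", "action", "nature"
emb = {
    "Bank deposit.":    np.array([0.9, 0.4, 0.0]),
    "Bank withdrawal.": np.array([0.9, 0.5, 0.1]),
    "River bank.":      np.array([0.1, 0.0, 0.9]),
}

def dist(a, b):
    """Euclidean distance between two sentence embeddings."""
    return np.linalg.norm(emb[a] - emb[b])

print(dist("Bank deposit.", "Bank withdrawal."))  # small: similar meaning
print(dist("Bank deposit.", "River bank."))       # large: different meaning
```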

1

quitenominal t1_jbtptri wrote

An embedding is a numerical representation of some data. In this case the data is text.

These representations (read list of numbers) can be learned with some goal in mind. Usually you want the embeddings of similar data to be close to one another, and the embeddings of disparate data to be far.

Often these lists of numbers representing the data are very long - I think the ones from the model above are 768 numbers. So each piece of text is transformed into a list of 768 numbers, and similar text will get similar lists of numbers.

What's being visualized above is a 2-number summary of those 768. This is referred to as a projection, like how a 3D wireframe casts a 2D shadow. This lets us visualize the embeddings and can give a qualitative assessment of their 'goodness', i.e. are they grouping things as I expect? (Similar texts are close, disparate texts are far.)

4

wikipedia_answer_bot t1_jbtl62p wrote

**In mathematics, an embedding (or imbedding) is one instance of some mathematical structure contained within another instance, such as a group that is a subgroup. When some object X is said to be embedded in another object Y, the embedding is given by some injective and structure-preserving map f : X → Y.**

More details here: <https://en.wikipedia.org/wiki/Embedding>

This comment was left automatically (by a bot). If I don't get this right, don't get mad at me, I'm still learning!

−1