
bivouac0 t1_jbjk79f wrote

Truthfully, this has not been sufficiently researched, and looking into it might yield improvements to LLMs. However, it's also not completely surprising. Consider...

For humans, something like 80% of a conversation is non-verbal (there are actual studies on this). This means that people get the meaning of words through other cues such as expression, tone, etc., and thus our conversational inputs are much "richer" than simply a bunch of tokens.

You also need to consider that our verbal communication is augmented by a lot of other sensory input (e.g., visual). You learn what a "ball" is largely by seeing it, not by hearing about it.

Also realize that LLMs generally use a very low learning rate (e.g., 1e-3), so a large number of tokens must be presented. It's not completely clear how this works with people, but we do completely memorize some inputs (effectively LR=1) and almost completely ignore others. This in itself could be an entire area of research. It would be good to understand why some phrases are "catchy" and others are forgettable. Obviously, AI today doesn't do this.
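A toy sketch of the point about learning rates (pure Python, nothing to do with an actual LLM): with a tiny learning rate you need many repetitions of the same example, while a step size matched to the loss "memorizes" it in one exposure. The numbers are illustrative only.

```python
# Toy illustration: fit a single weight w to a target t by gradient
# descent on the squared error L = (w - t)^2.
# With lr = 1e-3 each update moves only a tiny fraction of the way,
# so thousands of repetitions are needed.
# For this particular loss, lr = 0.5 is the "memorize in one shot"
# analogue of LR=1: a single step lands exactly on the target.

def steps_to_fit(lr, target=1.0, w=0.0, tol=1e-3, max_steps=100_000):
    for step in range(1, max_steps + 1):
        grad = 2.0 * (w - target)   # dL/dw
        w -= lr * grad
        if abs(w - target) < tol:
            return step
    return max_steps

print(steps_to_fit(lr=1e-3))  # ~3500 steps
print(steps_to_fit(lr=0.5))   # 1 step: "memorized" after one exposure
```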

I'd also point out that LLMs are not exactly memorizing information. Studies have demonstrated their ability to learn facts, but this is not purposeful knowledge retention. People are better at this, and I suspect AI needs to develop a method to separate knowledge retention from language pattern modeling. Think about learning the state capitals. A person quickly learns to say "the capital of X is Y" and then can substitute in different memorized facts. An LLM learns the facts and the sentence patterns all in the same manner.
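A rough sketch of the separation I mean (hypothetical design, not how any current LLM works): keep the sentence pattern and the facts in different stores, instead of baking both into the same weights.

```python
# Hypothetical separation of "language pattern" from "retained knowledge".
# An LLM learns both through the same gradient updates; here the pattern
# is a reusable template and the facts live in an explicit, editable store.

capital_facts = {          # knowledge retention: discrete, editable facts
    "Indiana": "Indianapolis",
    "Texas": "Austin",
}

pattern = "The capital of {state} is {capital}."   # language pattern, learned once

def answer(state: str) -> str:
    capital = capital_facts.get(state)
    if capital is None:
        return f"I don't know the capital of {state}."
    return pattern.format(state=state, capital=capital)

print(answer("Indiana"))   # The capital of Indiana is Indianapolis.
print(answer("Oregon"))    # I don't know the capital of Oregon.
```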

People can also use "thought" (e.g., search, hypothesis testing, etc.) to understand the meaning of sentences and form responses. Let's face it, at this point LLMs are just brute-force pattern matchers. There's nothing "intelligent" here.

8

endless_sea_of_stars t1_jbmda5p wrote

> develop a method to separate knowledge retention and language pattern modeling. Think about learning the state capitals. A person quickly learns to say "the capital of X is Y" and then can substitute in different memorized facts. AI learns the facts and the sentence patterns all in the same manner.

This sounds like a problem Toolformer is supposed to address. Instead of learning all the state capitals, the model learns to call a tool: "The capital of Indiana is [QA(Indiana, capital)]."
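A hedged sketch of that idea: the model emits an inline tool call, and a wrapper executes it and splices the answer back into the text. The call syntax follows the example above (not necessarily the paper's exact format), and the QA backend here is just a stub dictionary.

```python
import re

def qa_tool(entity: str, relation: str) -> str:
    # Stub question-answering backend; a real system would query a KB or API.
    facts = {("Indiana", "capital"): "Indianapolis"}
    return facts.get((entity, relation), "[unknown]")

CALL = re.compile(r"\[QA\(\s*([^,\)]+?)\s*,\s*([^\)]+?)\s*\)\]")

def execute_tool_calls(generated: str) -> str:
    # Replace each [QA(entity, relation)] marker with the tool's answer.
    return CALL.sub(lambda m: qa_tool(m.group(1), m.group(2)), generated)

model_output = "The capital of Indiana is [QA(Indiana, capital)]."
print(execute_tool_calls(model_output))
# -> The capital of Indiana is Indianapolis.
```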

1