bivouac0

bivouac0 t1_jbjk79f wrote

Truthfully, this hasn't been sufficiently researched, and looking into it might yield improvements to LLMs. However, it's also not completely surprising. Consider...

For humans, something like 80% of a conversation is non-verbal (there are actual studies on this). This means people get the meaning of words through other cues such as expression, tone, etc., so our conversational inputs are much "richer" than simply a bunch of tokens.

You also need to consider that our verbal communication is augmented by a lot of other sensory input (e.g., visual). You learn what a "ball" is largely by seeing it, not by hearing about it.

Also realize that LLMs are generally trained with a very low learning rate (e.g., 1e-3 or lower), so a large number of tokens must be presented before anything sticks. It's not completely clear how this works in people, but we do completely memorize some inputs (effectively LR = 1) and almost completely ignore others. This in itself could be an entire area of research; it would be good to understand why some phrases are "catchy" and others are forgettable. Obviously, AI today doesn't do this.
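To make the contrast concrete, here's a minimal toy sketch of a single small-learning-rate training step; the model and hyperparameters are placeholders I picked for illustration, not any real LLM setup:

```python
import torch

# Toy stand-in for a transformer; sizes and LR are illustrative only.
model = torch.nn.Linear(768, 768)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)  # small LR: each step barely moves the weights

x = torch.randn(8, 768)
loss = model(x).pow(2).mean()   # dummy loss just to drive a gradient
loss.backward()
opt.step()                      # one tiny nudge; learning anything takes a huge number of these

# A person, by contrast, sometimes memorizes a phrase after a single exposure --
# roughly like taking an LR=1 step on that one example -- and ignores most others.
```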

I'd also point out that LLMs are not exactly memorizing information. Studies have demonstrated their ability to learn facts, but this is not purposeful knowledge retention. People are better at this, and I suspect AI needs to develop a way to separate knowledge retention from language-pattern modeling. Think about learning the state capitals: a person quickly learns to say "the capital of X is Y" and can then substitute in different memorized facts, while an LLM learns the facts and the sentence patterns all in the same manner (see the toy sketch below).
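Here's what I mean by the separation, as a purely illustrative sketch: the sentence pattern is one thing, the memorized facts are another, and you just substitute facts into the pattern:

```python
# Toy illustration only -- not how any current LLM works.
# The "language pattern" is the template; the "knowledge" is the lookup table.
capitals = {"Texas": "Austin", "Ohio": "Columbus", "Oregon": "Salem"}

def capital_sentence(state: str) -> str:
    return f"The capital of {state} is {capitals[state]}."

print(capital_sentence("Ohio"))  # The capital of Ohio is Columbus.
```

An LLM has no such separation; the facts and the phrasing are baked into the same weights.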

People can also use "thought" (i.e., search, hypothesis testing, etc.) to understand the meaning of sentences and form responses. Let's face it, at this point LLMs are just brute-force pattern matchers. There's nothing "intelligent" here.

8

bivouac0 t1_itis6yy wrote

There's a GitHub project that an individual put together based on the RETRO paper. If you check out its issues list, there's some info on work toward a pretrained model.

There is also the Huggingface RAG model, and Facebook has a couple of models on the HF hub.

Note that RAG is an older approach to retrieval, so you probably want to be looking at the RETRO project above.
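If you just want to poke at the Huggingface RAG model, here's a rough sketch following the older transformers doc example (you'll also need `datasets` and `faiss` installed; the dummy index is only for quick experiments, not real retrieval):

```python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
# use_dummy_dataset avoids downloading the full Wikipedia index
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

inputs = tokenizer.prepare_seq2seq_batch("how many countries are in europe", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```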

2

bivouac0 t1_iqov6wt wrote

I bought an MSI gaming laptop some years back (~$1500) and it worked well for basic stuff while I was learning neural nets. MSI was about the best price-wise, and there was a slot for a second SSD, so it was easy to put Ubuntu on a second disk and keep Windows as-is for other work.

I have to agree with the other comments, however. The laptop will run very hot under heavy training, and in my experience the mobile versions of the GPUs are not nearly as fast as the similarly named desktop versions but cost considerably more.

BTW... if you're just doing basic NN training for school, you probably don't need a 14-core laptop and 64GB of RAM. You can probably get by with something smaller, maybe a 3070 or 3080 (non-Ti), and then rely on other resources like Colab for training larger, more complex nets.

1