Akimbo333 t1_itxr8w8 wrote

What are the benefits of this?


AdditionalPizza t1_ityza30 wrote

By adding RL algorithms into pre-training, the model is able to learn new tasks without having to be fine-tuned offline. So it's combining reinforcement learning with a transformer. Another benefit is that the transformer sometimes comes up with more efficient RL algorithms than the originals it was trained with.

RL is reinforcement learning, a machine learning technique that's like giving a dog a treat when it does the right trick.

It's kind of hard to explain it simply, and I'm not qualified haha. But it's a pretty big deal. It makes it way more "out of the box" ready.
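
For anyone who wants to see the dog-and-treat idea in code, here's a rough sketch. It's a toy "try things, remember which ones earn treats" loop (an epsilon-greedy bandit), not the method from the paper being discussed. The function name `train_dog`, the trick names, and the settings are all made up for illustration.

```python
import random

def train_dog(right_trick, tricks, steps=500, epsilon=0.2, seed=0):
    """Toy RL loop: learn which trick earns a treat by trial and error."""
    rng = random.Random(seed)
    value = {t: 0.0 for t in tricks}  # estimated treat rate for each trick
    count = {t: 0 for t in tricks}    # how often each trick has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            trick = rng.choice(tricks)           # explore: try a random trick
        else:
            trick = max(tricks, key=value.get)   # exploit: best trick so far
        reward = 1.0 if trick == right_trick else 0.0  # the "treat"
        count[trick] += 1
        # Running average of the rewards seen for this trick.
        value[trick] += (reward - value[trick]) / count[trick]
    return value

TRICKS = ["sit", "bark", "roll over"]
values = train_dog("roll over", TRICKS)
best = max(values, key=values.get)
print(best, values)
```

The only feedback the loop ever gets is the treat (the reward), and that's enough for it to settle on the right trick. Nobody tells it the answer directly.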


Akimbo333 t1_itzw0hb wrote

That's awesome! Oh and I know that this might sound ignorant of me but what is a transformer?


AdditionalPizza t1_iu048nq wrote

A large language model is built on a transformer. An LM has tokens, which are basically parts of words, like syllables and punctuation/spaces. During training it forms parameters from data. The data isn't saved, just the way it relates tokens to other tokens. If it were connect-the-dots, the dots are tokens and the parameters are the lines. You type out a sentence, which is made of tokens, and it spits out tokens. It predicts which tokens to return to you by the probability it learned of one token most likely following another. So its reasoning is based on the parameters formed during training, and some "policies" it's given during pre-training.

I think that's a valid way to describe it in simple terms.
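
Here's a tiny sketch of the "one token most likely follows another" idea, in case code helps. Big caveat: this just counts which word follows which in a sample sentence. A real transformer learns its parameters through training and looks at the whole sentence, not only the previous token, so treat this as a cartoon version. The names `train_bigram` and `predict_next` and the sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every token, which tokens follow it in the text."""
    tokens = text.split()  # whole words here; real models use sub-word tokens
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1  # the "lines" between the "dots"
    return follows

def predict_next(follows, token):
    """Return the token most often seen after `token`, or None if unseen."""
    counts = follows.get(token)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

model = train_bigram("the cat sat on the mat and the cat ran")
print(predict_next(model, "the"))
print(predict_next(model, "sat"))
```

The training sentence itself isn't stored, only the counts of what follows what, which is the same point as the data not being saved.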