Comments

HyperImmune t1_itwuixs wrote

Can someone ELI5 this for me? Seems like a pretty big step to AGI, but I don't want to get ahead of myself here.

5

cszintiyl t1_itx3zkd wrote

More than meets the eye!

8

visarga t1_itx6vs1 wrote

They use a long-context model to learn (distill) from the gameplay generated by other agents. Putting more history in the context means the model needs fewer samples to learn.

This is significant for robots, bots, and AI agents. Transformers have been found to be very competent at learning to act/play/work relative to other methods, and this paper shows they can learn with less training.
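If it helps, here's a rough sketch of the idea (my own toy code with made-up names, not the paper's): a causal transformer is behavior-cloned on a source RL agent's whole *learning history*, so the improvement across episodes is itself what gets imitated.

```python
# Rough sketch of algorithm distillation (made-up names, not the paper's
# code): behavior-clone a source RL agent's across-episode learning history
# so the transformer imitates the act of improving, not just one policy.
import torch
import torch.nn as nn

class ADTransformer(nn.Module):
    def __init__(self, obs_dim, n_actions, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # one token per timestep: observation + previous action + previous reward
        self.embed = nn.Linear(obs_dim + n_actions + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, history):                    # (batch, T, obs+act+1)
        mask = nn.Transformer.generate_square_subsequent_mask(history.size(1))
        h = self.encoder(self.embed(history), mask=mask)
        return self.head(h)                        # next-action logits per step

model = ADTransformer(obs_dim=8, n_actions=4)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

# stand-ins for a logged across-episode history and the actions taken in it
history = torch.randn(16, 128, 8 + 4 + 1)
actions = torch.randint(0, 4, (16, 128))

loss = nn.functional.cross_entropy(model(history).flatten(0, 1), actions.flatten())
opt.zero_grad()
loss.backward()
opt.step()
```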

9

AdditionalPizza t1_itx7tn0 wrote

"AD learns a more data-efficient RL algorithm than the one that generated the source data"

This part of the paper is very interesting. The transformer is able to improve upon the original RL algorithms that generated its pre-training data.
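Here's a hedged sketch of how you'd check that claim (the `model` and `env` helpers are hypothetical): freeze the trained transformer, drop it into a new task, and see whether returns improve purely from the growing context, with no weight updates at all.

```python
# Hedged sketch of in-context evaluation (hypothetical `model`/`env`):
# weights stay frozen, so any improvement across episodes comes only
# from conditioning on the growing history.
import torch

@torch.no_grad()
def evaluate_in_context(model, env, n_actions=4, episodes=20):
    context, returns = [], []
    prev = torch.zeros(n_actions + 1)          # previous action one-hot + reward
    for _ in range(episodes):
        obs, total, done = env.reset(), 0.0, False
        while not done:
            context.append(torch.cat([torch.as_tensor(obs, dtype=torch.float32), prev]))
            logits = model(torch.stack(context).unsqueeze(0))[0, -1]
            action = torch.distributions.Categorical(logits=logits).sample().item()
            obs, reward, done = env.step(action)   # assumed env interface
            prev = torch.cat([torch.eye(n_actions)[action], torch.tensor([reward])])
            total += reward
        returns.append(total)
    return returns                             # should trend upward if AD works
```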

6

Nmanga90 t1_itxgyuv wrote

Fuck transformers, all my homies hate transformers

3

Akimbo333 t1_itxr8w8 wrote

What are the benefits of this?

1

Down_The_Rabbithole t1_ityj38i wrote

Can we switch away from transformers already? Multiple papers have demonstrated time and again that transformers are inefficient and don't scale well toward AGI. Very cool for narrow AI applications, but they're not the future of AI.

0

AdditionalPizza t1_ityza30 wrote

By adding RL learning histories into pre-training, the model is able to learn new tasks without offline fine-tuning. So it's combining reinforcement learning with a transformer. Another benefit is that the transformer sometimes learns a more data-efficient RL algorithm than the originals it was trained with.

RL is reinforcement learning, a machine learning technique that's like giving a dog a treat when it does the right trick.

It's kind of hard to explain it simply, and I'm not qualified haha. But it's a pretty big deal. It makes it way more "out of the box" ready.
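If the dog-treat analogy helps, here's a toy RL loop (illustrative only, nothing to do with the paper's actual setup): an agent nudges up its estimate of whichever "trick" earned a reward.

```python
# Toy reinforcement learning in the "treat for the right trick" spirit
# (illustrative only, unrelated to the paper): an epsilon-greedy agent
# nudges up the value of whichever action earned a reward.
import random

q = [0.0] * 4                                  # estimated value of each "trick"
for _ in range(1000):
    if random.random() < 0.1:                  # explore occasionally
        a = random.randrange(4)
    else:                                      # otherwise pick the best trick
        a = max(range(4), key=lambda i: q[i])
    reward = 1.0 if a == 2 else 0.0            # trick 2 earns the treat
    q[a] += 0.1 * (reward - q[a])              # move estimate toward reward

print(q)                                       # q[2] ends up near 1.0
```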

3

AdditionalPizza t1_iu048nq wrote

A large language model is a transformer. An LLM works with tokens, which are basically parts of words, like syllables and punctuation/spaces. During training it forms parameters from data. The data itself isn't saved, just the way tokens relate to other tokens. If it were connect-the-dots, the dots are tokens and the parameters are the lines. You type out a sentence, which is made of tokens, and it spits out tokens. It predicts what tokens to return to you from the probability it learned of one token following another. So its reasoning comes from the parameters formed during training, plus some "policies" it's given during pre-training.
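Here's a toy sketch of that connect-the-dots picture (a bigram counter, vastly simpler than a real transformer, but the same spirit): count which token follows which, then sample the next token from those learned probabilities.

```python
# Toy "connect the dots" model: a bigram counter, far simpler than a
# transformer, but it shows next-token prediction from learned statistics.
from collections import Counter, defaultdict
import random

text = "the cat sat on the mat and the cat ate".split()
counts = defaultdict(Counter)                  # the "lines" between token "dots"
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1

def next_token(prev):
    options = counts[prev]                     # tokens seen after `prev`
    return random.choices(list(options), weights=list(options.values()))[0]

print(next_token("the"))                       # "cat" is twice as likely as "mat"
```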

I think that's a valid way to describe it in simple terms.

2