Comments

HyperImmune t1_itwuixs wrote

Can someone ELI5 this for me? Seems like a pretty big step to AGI, but I don't want to get ahead of myself here.

5

cszintiyl t1_itx3zkd wrote

More than meets the eye!

8

visarga t1_itx6vs1 wrote

They use a long-context model to learn (distill) from the gameplay generated by other agents. Putting more history in the context means the model needs fewer samples to learn.

This is significant for robots, bots, and AI agents. Transformers have been found to be very competent at learning to act/play/work relative to other methods, and this paper shows they can learn with less training.
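If it helps, here's a rough sketch of the idea (my own toy code with made-up names, not the paper's): a causal transformer is behavior-cloned on a source RL agent's whole *learning history*, so the improvement across episodes is itself what gets imitated.

```python
# Rough sketch of algorithm distillation (made-up names, not the paper's
# code): behavior-clone a source RL agent's across-episode learning history
# so the transformer imitates the act of improving, not just one policy.
import torch
import torch.nn as nn

class ADTransformer(nn.Module):
    def __init__(self, obs_dim, n_actions, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # one token per timestep: observation + previous action + previous reward
        self.embed = nn.Linear(obs_dim + n_actions + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, history):                    # (batch, T, obs+act+1)
        mask = nn.Transformer.generate_square_subsequent_mask(history.size(1))
        h = self.encoder(self.embed(history), mask=mask)
        return self.head(h)                        # next-action logits per step

model = ADTransformer(obs_dim=8, n_actions=4)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

# stand-ins for a logged across-episode history and the actions taken in it
history = torch.randn(16, 128, 8 + 4 + 1)
actions = torch.randint(0, 4, (16, 128))

loss = nn.functional.cross_entropy(model(history).flatten(0, 1), actions.flatten())
opt.zero_grad()
loss.backward()
opt.step()
```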

9

AdditionalPizza t1_itx7tn0 wrote

"AD learns a more data-efficient RL algorithm than the one that generated the source data"

This part of the paper is very interesting. The transformer is able to improve upon the original RL algorithms that generated its pre-training data.
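Here's a hedged sketch of how you'd check that claim (the `model` and `env` helpers are hypothetical): freeze the trained transformer, drop it into a new task, and see whether returns improve purely from the growing context, with no weight updates at all.

```python
# Hedged sketch of in-context evaluation (hypothetical `model`/`env`):
# weights stay frozen, so any improvement across episodes comes only
# from conditioning on the growing history.
import torch

@torch.no_grad()
def evaluate_in_context(model, env, n_actions=4, episodes=20):
    context, returns = [], []
    prev = torch.zeros(n_actions + 1)          # previous action one-hot + reward
    for _ in range(episodes):
        obs, total, done = env.reset(), 0.0, False
        while not done:
            context.append(torch.cat([torch.as_tensor(obs, dtype=torch.float32), prev]))
            logits = model(torch.stack(context).unsqueeze(0))[0, -1]
            action = torch.distributions.Categorical(logits=logits).sample().item()
            obs, reward, done = env.step(action)   # assumed env interface
            prev = torch.cat([torch.eye(n_actions)[action], torch.tensor([reward])])
            total += reward
        returns.append(total)
    return returns                             # should trend upward if AD works
```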

6

Nmanga90 t1_itxgyuv wrote

Fuck transformers, all my homies hate transformers

3

Akimbo333 t1_itxr8w8 wrote

What are the benefits of this?

1

Down_The_Rabbithole t1_ityj38i wrote

Can we switch away from transformers already? Multiple papers have demonstrated time and again that transformers are inefficient and don't scale well toward AGI. Very cool for narrow AI applications, but they're not the future of AI.

0

AdditionalPizza t1_ityza30 wrote

By adding RL learning histories into pre-training, the model is able to learn new tasks without offline fine-tuning. So it's combining reinforcement learning with a transformer. Another benefit is that the transformer sometimes learns a more data-efficient RL algorithm than the originals it was trained with.

RL is reinforcement learning, a machine learning technique that's like giving a dog a treat when it does the right trick.

It's kind of hard to explain it simply, and I'm not qualified haha. But it's a pretty big deal. It makes it way more "out of the box" ready.
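If the dog-treat analogy helps, here's a toy RL loop (illustrative only, nothing to do with the paper's actual setup): an agent nudges up its estimate of whichever "trick" earned a reward.

```python
# Toy reinforcement learning in the "treat for the right trick" spirit
# (illustrative only, unrelated to the paper): an epsilon-greedy agent
# nudges up the value of whichever action earned a reward.
import random

q = [0.0] * 4                                  # estimated value of each "trick"
for _ in range(1000):
    if random.random() < 0.1:                  # explore occasionally
        a = random.randrange(4)
    else:                                      # otherwise pick the best trick
        a = max(range(4), key=lambda i: q[i])
    reward = 1.0 if a == 2 else 0.0            # trick 2 earns the treat
    q[a] += 0.1 * (reward - q[a])              # move estimate toward reward

print(q)                                       # q[2] ends up near 1.0
```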

3

AdditionalPizza t1_iu048nq wrote

A large language model is a transformer. An LLM works with tokens, which are basically parts of words, like syllables and punctuation/spaces. During training it forms parameters from data. The data itself isn't saved, just the way tokens relate to other tokens. If it were connect-the-dots, the dots are tokens and the parameters are the lines. You type out a sentence, which is made of tokens, and it spits out tokens. It predicts what tokens to return to you from the probability it learned of one token following another. So its reasoning comes from the parameters formed during training, plus some "policies" it's given during pre-training.
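Here's a toy sketch of that connect-the-dots picture (a bigram counter, vastly simpler than a real transformer, but the same spirit): count which token follows which, then sample the next token from those learned probabilities.

```python
# Toy "connect the dots" model: a bigram counter, far simpler than a
# transformer, but it shows next-token prediction from learned statistics.
from collections import Counter, defaultdict
import random

text = "the cat sat on the mat and the cat ate".split()
counts = defaultdict(Counter)                  # the "lines" between token "dots"
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1

def next_token(prev):
    options = counts[prev]                     # tokens seen after `prev`
    return random.choices(list(options), weights=list(options.values()))[0]

print(next_token("the"))                       # "cat" is twice as likely as "mat"
```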

I think that's a valid way to describe it in simple terms.

2