visarga t1_itx6vs1 wrote

They use a large-context model to learn (distill) from the gameplay generated by other agents. They put more history in the context, so the model needs fewer samples to learn.
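To make the "more history in the context" idea concrete, here is a rough sketch of how training examples might be cut from a source agent's learning history so that each context window runs across episode boundaries. This is not the paper's code: the `Transition` encoding, the function name `cross_episode_examples`, and the `context_len` parameter are all made up for illustration, and the sequence model itself is left out.

```python
from typing import List, Tuple

# Hypothetical encoding of one step: (observation, action, reward).
Transition = Tuple[int, int, float]
Example = Tuple[List[Transition], int, int]


def cross_episode_examples(
    history: List[List[Transition]], context_len: int
) -> List[Example]:
    """Build (context, observation, target_action) examples.

    `history` is a list of episodes in the order the source agent
    experienced them. Episodes are concatenated so a context window can
    include earlier episodes, not just the current one.
    """
    # Flatten the episodes into one long sequence of transitions.
    flat = [step for episode in history for step in episode]

    examples: List[Example] = []
    for i in range(context_len, len(flat)):
        context = flat[i - context_len:i]  # the preceding context_len steps
        obs, action, _ = flat[i]           # the action taken next is the target
        examples.append((context, obs, action))
    return examples
```

A sequence model trained to predict `target_action` from `context` and `observation` sees how the source agent's behaviour changed over earlier episodes, which is the intuition behind putting more history in the context.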

This is significant for robots, bots, and AI agents. Transformers have been found to be very competent at learning to act/play/work compared with other methods, and this paper shows they can learn from less training data.

9

AdditionalPizza t1_itx7tn0 wrote

"AD learns a more data-efficient RL algorithm than the one that generated the source data"

This part of the paper is very interesting. The transformer is able to improve upon the original RL algorithms that generated its pre-training data.

6