
thiru_2718 t1_jb6njez wrote

>supervised learning can teach a model to complete a human-defined task. But reinforcement learning can teach a model to choose its own tasks to complete arbitrary goals.

Isn't this contradicted by LLMs demonstrating emergent abilities (like meta-learning strategies or in-context learning) that allow them to tackle complex sequential tasks adaptively? There is research (e.g. https://innermonologue.github.io/) where LLMs are successfully applied to a traditional RL domain: planning and interaction for robots. While there is RLHF involved in models like ChatGPT, the bulk of the model's reasoning comes from supervised learning.
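
To make that concrete, here is a minimal sketch of what an "inner monologue"-style loop looks like, assuming a generic text-completion LLM (this is not the actual Inner Monologue code): the model's weights are never updated by RL; it adapts to the task purely by conditioning on accumulated feedback in its context window. `query_llm`, `ToyEnv`, and `run_episode` are hypothetical names used only for illustration.

```python
# Hypothetical sketch: an LLM plans a sequential task via in-context learning.
# No RL update is applied; environment feedback is simply appended to the prompt.
# `query_llm` and `ToyEnv` are illustrative stand-ins, not a real API.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a pretrained language model."""
    return "pick up the sponge"  # canned action for illustration


class ToyEnv:
    """Trivial stand-in environment that reports completion after two steps."""
    def __init__(self) -> None:
        self.steps = 0

    def execute(self, action: str) -> str:
        self.steps += 1
        return "done" if self.steps >= 2 else "success"


def run_episode(goal: str, env: ToyEnv, max_steps: int = 10) -> list[str]:
    prompt = f"Goal: {goal}\n"
    actions: list[str] = []
    for _ in range(max_steps):
        # The "policy" is just the frozen LLM conditioned on the growing prompt.
        action = query_llm(prompt + "Next action:")
        feedback = env.execute(action)  # e.g. "success", "object not found", "done"
        prompt += f"Action: {action}\nFeedback: {feedback}\n"
        actions.append(action)
        if feedback == "done":
            break
    return actions


print(run_episode("wipe the table", ToyEnv()))
```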

As far as I can tell, the unexpected, emergent abilities of LLMs have somewhat rewritten our assumptions about what is possible through supervised learning, and that rethinking should be extended into the RL domain as well.
