Submitted by blazejd t3_yzzxa2 in MachineLearning
Think about how children learn their native language. At the very beginning, they listen to the language adults use. Later on, they start to produce very basic forms of communication, using just a handful of single words. Only over time do they come up with longer sentences and correct grammar. Most importantly, they continuously interact with people who already speak the language (the environment) and receive real-time feedback: their mistakes are corrected by others. This sounds very similar to reinforcement learning.
On the other hand, current large language models "learn the language" by passively reading huge amounts of text and trying to predict the next word. While the results are impressive, it is not the most intuitive approach, and reinforcement learning feels like a more natural fit.
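For concreteness, the "predict the next word" objective boils down to something like this minimal PyTorch sketch (the tiny model, vocabulary size, and random data are toy placeholders, not a real LM):

```python
import torch
import torch.nn as nn

# Toy next-token prediction setup (hypothetical sizes, not a real LM config)
vocab_size, d_model = 1000, 64

model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),  # a real LM has a transformer in between
)

# A batch of token ids, standing in for a tokenized text corpus
tokens = torch.randint(0, vocab_size, (8, 33))  # (batch, seq_len + 1)
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets are inputs shifted by one

logits = model(inputs)  # (batch, seq_len, vocab_size)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()  # an optimizer step would follow
```

The key point is that every token of raw text is "free" supervision; no human has to label or correct anything.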
Why do you think the general research trend didn't go in this direction?
HateRedditCantQuitit t1_ix4d4sx wrote
You can scale self-supervised learning much more easily, cheaply, and safely than you can scale human-in-the-loop RL. It's similar to why we don't train self-driving cars by putting them on real roads and letting them learn by RL.
If we could put a language model in a body and let it learn safely through human tutoring in a more time-effective and cost-effective way, maybe it would be worthwhile. Today, it doesn't seem to be the time-effective or cost-effective solution.
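To make the bottleneck concrete, a naive human-in-the-loop setup would look something like this sketch (a toy model with a REINFORCE-style policy-gradient update; every name and number here is illustrative, not how any production system works):

```python
import torch
import torch.nn as nn

# Toy "tutoring" loop: the model emits one token, a human scores it.
vocab_size, d_model = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

prompt = torch.randint(0, vocab_size, (1, 16))  # stand-in for a tokenized prompt
logits = model(prompt)[:, -1, :]                # next-token distribution
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()                          # the model's "reply" (one token)

# The bottleneck: a real person has to score every single interaction.
reward = float(input("Rate the model's reply (-1 to 1): "))

loss = -reward * dist.log_prob(action)          # REINFORCE policy gradient
optimizer.zero_grad()
loss.sum().backward()
optimizer.step()
```

One human judgment buys you one (noisy, scalar) training signal, versus millions of next-token labels per dollar from plain text. That asymmetry is the whole argument.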
And while I'm on my podium: once LMs are commercially deployed in a loop talking to people at scale, I expect this will be a huge topic.
Tangentially, check out this short story/novella that kinda explores the idea from a fictional perspective. It's incredibly well written and interesting, by a favorite author of mine: "The Lifecycle of Software Objects" by Ted Chiang. https://web.archive.org/web/20130306030242/http://subterraneanpress.com/magazine/fall_2010/fiction_the_lifecycle_of_software_objects_by_ted_chiang