Kylaran t1_ix6th4k wrote
Reply to [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Former developmental psychology student here — the reward function for humans is unbelievably complex, and RL draws a lot of its assumptions from classical behaviorist principles rather than cognitive or statistical ones. One reason cognitive science was born was to tackle exactly the poverty of the stimulus argument à la Chomsky: human children learn language without much explicit feedback at all.
In RL and NLP, there’s a lot of research in areas like content recommendation systems and using RL as a feedback loop in chatbots. In these cases, the language model already exists, and the RL component is used to provide a feedback signal to it.
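To make that split concrete, here's a minimal toy sketch (my own illustration, not anything from a real system): a tiny fixed bigram "language model" stands in for the pretrained LM, and a REINFORCE-style update nudges its logits toward outputs a hypothetical reward function prefers. The vocabulary, reward, and hyperparameters are all made up for the example.

```python
# Toy sketch: RL feedback applied on top of an "existing" language model.
# The bigram logit table plays the role of a pretrained LM; REINFORCE
# shifts probability mass toward sequences the reward function likes.
import math
import random

random.seed(0)
VOCAB = ["hi", "ok", "no", "<eos>"]

# "Pretrained" model: a logit per (previous token, next token) pair.
# Uniform here, but in practice this would come from next-word prediction.
logits = {prev: {tok: 0.0 for tok in VOCAB} for prev in ["<bos>"] + VOCAB}

def softmax(d):
    m = max(d.values())
    exps = {k: math.exp(v - m) for k, v in d.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

def sample_sentence(max_len=5):
    prev, out = "<bos>", []
    for _ in range(max_len):
        probs = softmax(logits[prev])
        tok = random.choices(list(probs), weights=list(probs.values()))[0]
        if tok == "<eos>":
            break
        out.append(tok)
        prev = tok
    return out

def reward(sentence):
    # Hypothetical preference signal: reward each "ok" token.
    return sum(1.0 for t in sentence if t == "ok")

def reinforce_step(lr=0.5):
    sent = sample_sentence()
    r = reward(sent)
    prev = "<bos>"
    for tok in sent:
        probs = softmax(logits[prev])
        for k in VOCAB:
            # Policy-gradient update: grad of log P(tok | prev), scaled by reward.
            grad = (1.0 if k == tok else 0.0) - probs[k]
            logits[prev][k] += lr * r * grad
        prev = tok
    return r

for _ in range(200):
    reinforce_step()
```

After a couple hundred steps, the probability of "ok" after `<bos>` rises well above its uniform starting value — the RL signal reshapes a model that already generates language, rather than teaching language from scratch, which is exactly the distinction at issue.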
Learning the language model itself using only reward would be a fundamentally different philosophical and empirical challenge for science.