blazejd OP t1_ix7mr03 wrote
Thank you everyone for your comments; they were really insightful and gave me perspectives I wouldn't have had on my own. I am quite new to ML Reddit, so I wasn't sure what to expect. Here is my quick summary/general reply.
Most of you agreed that we use language modelling because it is the most compute- and time-effective approach and sort of the best thing we have *right now*, but that RL would be interesting to incorporate. However, training solely with RL is difficult, not least because choosing a good objective is hard.
This seems a bit similar to the hype around SVMs in the early 2000s (from what I heard from senior researchers, it was a thing). Back then we already had neural networks, but we weren't ready hardware- or data-wise, so at the time SVMs performed better due to their simplicity; twenty years on, we can clearly see neural nets were the right direction. It's easier to use language models now and they give better short-term performance, but in a couple of decades RL may well outperform them (although multi-modality will very likely be necessary).
A currently feasible step in this direction is merging the two concepts: language models and RL-based feedback. Some papers mentioned are https://arxiv.org/abs/2203.02155 and "Experience Grounds Language" (although I haven't read them in full yet). We could initialize a customer-facing chatbot with a language model and then update it RL-style, which can be thought of as a form of online or continual learning. The RL objective could be the rating a user gives after interacting with the system, the frequency with which the user asks to talk to a human assistant, or the sentiment of the user's replies (positive or negative). And if we could come up with that by bouncing ideas around on Reddit, some company is probably already doing it.
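To make the reward idea concrete, here is a minimal Python sketch of how those three feedback signals could be combined into a single scalar reward for the RL update. Everything here (the field names, the weights, the clipping range) is an illustrative assumption, not a description of any real system:

```python
# Hypothetical reward shaping for a chatbot fine-tuned with RL on user feedback.
# All signal names and weights are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConversationFeedback:
    star_rating: Optional[int]       # explicit 1-5 rating, if the user left one
    asked_for_human: bool            # did the user request a human agent?
    reply_sentiments: list           # per-reply sentiment scores in [-1, 1]

def conversation_reward(fb: ConversationFeedback) -> float:
    """Map the three feedback signals into a single reward in [-1, 1]."""
    reward = 0.0
    if fb.star_rating is not None:
        # rescale 1..5 stars to -1..1
        reward += (fb.star_rating - 3) / 2
    if fb.asked_for_human:
        # escalation to a human is a strong negative signal
        reward -= 1.0
    if fb.reply_sentiments:
        # average sentiment across the user's replies
        reward += sum(fb.reply_sentiments) / len(fb.reply_sentiments)
    # clip to a bounded range to keep the RL update stable
    return max(-1.0, min(1.0, reward))

# Example: a 5-star conversation with mildly positive replies
fb = ConversationFeedback(star_rating=5, asked_for_human=False,
                          reply_sentiments=[0.2, 0.4])
print(conversation_reward(fb))  # → 1.0
```

This scalar would then serve as the return for a policy-gradient-style update of the language-model policy; the interesting design question is how to weight explicit signals (ratings) against implicit ones (sentiment, escalation).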
If you are looking for more related resources, my thoughts were inspired by the field of language emergence (https://arxiv.org/pdf/2006.02419.pdf) and this work (https://arxiv.org/pdf/2112.11911.pdf).