
blazejd OP t1_ix8ibmh wrote

>For actually learning language, in the sense of using it to convey meaningful, appropriate information, which LLMs so far cannot do, maybe it's better to take an RL approach. But I don't know how to write a reward function that encompasses that. So as long as we can't do the superior thing with either approach, we might as well focus on the easier approach to the superficial thing.

My understanding of this paragraph, simply put, is (correct me if I'm wrong): "RL might be better, but we don't know how to do it, so let's not try. Language models are doing fine."

In my opinion, in science we should focus simultaneously on easier problems that can lead to shorter-term gains (language models) AND ALSO more difficult problems that are riskier but might be better long term (RL-based).

1

idrajitsc t1_ix8lrk0 wrote

I mean, I'm not really sure what your ask is. People do work on RL for NLP. It just doesn't offer any huge advantage, and the reason your intuition doesn't translate into an actual advantage is that writing a reward function which reproduces the human feedback a baby receives is essentially impossible. And not just in an "it's hard, but if we put enough work into it we can figure it out" kind of way.
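
To make that concrete, here is a minimal, purely illustrative sketch (the vocabulary, weights, and function names are invented for this comment, not taken from any real system): the surface-level reward terms are easy to hand-craft, but the term that would capture "conveying meaningful, appropriate information" has no obvious programmatic definition and is left as a stub.

```python
# Hypothetical sketch of an RL-style setup for language generation.
# Everything here (vocab, policy, reward terms) is illustrative only.

import random

VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]

def sample_utterance(policy_weights, length=5):
    """Toy stochastic policy: samples words with probability proportional to weights."""
    words = random.choices(VOCAB, weights=policy_weights, k=length)
    return " ".join(words)

def fluency_reward(utterance):
    """Cheap proxy for surface form, e.g. penalize immediate word repetition.
    Proxies like this are easy to write down but only capture grammar/fluency."""
    words = utterance.split()
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)
    return -float(repeats)

def communication_reward(utterance, listener_state):
    """The term that would make this 'actual language learning': did the utterance
    convey meaningful, appropriate information to the listener? There is no known
    way to write this down as a function, which is the point above."""
    raise NotImplementedError("no programmatic definition of 'meaningful communication'")

def total_reward(utterance, listener_state, alpha=1.0, beta=1.0):
    return alpha * fluency_reward(utterance) + beta * communication_reward(utterance, listener_state)

if __name__ == "__main__":
    policy_weights = [1.0] * len(VOCAB)
    utterance = sample_utterance(policy_weights)
    print(utterance, fluency_reward(utterance))  # the fluency term works fine
    # total_reward(...) would raise: the communicative term is the missing piece
```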

2

blazejd OP t1_ix8k4wf wrote

>Sure, but the answer remains: what reward function do you use that encompasses understanding and communicating, on top of grammar?

I realize this doesn't directly answer your question; my point is that we don't know the answer, but we should at least try to pursue it.

1