prototypist t1_j71p3d6 wrote

You can fine-tune language models on a dataset, and that's essentially how people have typically been doing NLP with transformer models. It's only more recently that research has started having success with RL for these kinds of tasks. So whatever rationale and answers you get here, the main reason is that supervised learning was the standard approach before, and then the RL people started getting better results.
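For concreteness, here is a minimal sketch of the supervised fine-tuning approach the comment describes, using the Hugging Face Transformers `Trainer`; the model name, dataset, and hyperparameters are illustrative placeholders, not anything prescribed in the thread.

```python
# Minimal supervised fine-tuning sketch with Hugging Face Transformers.
# "gpt2" and "wikitext" are assumed examples; swap in your own model/dataset.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)
from datasets import load_dataset

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any text dataset works the same way; this one is just a stand-in.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

This is the plain supervised setup; the RL-based methods the comment contrasts it with (e.g. RLHF-style training) replace the fixed-label loss above with a learned reward signal.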