Submitted by [deleted] t3_11d4ka5 in MachineLearning
CellWithoutCulture t1_ja6pjet wrote
Seems more like an AskML question.
But RL is for situations where you can't backprop through the loss, e.g. when the reward comes from an environment you can't differentiate. Its gradient estimates are noisier than in supervised learning. So if you can use supervised learning, that's generally what you should use.
RL is still used, for example in the recent GATO and DreamerV3, or in training an LLM to use tools, as in Toolformer. There's also OpenAI's famous RLHF, which stands for reinforcement learning from human feedback. This is what they use to make ChatGPT "aligned", although in reality it doesn't get there.
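To make the "can't backprop the loss" point concrete, here is a minimal sketch of a REINFORCE-style (score-function) update on a toy 3-armed bandit. Everything in it is an illustrative assumption rather than anything from a paper mentioned here: `reinforce_bandit`, `black_box_reward`, and the hyperparameters are hypothetical names and values. The reward is treated as a black box, so the only gradient used is that of the policy's own log-probability.

```python
import numpy as np

def reinforce_bandit(reward_fn, n_actions=3, steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE on a bandit; reward_fn is never differentiated."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(n_actions)   # parameters of a softmax policy
    baseline = 0.0                 # running average reward, reduces variance
    for _ in range(steps):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        action = rng.choice(n_actions, p=probs)
        reward = reward_fn(action)  # black box: only a scalar comes back
        # Gradient of log pi(action) w.r.t. the logits is onehot(action) - probs.
        grad_logp = -probs
        grad_logp[action] += 1.0
        logits += lr * (reward - baseline) * grad_logp
        baseline += 0.05 * (reward - baseline)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def black_box_reward(action):
    # Hypothetical non-differentiable reward: only arm 2 pays off.
    return 1.0 if action == 2 else 0.0

final_probs = reinforce_bandit(black_box_reward)
print(final_probs)
```

Each update relies on a single sampled action, which is where the extra noise relative to supervised learning comes from; the baseline is one common way to tame it.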
tdgros t1_ja71ave wrote
>toolformer
Are you sure there's RL in Toolformer? I thought it was mostly self-supervised and fine-tuned.
CellWithoutCulture t1_ja7dklj wrote
> Toolformer
...oh, you're right, it didn't use RL. I assumed they let it use any tool, which would need RL. But it seems they had pre-labelled ways to use tools.
Thanks for pointing that out.
Tea_Pearce t1_ja73tpy wrote
FYI, GATO used imitation learning, which is closer to supervised learning than to RL.
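For anyone wondering why imitation learning sits closer to supervised learning, here is a rough sketch of behavior cloning, its simplest form: fit a classifier to (state, expert action) pairs with ordinary cross-entropy. This is not GATO's actual setup. The linear softmax policy, the `behavior_cloning` function, and the synthetic "expert" are all assumptions made up for illustration.

```python
import numpy as np

def behavior_cloning(states, expert_actions, n_actions, epochs=500, lr=0.5):
    """Fit a linear softmax policy to expert labels by gradient descent."""
    n, d = states.shape
    W = np.zeros((d, n_actions))
    onehot = np.eye(n_actions)[expert_actions]
    for _ in range(epochs):
        logits = states @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Cross-entropy gradient: the loss is differentiable end to end,
        # so no reward signal or sampling from the policy is needed.
        W -= lr * states.T @ (probs - onehot) / n
    return W

rng = np.random.default_rng(0)
states = rng.normal(size=(200, 2))
# Hypothetical expert: picks action 1 whenever the first feature is positive.
expert_actions = (states[:, 0] > 0).astype(int)

W = behavior_cloning(states, expert_actions, n_actions=2)
predicted = (states @ W).argmax(axis=1)
accuracy = (predicted == expert_actions).mean()
print(accuracy)
```

There is no environment interaction or reward anywhere in the loop, only labelled examples, which is the sense in which this is supervised learning.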