
CellWithoutCulture t1_ja6pjet wrote

Seems more like an AskML question.

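But RL is for situations where you can't backpropagate through the loss, for example when the reward comes from an environment or a human rater rather than from a differentiable function. Its gradient estimates are noisier than in supervised learning, so if you can use supervised learning, that's generally what you should use.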
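To make the "noisier" point concrete, here is a minimal sketch, assuming a toy three-action softmax policy and a made-up reward table (`REWARDS`, `black_box_reward`, and the function names are all illustrative, not from any library). It compares the exact gradient of the expected reward, which you could only compute if the reward were known and differentiable, with a score-function (REINFORCE-style) estimate that only samples actions and observes their rewards.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_gradient(logits, rewards):
    """Gradient of E[r] w.r.t. the logits when the reward table is known.

    For a softmax policy: dE[r]/dtheta_j = p_j * (r_j - sum_a p_a * r_a).
    """
    p = softmax(logits)
    expected = sum(pi * ri for pi, ri in zip(p, rewards))
    return [pi * (ri - expected) for pi, ri in zip(p, rewards)]


def reinforce_gradient(logits, reward_fn, n_samples, rng):
    """Score-function estimate: average of r(a) * grad log p(a) over samples.

    Only needs to *call* reward_fn, never to differentiate it.
    """
    p = softmax(logits)
    k = len(p)
    grad = [0.0] * k
    for _ in range(n_samples):
        a = rng.choices(range(k), weights=p)[0]
        r = reward_fn(a)
        for j in range(k):
            # grad of log softmax w.r.t. logit j is (1 if j == a else 0) - p_j
            grad[j] += r * ((1.0 if j == a else 0.0) - p[j])
    return [g / n_samples for g in grad]


# Hypothetical reward table, hidden from the estimator behind a function call.
REWARDS = [0.0, 1.0, 0.2]


def black_box_reward(action):
    return REWARDS[action]


logits = [0.0, 0.0, 0.0]
rng = random.Random(0)

exact = exact_gradient(logits, REWARDS)
few = reinforce_gradient(logits, black_box_reward, 10, rng)
many = reinforce_gradient(logits, black_box_reward, 50_000, rng)

print("exact      :", exact)
print("10 samples :", few)
print("50k samples:", many)
```

With only a handful of samples the estimate can be far from the exact gradient; it takes many samples to average the noise away. That variance is the practical cost of not being able to backpropagate through the reward.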

RL is still used, for example in the recent GATO and DreamerV3. It's also used to train an LLM to use tools, as in Toolformer. And there's OpenAI's famous RLHF, which stands for reinforcement learning from human feedback. This is what they use to make ChatGPT "aligned", although in reality it doesn't get there.

12

tdgros t1_ja71ave wrote

>toolformer

Are you sure there's RL in Toolformer? I thought it was mostly self-supervised and fine-tuned.

2

CellWithoutCulture t1_ja7dklj wrote

> Toolformer

Oh, you're right, it didn't. I assumed they let it use any tool, which would need RL. But it seems they had pre-labelled ways to use tools.

Thanks for pointing that out.

2

Tea_Pearce t1_ja73tpy wrote

FYI, GATO used imitation learning, which is closer to supervised learning than to RL.

1