Comments

PassingTumbleweed t1_ja6w9ai wrote

It's weird to read this when RLHF has been one of the key components of ChatGPT and friends.

44

cthorrez t1_ja70abd wrote

I find it a little weird that RLHF is considered to be reinforcement learning.

The human feedback is collected offline and forms a static dataset. They use the objective from PPO, but it's really more a form of supervised learning: there isn't an agent interacting with an environment. The "env" is just sampling text from a static set of prompts, and the reward is the score from a neural net that was itself trained on a static dataset.
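For illustration, here's a minimal sketch of that setup (method names like `sample`, `log_prob`, and `score` are hypothetical, not any actual library's API):

```python
# Minimal sketch of the RLHF-style update described above (hypothetical APIs).
# A frozen reward model, trained offline on human preference data, scores
# sampled completions; the policy is nudged toward higher-scoring samples with
# a PPO-style clipped objective. No external environment is involved.
import torch


def rlhf_step(policy, ref_policy, reward_model, prompts, optimizer,
              clip_eps=0.2, kl_coef=0.1):
    # "Interaction" is just the model decoding text for a static batch of prompts.
    completions, logprobs_old = policy.sample(prompts)        # hypothetical API

    # Reward comes from a neural net trained on a static preference dataset.
    rewards = reward_model.score(prompts, completions)        # hypothetical API

    logprobs_new = policy.log_prob(prompts, completions)      # hypothetical API
    logprobs_ref = ref_policy.log_prob(prompts, completions)

    # KL penalty keeps the policy near the supervised-finetuned reference model.
    rewards = rewards - kl_coef * (logprobs_new - logprobs_ref).detach()

    # PPO-style clipped surrogate objective over the scored samples.
    ratio = torch.exp(logprobs_new - logprobs_old)
    advantages = rewards - rewards.mean()
    loss = -torch.min(ratio * advantages,
                      torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```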

15

gniorg t1_ja7sjkn wrote

So basically, batch reinforcement learning / offline RL? The family of algorithms is useful for recommender systems, amongst others.

3

cthorrez t1_ja8d6oc wrote

Not exactly. In batch RL the data they train on are real (state, action, next state, reward) tuples from real agents interacting with real environments.

They improve the policy offline. In RLHF there is no real environment at all, and the policy is just standard LLM decoding.
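For contrast, a minimal sketch of what a batch/offline-RL update looks like (hypothetical shapes; a fitted-Q-style update stands in for whatever algorithm is actually used):

```python
# Minimal sketch of the batch/offline-RL setting described above: the dataset
# holds logged (state, action, reward, next_state, done) tuples from a real
# environment, and the policy is improved purely from that log.
import torch


def offline_q_update(q_net, target_q_net, batch, optimizer, gamma=0.99):
    states, actions, rewards, next_states, dones = batch  # logged transitions

    # Bellman target computed from the static log, no new environment rollouts.
    with torch.no_grad():
        next_q = target_q_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1 - dones) * next_q

    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = torch.nn.functional.mse_loss(q_values, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```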

1

tripple13 t1_ja6xhe0 wrote

If all you do is follow trends and what's in the "spotlight", you probably don't care about your research, only about the accolades.

30

hpstring t1_ja6pm05 wrote

Do you specifically mean applications in NLP? RL seems to have a lot of applications in fields like game playing, robotics, neural theorem proving, etc., which seem to have no direct connection with LLMs.

13

[deleted] OP t1_ja6r4xl wrote

[deleted]

−13

hpstring t1_ja6uzk4 wrote

Understood. That depends on one's prediction of the future research landscape, but I would say it is still actively researched by institutions like DeepMind. Both RL and LLMs share a common trait, though: they are very, very expensive.

2

CellWithoutCulture t1_ja6pjet wrote

Seems more like an AskML question.

But RL is for situations where you can't backprop the loss. It's noisier than supervised learning, so if you can use supervised learning, that's generally what you should use.
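As a rough illustration of "can't backprop the loss" (a hypothetical setup using a plain REINFORCE-style estimator, not any specific system's training code):

```python
# When the reward is a non-differentiable black-box score, you can't backprop
# through it, so a score-function (REINFORCE) gradient is used instead of a
# supervised loss.
import torch


def reinforce_step(policy_logits_fn, black_box_reward, inputs, optimizer):
    logits = policy_logits_fn(inputs)                      # differentiable policy
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()                                # sampling breaks the graph
    rewards = black_box_reward(actions)                    # non-differentiable score

    # Gradient flows only through the log-probabilities, weighted by reward;
    # this estimator is unbiased but much noisier than a supervised gradient.
    loss = -(dist.log_prob(actions) * rewards).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```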

RL is still used, though, for example in the recent Gato and DreamerV3, in training an LLM to use tools like in Toolformer, and in OpenAI's famous RLHF, which stands for reinforcement learning from human feedback. That's what they use to make ChatGPT "aligned", although in reality it doesn't quite get there.

12

tdgros t1_ja71ave wrote

>toolformer

Are you sure there's RL in Toolformer? I thought it was mostly self-supervised and fine-tuned.

2

CellWithoutCulture t1_ja7dklj wrote

> Toolformer

...oh, you're right, it didn't. I assumed they let it use arbitrary tools, which would need RL, but it seems they used pre-labelled examples of tool use.

Thanks for pointing that out.

2

Tea_Pearce t1_ja73tpy wrote

FYI, Gato used imitation learning, which is closer to supervised learning than to RL.

1

IndieAIResearcher t1_ja6t4ba wrote

RL + NLP and RL + vision should have some future, I guess; RL would be an integral part of both.

2

Tea_Pearce t1_ja753ng wrote

Imo it depends on what you mean by RL. If you interpret RL as the 2015-19 collection of algorithms that train deep NN agents tabula rasa (from zero knowledge), I'd be inclined to agree that it doesn't seem a particularly fruitful research direction to get into. But if you interpret RL as a general problem setting, where an agent must learn in a sequential decision-making environment, you'll see that it's not going away.

To me the most interesting recent research in RL (or whatever you want to name it) is figuring out how to leverage existing datasets or models to get agents working well in sequential environments. Think SayCan, ChatGPT, Diffusion BC...

2

jj_HeRo t1_ja6wl0q wrote

Yann LeCun said on Twitter that it is dead... go figure.

1

pyonsu2 t1_ja6xz70 wrote

Hottest ever. RLHF, robotics.

1

KBM_KBM t1_ja71zrh wrote

ChatGPT works using a combination of RL and an LLM.

1

307thML t1_ja85qu7 wrote

You're correct that RL has been struggling, not because of the impressive results from LLMs and image generators, but because progress within RL itself has been very slow. People who say otherwise have just forgotten what fast progress looks like: remember 2015-2018, when we first saw human-level Atari play, superhuman Go play, and then superhuman Atari play, as well as impressive results in StarCraft and Dota. I think if you'd asked someone back in 2018 what the next 5 years of RL would look like, they would have expected progressively more complicated games to fall, and for agents to graduate from playing with game-state information, as AlphaStar and OpenAI Five did, to beating humans on a level playing field by playing from the pixels on the screen the way agents in Atari do. This hasn't happened.

Instead, it turned out that all of this progress was confined to narrow settings: specifically, games with highly limited input spaces (which is why OpenAI Five and AlphaStar had to take the game state directly, meaning they get access to information that humans don't) and games where exploration is easy (it can be handled largely or entirely by making random moves some percentage of the time).

I don't think this means the field is dead, mind you, but it certainly hasn't been making much progress lately.

1

Abject-Stomach5708 t1_ja887ct wrote

Do what you like and what you believe in. Otherwise you're just doing something because of what other people think.

1

rbain13 t1_ja742s1 wrote

Computers are largely failed attempts at doing what our brains do. Our brains use RL (i.e. dopamine + serotonin) and neural networks. It is probably useful to study for that reason alone :shrug:

−2