Submitted by vidul7498 t3_11itl7g in MachineLearning

Francois Chollet's recent tweet where he states: (https://twitter.com/fchollet/status/1630241783111364608)
"The answer to "when should I use deep RL" is that you shouldn't -- you should reframe your problem as a supervised learning problem, which is the only thing that curve-fitting can handle. In all likelihood this applies to RLHF for LLMs."

The people at DeepMind and OpenAI still seem bullish on RL, but I have seen this kind of sentiment among other big names in DL as well. The most common view I've come across is that RL is only good for extremely specific scenarios, and that otherwise supervised learning is a much better option.

What do you guys think: is RL doomed, or is it the future? Also, will it one day be possible to apply RL to a more general range of problems, or will it always be niche?

9

Comments


currentscurrents t1_jazwqft wrote

The reason you want to do RL is that there are problem settings where RL is the only way to learn a solution.

Unsupervised learning can teach a model to understand the world, and supervised learning can teach a model to complete a human-defined task. But reinforcement learning can teach a model to choose its own tasks to complete arbitrary goals.

Trouble is, the training signal in reinforcement learning is much sparser, so you need ridiculous amounts of training data. Current thinking is that you need unsupervised learning to learn a world model, plus RL to learn how to achieve goals inside that model. This combination has worked very well for things like DreamerV3.
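To make the two-phase idea concrete, here is a toy numpy sketch (not DreamerV3 itself): a world model is fit from random interaction data by plain supervised regression, and then actions are chosen by lookahead inside that learned model rather than in the real environment. The linear dynamics, the scalar goal, and the one-step "planner" are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment: state is a scalar position, actions are -1 or +1,
# and the true (unknown to the learner) dynamics are s' = s + 0.5 * a.
# The task is to reach s = 3.
def env_step(s, a):
    return s + 0.5 * a

# Phase 1 (self-supervised): learn a world model from random-interaction
# data by predicting the next state from (state, action) pairs.
states = rng.uniform(-5.0, 5.0, size=1000)
actions = rng.choice([-1.0, 1.0], size=1000)
next_states = env_step(states, actions)

X = np.stack([states, actions], axis=1)
w, *_ = np.linalg.lstsq(X, next_states, rcond=None)  # s' ~ w0*s + w1*a

def model(s, a):
    return w[0] * s + w[1] * a

# Phase 2 ("RL" in imagination): pick actions by lookahead inside the
# learned model, never querying the real dynamics to plan.
def act(s, goal=3.0):
    candidates = [-1.0, 1.0]
    costs = [abs(model(s, a) - goal) for a in candidates]
    return candidates[int(np.argmin(costs))]

s = 0.0
for _ in range(10):
    s = env_step(s, act(s))
# The agent ends up at the goal despite never planning against the
# real dynamics, only against the learned model.
```

In DreamerV3 the world model is a learned latent recurrent network and the policy is trained by actor-critic on imagined rollouts; the structure above is only the skeleton of that idea.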

18

thiru_2718 t1_jb6njez wrote

>supervised learning can teach a model to complete a human-defined task. But reinforcement learning can teach a model to choose its own tasks to complete arbitrary goals.

Isn't this contradicted by LLMs demonstrating emergent abilities (like meta-learning strategies or in-context learning) that allow them to tackle complex sequential tasks adaptively? There is research (i.e. https://innermonologue.github.io/) where LLMs are successfully applied to a traditional RL domain: planning and interaction for robots. While there is RLHF involved in models like ChatGPT, the bulk of the model's reasoning comes from supervised learning.

As far as I can tell, the unexpected emergent abilities of LLMs have somewhat rewritten our assumptions about what is possible through supervised learning, and that rethinking should be extended into the RL domain.

−1

ilyakuzovkin t1_jazwvuh wrote

I think RL is a niche by definition, but that's not a bad thing. If the problem you want to solve is about agents operating in interactive environments and maximizing some kind of utility function along the way - surely RL is your workhorse here.

Over the last few years we have seen successful applications of RL outside that narrow field of problems, where a problem that is seemingly not about agents and environments can still be formulated as an MDP and then solved with an RL approach. Because of these examples there seems to be a growing sentiment that RL is somehow a replacement for supervised learning, and questions like "which is better, RL or supervised?" arise.

My take on this would be that both are applicable in their appropriate spaces of problem formulations. Some problems are made to be solved with SL, some others with RL. And while it is feasible to twist an SL problem into an RL framework, or even vice versa, it does not imply that one or the other is the ultimate tool.

Just as one wouldn't use RL to multiply two numbers (except out of academic interest), one should not use RL if it is not the right framework for the problem at hand. But for some other problems RL will definitely be (and already is, as in Go, Chess, and StarCraft) the future.
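The "twist an SL problem into an RL framework" point can be shown in a few lines: ordinary binary classification recast as a contextual bandit, where the learner only observes a 0/1 reward for its guess and never the label itself. The data, learning rate, and linear policy here are all toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy classification data: label is 1 when x0 + x1 > 0.
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# The same problem "twisted" into an RL (contextual bandit) framing:
# the learner picks a class and only observes reward 1 (right) or
# 0 (wrong) -- it never sees the true label directly.
w = np.zeros(2)
lr = 0.5
for x, label in zip(X, y):
    p = 1.0 / (1.0 + np.exp(-x @ w))     # policy: P(action=1 | x)
    action = int(rng.random() < p)        # sample an action
    reward = float(action == label)       # the only feedback signal
    w += lr * reward * (action - p) * x   # REINFORCE-style update

accuracy = (((X @ w) > 0).astype(int) == y).mean()
```

It works, but the reward signal carries far less information per example than the label would, which is exactly why one would normally keep this as a supervised problem.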

16

ggdupont t1_jb14rw1 wrote

>Over the course of the last years we have seen successful applications of RL

Like real production-level applications?
Apart from very nice demos and research papers, I've really not seen much RL in real-life production.

1

ThaGooInYaBrain t1_jb342rd wrote

> "In October 2022, DeepMind unveiled a new version of AlphaZero, called AlphaTensor, in a paper published in Nature. The version discovered a faster way to perform matrix multiplication – one of the most fundamental tasks in computing – using reinforcement learning."

Matrix multiplication is a pretty damn practical real life application, no?
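The flavor of result AlphaTensor produces goes back to Strassen's 1969 scheme, which multiplies two 2x2 matrices with seven scalar multiplications instead of the naive eight; AlphaTensor used RL to search for decompositions of exactly this kind, beating known multiplication counts for some sizes. A sketch of the classic 2x2 case:

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications instead of 8
    (Strassen, 1969). AlphaTensor searches for decompositions of this
    kind with reinforcement learning."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)   # the seven products
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Recombine with additions only (additions are cheap; the win
    # comes from saving a multiplication, which compounds when the
    # entries are themselves matrix blocks).
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])
```

Applied recursively to block matrices, the saved multiplication is what drops the asymptotic cost below O(n^3), which is why shaving even one product per block matters.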

3

ggdupont t1_jb4onx8 wrote

Anything in production yet?

2

cantfindaname2take t1_jb42qee wrote

Isn't it extensively used in robotics??

1

ggdupont t1_jb4olgn wrote

I probably don't have a complete view, but I worked in a very large hardware industry and all the robots were using classic optimal-control approaches (like the ones used by Boston Dynamics); none were using RL.

4

tripple13 t1_jb0ksx6 wrote

I find it quite ridiculous to discount RL. Optimal control problems have existed since the beginning of time, and for the situations in which you cannot formulate a set of differential equations, optimizing obtuse functions with value or policy optimization could be a way forward.

It reminds me of the people who discount GANs due to their lack of a likelihood. Sure, but can it be useful regardless? Yes, actually, it can.

14

tonicinhibition t1_jb1fgpe wrote

> people who discount GANs due to their lack of a likelihood

I was going to ask you to expand on this a little, but instead found a post that describes it pretty well for anyone else who is curious:

Do GANS really model the true data distribution...

For further nuance on this topic, Machine Learning Street Talk discussed interpolation vs. extrapolation with Yann LeCun, which Letitia Parcalabescu summarizes here.

1

currentscurrents t1_jb1j20n wrote

>Do GANS really model the true data distribution...

I find their argument to be pretty weak. Of course these images look semantically similar; they ran a semantic similarity search to find them.

They are clearly not memorized training examples. The pose, framing, and facial expressions are very different.

5

tonicinhibition t1_jb1ntqz wrote

I don't think the author of the post took a position on the original argument; rather they just presented ways to explore the latent space and make comparisons that are reasonable so that we might derive better distance metrics.

I see it as a potential way to probe for evidence of mode collapse.

1

yannbouteiller t1_jb17aaw wrote

People will say anything in the hope of drawing attention. Reframing an unexplored MDP as a supervised learning problem makes no sense.

11

ok531441 t1_jb0229i wrote

Why would RL be doomed? Didn’t sticking RL on top of a big GPT model just give us ChatGPT?

10

ggdupont t1_jb152am wrote

That's the cherry on top (see https://twitter.com/hlntnr/status/1632030583462285312), not the core of the app.

(edit in reaction to downvotes: in all transparency, I love the RL paradigm and really think decision-making approaches are a key to AI; that said, my experience with industrial applications of RL has always been disappointing in that other approaches did better ;-) )

−3

ml-research t1_jb1nlad wrote

People said similar things about deep learning a long time ago.

If you can use supervised learning, then you should, because it means you have tons of data with ground-truth labels for each decision. But many real-world problems are not like that. Even humans don't know whether each of their decisions is optimal.

3

alterframe t1_jb5oel8 wrote

RL is one of those concepts where it's easy to fool ourselves that we get it, when in reality we don't. We have a fuzzy notion of what RL is and what it is good for, so in our imagination it is going to be a perfect match for our problem. In reality, our problem may look like those RL-friendly tasks on the surface, but lack several important properties or challenges that would make RL a reasonable choice.

That doesn't mean RL is not useful at all. Quite the opposite: people are wrongly discouraged from RL based on experience with projects where it didn't actually make sense, and then draw false conclusions about its practicality.

1