
ilyakuzovkin t1_jazwvuh wrote

I think RL is a niche by definition, but that's not a bad thing. If the problem you want to solve is about agents operating in interactive environments and maximizing some kind of utility function along the way - surely RL is your workhorse here.

Over the course of the last years we have seen successful applications of RL outside that narrow field of problems, where a problem that is seemingly not about agents and environments can still be formulated as an MDP and then solved with an RL approach. Because of these examples there seems to be a looming sentiment that RL is somehow "instead of" supervised, and questions like "which is better RL or supervised" arise.

My take on this would be that both are applicable in their appropriate spaces of problem formulations. Some problems are made to be solved with SL, some other ones with RL. And while it is feasible to twist an SL problem into an RL framework, or even vice versa, it does not imply that one or the other is the ultimate tool.
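To make the "twist an SL problem into an RL framework" point concrete, here is a minimal sketch (a toy illustration, not a recommended approach). It treats binary classification as a one-step MDP: the state is the feature vector, the action is the predicted label, and the reward is 1 for a correct guess. The policy is a linear softmax trained with a REINFORCE-style update. All names (`make_dataset`, `policy_probs`, `train_reinforce`, `accuracy`), the toy data, and the fixed baseline of 0.5 are assumptions made up for this example.

```python
import math
import random


def make_dataset(n, rng):
    """Toy labelled data (hypothetical): label is 1 when x0 + x1 > 0, else 0."""
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data


def policy_probs(w, x):
    """Linear softmax policy over two actions (the two class labels)."""
    logits = [w[a][0] * x[0] + w[a][1] * x[1] for a in range(2)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def train_reinforce(data, episodes=5000, lr=0.5, seed=0):
    """One-step episodes: observe x, sample a label, receive a 0/1 reward."""
    rng = random.Random(seed)
    w = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        x, y = rng.choice(data)               # state: the input features
        p = policy_probs(w, x)
        a = 0 if rng.random() < p[0] else 1   # action: a sampled label
        r = 1.0 if a == y else 0.0            # reward: the only feedback used
        advantage = r - 0.5                   # fixed baseline (an assumption)
        # REINFORCE: d/dw_k log pi(a|x) = (1[k == a] - p[k]) * x
        for k in range(2):
            g = (1.0 if k == a else 0.0) - p[k]
            for j in range(2):
                w[k][j] += lr * advantage * g * x[j]
    return w


def accuracy(w, data):
    """Fraction of examples where the greedy action matches the label."""
    correct = 0
    for x, y in data:
        p = policy_probs(w, x)
        correct += (0 if p[0] >= p[1] else 1) == y
    return correct / len(data)
```

Calling `train_reinforce(make_dataset(500, random.Random(1)))` and passing the result to `accuracy` shows the policy learning the labels from reward alone. The catch is that the agent only sees whether its sampled guess was right, so it uses less information per example than a plain supervised loss would, which is one reason the twist is feasible but rarely the right tool.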

The same way one wouldn't use RL to multiply two numbers (except for academic interest), one should not use RL if it is not the right framework for the problem at hand. But for some other problems RL will definitely be (and already is, like in Go, Chess, StarCraft) the future.

16

ggdupont t1_jb14rw1 wrote

>Over the course of the last years we have seen successful applications of RL

Like real production level applications?
Apart from super nice demos and research papers, I've really not seen much RL in real-life production.

1

ThaGooInYaBrain t1_jb342rd wrote

> "In October 2022, DeepMind unveiled a new version of AlphaZero, called AlphaTensor, in a paper published in Nature. The version discovered a faster way to perform matrix multiplication – one of the most fundamental tasks in computing – using reinforcement learning."

Matrix multiplication is a pretty damn practical real life application, no?

3

ggdupont t1_jb4onx8 wrote

Anything in production yet?

2

cantfindaname2take t1_jb42qee wrote

Isn't it extensively used in robotics??

1

ggdupont t1_jb4olgn wrote

I probably don't have a complete view, but I worked in a very large hardware industry, and all the robots were using classic optimal control approaches (like the one used by Boston Dynamics); none were using RL.

4