
Linear-- OP t1_j9xqtsx wrote

It's clear that humans and other animals must learn with reinforcement -- the agent has to act and receive feedback/reward. That is an important component, and I don't think it's proper to classify it as SSL. Moreover, the psychology of learning points out that problem-solving and immediate feedback are very important for learning outcomes -- and that feedback typically comes as human labels or a reward signal.

1

currentscurrents t1_j9yxr37 wrote

Look up predictive coding; neuroscientists came up with it in the 80s and 90s.

A good portion of learning works by trying to predict the future and updating your brain's internal model when you're wrong. This is especially involved in perception and world modeling tasks, like vision processing or commonsense physics.
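A minimal sketch of that loop (my own toy example, not any specific predictive-coding model): predict the next observation, and use the prediction error itself as the training signal.

```python
import torch
import torch.nn as nn

# Toy next-observation predictor: the only training signal is prediction error.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def update(obs_t, obs_next):
    pred = model(obs_t)
    loss = nn.functional.mse_loss(pred, obs_next)  # "surprise" when the prediction is wrong
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Every consecutive pair of observations is a training example -- no labels, no reward.
stream = [torch.randn(32) for _ in range(100)]
for t in range(len(stream) - 1):
    update(stream[t], stream[t + 1])
```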

You would have a very hard time learning this from RL. Rewards are sparse in the real world, and if you observe something that doesn't affect your reward function, RL can't learn from it. But predictive coding/self-supervised learning can learn from every bit of data you observe.

You do also use RL, because there are some things you can only learn through RL. But this becomes much easier once you already have a rich mental model of the world. Getting good at predicting the future makes you very good at predicting what will maximize your reward.

6

AmalgamDragon t1_j9zyyib wrote

> Rewards are sparse in the real world

This doesn't seem true. The only reason we aren't getting negative rewards (e.g. pain, discomfort, etc.) constantly is that we learn to generally avoid them.

2

currentscurrents t1_ja5isuz wrote

Imagine you need to cook some food. None of the steps of cooking gives you any reward; you only get the reward at the end.

Pure RL will quickly teach you not to touch the burner, but it really struggles with tasks that involve planning or delayed rewards. Self-supervised learning helps with this by building a world model that you can use to predict future rewards.
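To make that concrete, here's a toy sketch (the dynamics and reward functions are stand-ins I made up, not a claim about how brains or any particular algorithm do it): with a learned world model you can handle delayed rewards in imagination, by rolling candidate action sequences forward and scoring only the predicted end state.

```python
import numpy as np

# Stand-ins for a learned dynamics model and a learned reward predictor.
def world_model(state, action):
    return state + 0.1 * action            # predicted next state

def predicted_reward(state):
    return -np.sum((state - 1.0) ** 2)     # reward depends only on where you end up

def plan(state, horizon=10, n_candidates=500):
    """Random-shooting planner: imagine futures, keep the first action of the
    sequence whose predicted final state scores best."""
    best_score, best_action = -np.inf, None
    for _ in range(n_candidates):
        actions = np.random.uniform(-1, 1, size=(horizon, state.shape[0]))
        s = state
        for a in actions:
            s = world_model(s, a)
        score = predicted_reward(s)        # delayed reward, evaluated in imagination
        if score > best_score:
            best_score, best_action = score, actions[0]
    return best_action

first_action = plan(np.zeros(3))
```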

1

AmalgamDragon t1_ja5lz5b wrote

This really comes down to how 'reward' is defined. I think we likely disagree on that definition, with yours being a lot narrower than mine. For example, during the cooking process there is usually a point before the meal is done where it 'smells good', which is a reward. There's dopamine release as well, which could be triggered when completing some of the steps (I don't know if that's the case or not), but simply observing that a step is complete is rewarding for lots of folks.

> Pure RL will quickly teach you not to touch the burner, but it really struggles with tasks that involve planning or delayed rewards.

Depends on which algorithms you're using, but PPO can handle this quite well.
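For reference, here's roughly what that looks like with Stable-Baselines3's PPO (the environment choice is just an illustration of a delayed-success task; how well plain PPO does depends a lot on the reward structure):

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Acrobot only "succeeds" at the very end of an episode, so the useful signal is
# delayed; PPO's advantage estimation (GAE) propagates it back through the trajectory.
env = gym.make("Acrobot-v1")

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=200_000)
```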

1

visarga t1_j9y7fro wrote

Words in language are both observations and actions. So language modelling is also a kind of supervised policy learning?

So... Self Supervised Learning is Unsupervised & Supervised & Reinforcement Learning.
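The standard next-token setup makes that dual role visible (a trivial sketch): the same sequence is both the "observations" the model conditions on and the "actions"/labels it is trained to produce.

```python
import torch

tokens = torch.tensor([5, 17, 3, 42, 8])    # one tokenized sentence
inputs, targets = tokens[:-1], tokens[1:]   # each token is an observation (as input)
                                            # and an action/label (as the next-step target)
```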

3