Viewing a single comment thread. View all comments

smallest_meta_review OP t1_iva4dj7 wrote

> Tabula rasa RL vs. Reincarnating RL (RRL). While tabula rasa RL focuses on learning from scratch, RRL is based on the premise of reusing prior computational work (e.g., prior learned agents) when training new agents or improving existing agents, even in the same environment. In RRL, new agents need not be trained from scratch, except for initial forays into new problems.

More at https://ai.googleblog.com/2022/11/beyond-tabula-rasa-reincarnating.html?m=1

5

smurfpiss t1_ivaf5ia wrote

Not experienced With RL much, but how is that different than an algorithm going through training iterations?

In that case the parameters are tweaked from past learned parameters. What's the benefit of learning from another algorithm? Is it some kind of weird offspring of skip connections and transfer learning?

5

smallest_meta_review OP t1_ivaghqa wrote

Good question. The original blog post somewhat covers this:

> Imagine a researcher who has trained an agent A_1 for some time, but now wants to experiment with better architectures or algorithms. While the tabula rasa workflow requires retraining another agent from scratch, Reincarnating RL provides the more viable option of transferring the existing agent A1 to a different agent and training this agent further, or simply fine-tuning A_1.

But this is not what happens in research. For example, each time we are training a new agent to let say play an Atari game, we train it from scratch ignoring all the prior agents trained on that game. This work argues that why not reuse learned knowledge from the existing agent while training new agents (which may be totally different).

3

smurfpiss t1_ivah7ul wrote

So, transfer learning but with different architectures? That's pretty neat. Will give it a read thanks 😊

3

smallest_meta_review OP t1_ivam34g wrote

Yeah, or even across different classes of RL methods: reusing a policy for training a value-based RL (e.g, DQN) or model-based RL method.

3

TheLastVegan t1_ivbvx23 wrote

>As reincarnating RL leverages existing computational work (e.g., model checkpoints), it allows us to easily experiment with such hyperparameter schedules, which can be expensive in the tabula rasa setting. Note that when fine-tuning, one is forced to keep the same network architecture; in contrast, reincarnating RL grants flexibility in architecture and algorithmic choices, which can surpass fine-tuning performance (Figures 1 and 5).

Okay so agents can communicate weights between architectures. That's a reasonable conclusion. Sort of like a parent teaching their child how to human.

I thought language models already do this at inference time. So the goal of the RRL method is to subvert the agent's trust..?

1