
TheLastVegan t1_ivbvx23 wrote

>As reincarnating RL leverages existing computational work (e.g., model checkpoints), it allows us to easily experiment with such hyperparameter schedules, which can be expensive in the tabula rasa setting. Note that when fine-tuning, one is forced to keep the same network architecture; in contrast, reincarnating RL grants flexibility in architecture and algorithmic choices, which can surpass fine-tuning performance (Figures 1 and 5).

Okay, so agents can transfer what they've learned between different architectures, rather than being locked into the original network the way fine-tuning requires. That's a reasonable conclusion. Sort of like a parent teaching their child how to human.
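For concreteness, here's a minimal sketch (not necessarily the paper's exact method) of one way knowledge can cross architectures without copying any weights: policy distillation, where a new "student" network is trained to match a pre-trained "teacher" policy's outputs. The network shapes, dimensions, and training loop below are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical shapes: a 4-dim observation, 2 discrete actions.
OBS_DIM, N_ACTIONS = 4, 2

# Teacher: the pre-trained "parent" policy (architecture A).
teacher = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                        nn.Linear(64, N_ACTIONS))

# Student: the "child", deliberately a different architecture (B).
student = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(),
                        nn.Linear(256, 256), nn.ReLU(),
                        nn.Linear(256, N_ACTIONS))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(obs_batch):
    """One distillation step: match the student's action distribution
    to the teacher's on a batch of observations. No weights are copied,
    only behavior is transferred, so the architectures can differ."""
    with torch.no_grad():
        teacher_logits = teacher(obs_batch)
    student_logits = student(obs_batch)
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Observations would normally come from the environment or a replay
# buffer; random data here just keeps the sketch self-contained.
for _ in range(100):
    distill_step(torch.randn(32, OBS_DIM))
```

The point of the example is that the teacher's checkpoint is reused as a training signal, which is the "existing computational work" the quoted passage refers to.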

I thought language models already do something like this at inference time. So is the goal of the RRL method to subvert the agent's trust..?