
Oripy t1_j8m8ejv wrote

I have a question related to the Actor-Critic method described in the Keras example here: https://keras.io/examples/rl/actor_critic_cartpole/

I looked at the code in the "Train" part, and I think I understand what each line is supposed to do and why it is there. However, I don't understand what role the critic plays in improving the agent. To me the critic is just a value that predicts the future reward, but I don't see it being fed back into the system so that the agent selects better actions and improves its reward.

Do I have a good understanding? Is the critic just a "bonus" output? Are the two unrelated, so that the exact same performance could be achieved by removing the critic output altogether? Or is the critic output used to improve learning in some way I fail to see?

Thank you.

1

schwagggg t1_j916cc5 wrote

So actor-critic without the critic is just policy gradient / REINFORCE / the score-function gradient: the first two names are used in RL, the last one in stats/OR.
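In that form, each action's log-probability is scaled by the raw discounted return. A minimal sketch of that loss in TensorFlow (to match the tutorial; `log_probs` and `returns` are hypothetical per-episode tensors, not names from the tutorial):

```python
import tensorflow as tf

def reinforce_loss(log_probs, returns):
    """Plain REINFORCE / score-function loss: scale each action's
    log-probability by the full discounted return from that step."""
    return -tf.reduce_sum(log_probs * returns)
```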

Short answer: the policy gradient tends to have high variance empirically, so people use control variates to reduce that variance, and the critic is simply the control variate.

High-variance methods usually converge to worse local minima than low-variance ones. You can verify this by taking out the critic entirely; try it yourself with that tutorial. A sketch of the difference is below.
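Here is a minimal sketch, in the spirit of that tutorial's training loop, of how the critic feeds back into the actor's update. The names `log_probs`, `values`, `returns` and the exact shapes are my assumptions, not the tutorial's code; the tutorial uses a similar return-minus-value difference and a Huber loss for the critic:

```python
import tensorflow as tf

huber = tf.keras.losses.Huber()

def actor_critic_losses(log_probs, values, returns):
    # Advantage = observed return minus the critic's prediction.
    # Subtracting this baseline leaves the gradient unbiased but
    # reduces its variance (the control-variate effect).
    advantages = returns - values

    # Actor: push up log-probs of actions that did better than the
    # critic predicted. stop_gradient keeps the actor loss from
    # training the critic through the advantage term.
    actor_loss = -tf.reduce_sum(log_probs * tf.stop_gradient(advantages))

    # Critic: regress predicted values toward the observed returns.
    critic_loss = huber(tf.expand_dims(returns, 1),
                        tf.expand_dims(values, 1))
    return actor_loss, critic_loss
```

Removing the critic amounts to replacing `advantages` with the raw `returns` and dropping `critic_loss`: the gradient estimate stays unbiased but gets noisier, which is exactly the variance difference described above.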

1