Viewing a single comment thread. View all comments

schwagggg t1_j916cc5 wrote

Reply to comment by Oripy in [D] Simple Questions Thread by AutoModerator

so actor critic without critic is just policy gradient/reinforce/score function gradient, first two names used in RL, last one used in stats/OR.

short answer is policy gradient tends to have high variances empirically, so people use control variates to control its variance, and the critic is simply the control variate.

high variance methods usually converge to worse local minimas than low variance ones. u can verify this by taking or the critic function entirely. try it itself with that tutorial

1