Submitted by AutoModerator t3_110j0cp in MachineLearning
schwagggg t1_j916cc5 wrote
Reply to comment by Oripy in [D] Simple Questions Thread by AutoModerator
so actor-critic without the critic is just policy gradient / REINFORCE / score-function gradient — the first two names are used in RL, the last one in stats/OR.
short answer is that the policy gradient estimator tends to have high variance empirically, so people use control variates to reduce that variance, and the critic is simply the control variate.
high variance methods usually converge to worse local minima than low variance ones. you can verify this by taking out the critic function entirely — try it yourself with that tutorial.
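a minimal sketch of the point above, using a toy 2-armed bandit (my own made-up setup, not from any tutorial): the score-function gradient is grad log pi(a) * (R - b), and subtracting a baseline b keeps the estimator unbiased while shrinking its variance — exactly the control-variate role the critic plays.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([0.0, 0.0])        # logits for a softmax policy
probs = np.exp(theta) / np.exp(theta).sum()
mean_reward = np.array([1.0, 1.2])  # arm 1 is slightly better (assumed values)

def grad_log_pi(a):
    # d/dtheta log softmax(theta)[a] = onehot(a) - probs
    onehot = np.zeros(2)
    onehot[a] = 1.0
    return onehot - probs

def sample_grad(baseline):
    # one score-function gradient sample: grad log pi(a) * (R - b)
    a = rng.choice(2, p=probs)
    r = mean_reward[a] + rng.normal(0.0, 0.1)  # noisy reward
    return grad_log_pi(a) * (r - baseline)

n = 20000
g_plain = np.array([sample_grad(0.0) for _ in range(n)])
g_base = np.array([sample_grad(mean_reward.mean()) for _ in range(n)])

# both estimators have (approximately) the same mean, i.e. the
# baseline does not bias the gradient, but it cuts the variance a lot
print("mean (no baseline):", g_plain.mean(axis=0))
print("mean (baseline):   ", g_base.mean(axis=0))
print("var  (no baseline):", g_plain.var(axis=0).sum())
print("var  (baseline):   ", g_base.var(axis=0).sum())
```

here the baseline is just the average reward; an actor-critic replaces it with a learned, state-dependent value estimate, but the variance-reduction mechanism is the same.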