Submitted by AutoModerator t3_110j0cp in MachineLearning
schwagggg t1_j916cc5 wrote
Reply to comment by Oripy in [D] Simple Questions Thread by AutoModerator
so actor-critic without the critic is just policy gradient / REINFORCE / score-function gradient — the first two names are used in RL, the last one in stats/OR.
short answer is that the policy gradient estimator tends to have high variance empirically, so people use control variates to reduce that variance, and the critic is simply the control variate.
high variance methods usually converge to worse local minima than low variance ones. you can verify this by taking out the critic function entirely — try it yourself with that tutorial.
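a minimal sketch of the point above, using a toy 2-armed bandit (my own made-up setup, not from any tutorial): the score-function gradient is grad log pi(a) * (R - b), and subtracting a baseline b keeps the estimator unbiased while shrinking its variance — exactly the control-variate role the critic plays.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([0.0, 0.0])        # logits for a softmax policy
probs = np.exp(theta) / np.exp(theta).sum()
mean_reward = np.array([1.0, 1.2])  # arm 1 is slightly better (assumed values)

def grad_log_pi(a):
    # d/dtheta log softmax(theta)[a] = onehot(a) - probs
    onehot = np.zeros(2)
    onehot[a] = 1.0
    return onehot - probs

def sample_grad(baseline):
    # one score-function gradient sample: grad log pi(a) * (R - b)
    a = rng.choice(2, p=probs)
    r = mean_reward[a] + rng.normal(0.0, 0.1)  # noisy reward
    return grad_log_pi(a) * (r - baseline)

n = 20000
g_plain = np.array([sample_grad(0.0) for _ in range(n)])
g_base = np.array([sample_grad(mean_reward.mean()) for _ in range(n)])

# both estimators have (approximately) the same mean, i.e. the
# baseline does not bias the gradient, but it cuts the variance a lot
print("mean (no baseline):", g_plain.mean(axis=0))
print("mean (baseline):   ", g_base.mean(axis=0))
print("var  (no baseline):", g_plain.var(axis=0).sum())
print("var  (baseline):   ", g_base.var(axis=0).sum())
```

here the baseline is just the average reward; an actor-critic replaces it with a learned, state-dependent value estimate, but the variance-reduction mechanism is the same.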