
Ulfgardleo t1_iza5481 wrote

the Hessian is always better information than the natural gradient, because it captures the actual curvature of the objective function while the NG (Fisher information) only captures the curvature of the model. So any second-order TR approach using NG information will, at best, approach what the Hessian already gives you.

//edit: I am assuming actual trust region methods, like TR-Newton, not some RL/ML approximation scheme.
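To see the difference concretely, here is a minimal sketch (my own toy example, not from the thread) of a scalar nonlinear least-squares problem. For a Gaussian likelihood, the Fisher/Gauss-Newton matrix keeps only the model-curvature term `m'(θ)²`, while the full Hessian also carries the residual-weighted term `r·m''(θ)` — the "actual curvature of the function" mentioned above:

```python
import numpy as np

# Toy problem (hypothetical): model m(theta) = sin(theta),
# loss L(theta) = 0.5 * (m(theta) - y)^2.
theta, y = 1.0, 0.2

m = np.sin(theta)          # model output
residual = m - y           # r = m(theta) - y
d_m = np.cos(theta)        # m'(theta)
d2_m = -np.sin(theta)      # m''(theta)

# Full Hessian of the loss: m'^2 + r * m''
hessian = d_m ** 2 + residual * d2_m

# Fisher / Gauss-Newton term: only m'^2 (model curvature; residual term dropped)
fisher = d_m ** 2

print(hessian, fisher)  # the two differ by residual * m''
```

Far from the optimum (large residual) the two can differ a lot — here the Hessian is even negative while the Fisher term is always positive semi-definite, which is exactly the information a TR-Newton step can exploit and an NG step cannot.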

4

UnusualClimberBear t1_izaa5xr wrote

For RL you also need to account for the uncertainty about states and actions you almost ignored during data collection but would like to use more. A gradient on a policy behaves differently from a gradient on a supervised loss.
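A quick sketch of why (my own illustration, not from the thread): when the updated policy puts mass on actions the behavior policy almost ignored, the importance weights on those few samples blow up, so the gradient estimate there is extremely high-variance — one reason policy methods constrain the update with a trust region / KL bound:

```python
import numpy as np

# Hypothetical two-action bandit: action 1 was almost ignored at collection time.
rng = np.random.default_rng(0)
behavior = np.array([0.98, 0.02])   # data-collection policy
target = np.array([0.50, 0.50])     # new policy wants action 1 much more

actions = rng.choice(2, size=10_000, p=behavior)
weights = target[actions] / behavior[actions]  # importance ratios

# The ratio for the rare action is 0.5 / 0.02 = 25: a handful of samples
# dominate the estimate, so its variance explodes.
print(weights.max(), weights.var())
```

This sampling mismatch is absent in supervised learning, where the data distribution does not depend on the parameters being optimized.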

5