
Ulfgardleo t1_iza5481 wrote

the Hessian is always better information than the natural gradient, because it captures the actual curvature of the objective function while the NG (Fisher information) only captures the curvature of the model. So any second-order TR approach using NG information will, at best, approach what the Hessian already gives you.

//edit: I am assuming actual trust region methods, like TR-Newton, not some RL/ML approximation scheme.
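To see the difference concretely, here is a minimal sketch (my own toy example, not from the thread) of a scalar nonlinear least-squares problem. For a Gaussian likelihood, the Fisher/Gauss-Newton matrix keeps only the model-curvature term `m'(θ)²`, while the full Hessian also carries the residual-weighted term `r·m''(θ)` — the "actual curvature of the function" mentioned above:

```python
import numpy as np

# Toy problem (hypothetical): model m(theta) = sin(theta),
# loss L(theta) = 0.5 * (m(theta) - y)^2.
theta, y = 1.0, 0.2

m = np.sin(theta)          # model output
residual = m - y           # r = m(theta) - y
d_m = np.cos(theta)        # m'(theta)
d2_m = -np.sin(theta)      # m''(theta)

# Full Hessian of the loss: m'^2 + r * m''
hessian = d_m ** 2 + residual * d2_m

# Fisher / Gauss-Newton term: only m'^2 (model curvature; residual term dropped)
fisher = d_m ** 2

print(hessian, fisher)  # the two differ by residual * m''
```

Far from the optimum (large residual) the two can differ a lot — here the Hessian is even negative while the Fisher term is always positive semi-definite, which is exactly the information a TR-Newton step can exploit and an NG step cannot.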

4

UnusualClimberBear t1_izaa5xr wrote

For RL you also need to account for the uncertainty about states and actions you almost ignored during data collection but would like to use more. A gradient on a policy behaves differently from a gradient on a supervised loss.
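A quick sketch of why (my own illustration, not from the thread): when the updated policy puts mass on actions the behavior policy almost ignored, the importance weights on those few samples blow up, so the gradient estimate there is extremely high-variance — one reason policy methods constrain the update with a trust region / KL bound:

```python
import numpy as np

# Hypothetical two-action bandit: action 1 was almost ignored at collection time.
rng = np.random.default_rng(0)
behavior = np.array([0.98, 0.02])   # data-collection policy
target = np.array([0.50, 0.50])     # new policy wants action 1 much more

actions = rng.choice(2, size=10_000, p=behavior)
weights = target[actions] / behavior[actions]  # importance ratios

# The ratio for the rare action is 0.5 / 0.02 = 25: a handful of samples
# dominate the estimate, so its variance explodes.
print(weights.max(), weights.var())
```

This sampling mismatch is absent in supervised learning, where the data distribution does not depend on the parameters being optimized.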

5