bluuerp t1_ivov64y wrote
Reply to [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
The gradient gives you the direction of steepest improvement for the point you evaluated it at. If you have 10 data points, the gradients of all 10 will point in different directions, so if you take a step after each point you'll zig-zag around a lot. You might even backtrack a bit. If you instead average all 10 gradients and take one step, that step won't be optimal with respect to each point individually, but the path you take will be smoother.

So it depends on your dataset. Usually you want some smoothing, because otherwise you won't converge as easily.
The same is true for your example: the center point might not be a good estimate of its surroundings. It could, however, be close to the average, in which case there isn't that big a difference.
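To make the zig-zag vs. smoothing point concrete, here's a small toy sketch (my own illustration, not from the thread; the learning rate and names like `grad` and `lr` are made up) comparing per-point updates with one averaged-gradient step on a least-squares problem:

```python
# Compare per-point gradient steps (SGD-style) with a step on the
# averaged gradient (batch-style) for least-squares fitting.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))              # 10 data points, 2 features
w_true = np.array([2.0, -1.0])
y = X @ w_true + 0.1 * rng.normal(size=10)

def grad(w, xi, yi):
    # gradient of the squared error 0.5*(xi.w - yi)^2 for one point
    return (xi @ w - yi) * xi

lr = 0.1

# Per-point updates: each of the 10 gradients points somewhere
# different, so the parameter path zig-zags.
w_sgd = np.zeros(2)
for xi, yi in zip(X, y):
    w_sgd -= lr * grad(w_sgd, xi, yi)

# Averaged gradient: one smoother step per pass over the data.
w_batch = np.zeros(2)
w_batch -= lr * np.mean([grad(w_batch, xi, yi) for xi, yi in zip(X, y)], axis=0)

print("one pass of per-point steps:", w_sgd)
print("one averaged step:          ", w_batch)
```

Run it a few times with different seeds and you'll see the per-point path move further per pass but bounce around, while the averaged step is smaller and more consistent.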
bluuerp t1_iux2ixx wrote
A neural network reduces a large number of input parameters down to a few outputs, and even architectures that don't, like autoencoders, have some kind of bottleneck. They are therefore lossy compression methods; that is how they learn. By its very nature this isn't reversible: you can't reverse a dog/cat output back into a full image. But you can use techniques like gradCAM to get estimates, i.e. you can use gradient ascent to reconstruct what the network is looking for. Do that for a bunch of different random-noise starting values and you can estimate which neurons are most responsible for a certain output class.
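Here's a minimal sketch of that gradient-ascent idea (my own illustration; the tiny untrained model is just a stand-in for a trained classifier, and the step size and iteration count are assumptions):

```python
# Start from random noise and push the input toward whatever maximizes
# one output class. With a trained classifier, repeating this from many
# random starts reveals the input patterns the network associates with
# that class.
import torch
import torch.nn as nn

model = nn.Sequential(                  # stand-in for a trained classifier
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),                    # 2 classes, e.g. dog vs. cat
)
model.eval()

target_class = 0
x = torch.randn(1, 3, 64, 64, requires_grad=True)  # random-noise start

for _ in range(100):
    score = model(x)[0, target_class]   # logit of the target class
    score.backward()
    with torch.no_grad():
        x += 0.1 * x.grad               # ascend the gradient
        x.grad.zero_()

# x is now an input the model scores highly for the target class.
```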
bluuerp t1_ivpshwu wrote
Reply to comment by uncooked-cookie in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
Yes, I meant the optimal improvement direction for that point.