bluuerp t1_ivov64y wrote

The gradient gives you the optimal improvement direction. If you have 10 data points, the gradients at all 10 will point in different directions, so if you take a step after each point you'll zig-zag around a lot. You might even backtrack a bit. If you instead take the average over all 10 and then take a step, you won't be optimal with respect to each point individually, but the path you take will be smoother.

So it depends on your dataset. Usually you want some smoothing, because otherwise you won't converge as easily.
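
A minimal sketch of the difference (my own illustration; the least-squares setup, data, and learning rate are arbitrary choices, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))               # 10 points, 2 features
w_true = np.array([2.0, -1.0])
y = X @ w_true + rng.normal(scale=0.1, size=10)

def grad(w, xi, yi):
    """Gradient of the squared error (xi.w - yi)^2 at a single point."""
    return 2.0 * (xi @ w - yi) * xi

lr = 0.05

# Per-point updates: each step follows one point's gradient, so the
# trajectory zig-zags (and can backtrack) on its way to the optimum.
w = np.zeros(2)
path = [w.copy()]
for xi, yi in zip(X, y):
    w -= lr * grad(w, xi, yi)
    path.append(w.copy())

# Averaged update: one step on the mean gradient over all 10 points;
# not optimal for any single point, but the overall path is smoother.
w = np.zeros(2)
g = np.mean([grad(w, xi, yi) for xi, yi in zip(X, y)], axis=0)
w -= lr * g
```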


The same is true for your example: the center point might not be a good estimate of its surroundings. It could, however, be close to the average, in which case there isn't that big of a difference.

33

bluuerp t1_iux2ixx wrote

A neural network reduces a large number of inputs down to a few outputs, and even architectures that don't, like autoencoders, have some kind of bottleneck. Hence they are lossy data compression methods; that is how they learn. By its very nature this is not reversible: you can't turn a dog/cat output back into a full image. But you can use Grad-CAM to get estimates, i.e. you can use gradient ascent to reconstruct what the network is looking for. Do that from a bunch of different random-noise starting values and you can estimate which neurons are most responsible for a certain output class.
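
A minimal sketch of the gradient-ascent part (activation maximization from random noise); the tiny untrained classifier, class index, learning rate, and step count here are placeholder assumptions, not anything from the thread:

```python
import torch
import torch.nn as nn

# Placeholder stand-in for a trained classifier; swap in your own model.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
model.eval()

target_class = 3                    # arbitrary class index for illustration
img = torch.randn(1, 1, 28, 28, requires_grad=True)   # random-noise start

optimizer = torch.optim.Adam([img], lr=0.05)
for _ in range(200):
    optimizer.zero_grad()
    score = model(img)[0, target_class]   # logit of the target class
    (-score).backward()                   # ascend by minimizing the negative
    optimizer.step()

# With a real trained model, `img` now approximates an input the network
# associates with the class. Repeating from several random starts gives a
# rough estimate of which features (and neurons) drive that output.
```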

3