
amhotw t1_jcbpd9z wrote

I understand that. I am pointing out the fact that they started on different paths. One of them actually matched its name with what it was doing; the other was a contradiction from the beginning.

Edit: Wow, people either can't read or don't read enough history.

−4

amhotw t1_jc0mf55 wrote

If you are serious, I would recommend working through Rudin's Principles of Mathematical Analysis. It might take a day (or more...) to wrap your head around a single proof, but by the end you'll be ready to read anything (of course you might still need to look up some definitions).

For KL divergence, entropy, etc., MacKay's Information Theory, Inference, and Learning Algorithms is great.

For the Hessian, well, it is just calculus: the matrix of second partial derivatives of a multivariate function. To understand its uses, you would need some background in numerical analysis and concave programming. For the latter, Boyd's optimization book is a classic. I don't remember a good book on numerical analysis, but some differential equations books have nice chapters on it.
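If it helps, here is a minimal sketch (my own illustration, not from any of the books above) of what "matrix of second partial derivatives" means numerically; the test function, step size, and helper name are just made up for the example:

```python
import numpy as np

def hessian(f, x, eps=1e-4):
    """Approximate the Hessian of f at x with central finite differences."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            # Second-order central difference for the mixed partial d^2 f / dx_i dx_j
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps**2)
    return H

# f(x, y) = x^2 * y + y^3 has Hessian [[2y, 2x], [2x, 6y]]
f = lambda v: v[0]**2 * v[1] + v[1]**3
print(hessian(f, np.array([1.0, 2.0])))  # roughly [[4, 2], [2, 12]]
```

In practice you would let an autodiff library compute this exactly, but the finite-difference version makes the definition concrete.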

8

amhotw t1_jb38ai5 wrote

Based on what you copied: they are saying that dropout introduces bias. Hence, it reduces the variance.

Here is why it might be bothering you: the bias-variance trade-off only makes sense if you are on the efficient frontier, i.e., the Cramér-Rao bound should hold with equality for the trade-off to bind. You can always have a model with higher bias AND higher variance; introducing bias doesn't necessarily reduce the variance.
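A toy illustration of that last point (my own sketch, not anything from the paper being discussed; the numbers are arbitrary): estimate the mean of a Gaussian with (A) the plain sample mean and (B) an estimator that throws away most of the data and adds a constant shift. B has both higher bias and higher variance than A, so "adding bias" bought nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 0.0
n, reps = 100, 20_000

est_a = np.empty(reps)  # A: sample mean of all n points (unbiased)
est_b = np.empty(reps)  # B: mean of first 10 points plus a shift (biased AND noisier)
for r in range(reps):
    x = rng.normal(true_mean, 1.0, size=n)
    est_a[r] = x.mean()
    est_b[r] = x[:10].mean() + 0.5

for name, est in [("sample mean", est_a), ("biased + noisy", est_b)]:
    print(f"{name:15s} bias={est.mean() - true_mean:+.3f}  var={est.var():.4f}")
# B loses on both axes: introducing bias did not reduce the variance.
```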

17