SufficientStautistic t1_iyqag72 wrote
I am always delighted to see a median with accompanying 5% and 95% quantiles at each validation step or at the end of each epoch. That is more helpful to me than some multiple of the s.d. Even a mean with SE goes further than many papers manage, so I will take that too; just give us a measure of variance, for the love of god haha.
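For concreteness, here is a minimal sketch of the kind of reporting described above, assuming you have a set of final validation scores from several runs (the numbers below are purely illustrative, not from any real experiment):

```python
import numpy as np

# Hypothetical final validation accuracies from 10 runs with different seeds.
val_acc = np.array([0.912, 0.905, 0.921, 0.899, 0.917,
                    0.908, 0.913, 0.902, 0.919, 0.910])

# Median with 5% and 95% quantiles across runs.
median = np.median(val_acc)
q05, q95 = np.quantile(val_acc, [0.05, 0.95])

# Mean with standard error of the mean (sample s.d. / sqrt(n)).
mean = val_acc.mean()
se = val_acc.std(ddof=1) / np.sqrt(len(val_acc))

print(f"median {median:.3f} (5%-95%: {q05:.3f}-{q95:.3f})")
print(f"mean {mean:.3f} +/- {se:.3f} (SE)")
```

The same few lines can be run at the end of each epoch over the per-seed scores collected so far.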
The answer saying that random weight initialization is not ideal is a good one; it's a pain for reproducibility, among other reasons. (I saw you ask about this in that thread: the variance of the random initialization has to be tuned based on depth so that the input-output condition number is about 1, otherwise learning proceeds slowly or not at all.) Several deterministic initialization procedures have been proposed over the years. Here is one from last year that yielded promising results and had some theoretical rationale: https://arxiv.org/abs/2110.12661
Unfortunately their proposed approach isn't available out-of-the-box with TF or PyTorch, but it shouldn't be too tough to implement by hand if you have the time.
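To illustrate the variance-tuning point (not the deterministic scheme from the linked paper, which would need its own implementation), here is a sketch of standard Glorot-style variance scaling, where the weight variance is set from the layer's fan-in and fan-out so that activation magnitudes stay roughly constant through depth:

```python
import numpy as np

def scaled_normal_init(fan_in, fan_out, rng, gain=1.0):
    """Glorot-style variance scaling: draw weights with variance
    gain^2 * 2 / (fan_in + fan_out), which keeps the scale of
    activations roughly constant from layer to layer at the start
    of training. This is the standard random scheme, NOT the
    deterministic initialization from the paper linked above."""
    std = gain * np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = scaled_normal_init(512, 512, rng)
print(W.std())  # empirical s.d. near sqrt(2/1024)
```

Without this kind of scaling (e.g. unit-variance weights in every layer), the norm of activations grows or shrinks geometrically with depth, which is exactly the "learning proceeds slowly or not at all" failure mode.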
optimized-adam OP t1_iyqdvbc wrote
Thank you for your answer! Isn't the SE just the sample standard deviation divided by the square root of n?