Submitted by hardmaru t3_ys36do in MachineLearning
master3243 t1_iw1h2h7 wrote
Reply to comment by elcric_krej in [R] ZerO Initialization: Initializing Neural Networks with only Zeros and Ones by hardmaru
> potentially removes a lot of random variance from the process of training
You don't need the results of this paper for that.
One of my teams had a pipeline where every single script would initialize the seed of all random number generators (numpy, torch, Python's random) to 42.
This essentially removed all stochasticity beyond machine-precision effects between training runs on the same inputs.
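A minimal sketch of that kind of seeding helper (the name `seed_everything` and the cuDNN flags are illustrative, not their actual pipeline code):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Pin every RNG so repeated runs with the same inputs start from the same state."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy's global RNG
    torch.manual_seed(seed)           # PyTorch CPU and CUDA RNGs
    torch.cuda.manual_seed_all(seed)  # explicitly seed every visible GPU

    # Fixed seeds alone don't remove nondeterminism from some CUDA/cuDNN kernels,
    # so deterministic mode is usually needed as well.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)
```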
bluevase1029 t1_iw1khv8 wrote
I believe it's still difficult to be absolutely certain you have the same initialisation across multiple machines, versions of PyTorch, etc. I could be wrong though.
master3243 t1_iw1mpgb wrote
Definitely if each person has a completely different setup.
But that's why we containerize our setups and use a shared environment configuration.
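One way to sanity-check that two setups really do produce the same initialization (a sketch with a stand-in model, not code from the thread): fingerprint the freshly initialized parameters and compare the digests across machines.

```python
import hashlib

import torch
import torch.nn as nn

torch.manual_seed(42)
# Stand-in model; in practice this would be the model the pipeline actually builds.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

hasher = hashlib.sha256()
for name, param in sorted(model.named_parameters(), key=lambda kv: kv[0]):
    hasher.update(param.detach().cpu().numpy().tobytes())

# Identical digests on two machines => identical initial weights.
print(hasher.hexdigest())
```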
elcric_krej t1_iw7hss0 wrote
I guess so, but that doesn't scale beyond a single team (we did something similar), and arguably you want to test across multiple seeds, in case some init + model combination just happens to land in a very odd minimum.
This seems to yield higher uniformity without constraining us on the RNG.
But see /u/DrXaos's reply for why that's not really the case.
DrXaos t1_iw7o3ef wrote
In my typical use, I’ve found that changing the random init seed (and also the random seed for shuffling examples during training, don’t forget that one) in many cases induces a larger variance in performance than many algorithmic or hyperparameter changes. Most prominently with imbalanced classification, which is often the reality of the valuable problem.
I guess it’s better to be lucky than smart.
Avoiding looking at the results across random inits can make you think you’re smarter than you are, and you’ll tell yourselves false stories.
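A sketch of what such a seed sweep could look like (`train_and_evaluate` is a placeholder for an existing training + validation routine, not code from the thread):

```python
import random
import statistics

import numpy as np
import torch


def train_and_evaluate(init_seed: int, shuffle_seed: int) -> float:
    """Placeholder: init the model with init_seed, shuffle data with shuffle_seed,
    train, and return a validation metric (e.g. F1 on the minority class)."""
    torch.manual_seed(init_seed)
    random.seed(shuffle_seed)
    np.random.seed(shuffle_seed)
    ...  # real training loop goes here
    return 0.0  # return the actual metric in practice


# Vary the init and shuffling seeds to see how much of the spread comes from
# run-to-run randomness rather than from modelling or hyperparameter choices.
scores = [train_and_evaluate(init_seed=s, shuffle_seed=1000 + s) for s in range(5)]
print(f"mean={statistics.mean(scores):.4f}  stdev={statistics.pstdev(scores):.4f}")
```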