Submitted by hardmaru t3_ys36do in MachineLearning
zimonitrome t1_iwc14i5 wrote
Reply to comment by maybelator in [R] ZerO Initialization: Initializing Neural Networks with only Zeros and Ones by hardmaru
Wow, thanks for the explanation, it does make sense.

I had a preconception that all optimizers dealing with piecewise-linear functions (like the L1 norm) would still only produce values close to 0.

I can see someone disregarding tiny values when exploiting that sparsity (pruning, quantization), but I didn't think the values would be exactly 0.
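For anyone else who had the same preconception: the exact zeros come from proximal methods. The proximal operator of the L1 norm is soft-thresholding, which maps any weight whose magnitude is below the threshold to exactly 0 via a max, rather than shrinking it toward 0 asymptotically the way plain (sub)gradient steps do. A minimal NumPy sketch (function name and values are just for illustration):

```python
import numpy as np

def soft_threshold(w, lam):
    # Proximal operator of lam * ||w||_1 (soft-thresholding).
    # Weights with |w| <= lam hit the max at 0, so the zeros
    # are exact, not merely floating-point-tiny.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([0.8, -0.05, 0.3, -0.001])
print(soft_threshold(w, lam=0.1))
# [ 0.7 -0.   0.2 -0. ]  <- exact zeros, no pruning cutoff needed
```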