
DACUS1995 t1_j7vltrw wrote

As you said, most deep-learning models use some form of regularization during training, so there is already an implicit constraint on the actual values of the weights. This matters even more when the parameter count reaches the billions, where feature importance tends to follow an inherent statistical distribution. On the more explicit and fixed side, there are a couple of papers and efforts in the area of quantization: parameter outliers in individual layers degrade the precision of the quantized representation, so you want reduced variance within a block's or layer's values. For example, you can check this: https://arxiv.org/abs/1901.09504.
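To see why outliers hurt, here is a minimal sketch using plain symmetric min-max int8 quantization (a hypothetical toy setup, not the method from the linked paper): a single large weight inflates the scale, so every other weight gets squeezed into just a few int8 bins and the rounding error on the inliers jumps.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric min-max quantization: map [-max|x|, max|x|] onto [-127, 127].
    scale = np.abs(x).max() / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=1000).astype(np.float32)  # typical weight magnitudes

# Without an outlier: the scale is small and rounding error is tiny.
q, s = quantize_int8(w)
err_clean = np.abs(w - dequantize(q, s)).mean()

# Inject one outlier: the scale blows up, and the error on all the
# *other* (inlier) weights grows by orders of magnitude.
w_out = w.copy()
w_out[0] = 2.0
q2, s2 = quantize_int8(w_out)
err_outlier = np.abs(w_out[1:] - dequantize(q2, s2)[1:]).mean()

print(err_clean, err_outlier)
```

This is exactly why reduced variance inside a quantization block helps: the fewer outliers share a scale with the rest of the weights, the more of the int8 range the typical values get to use.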

3