
suflaj t1_j5zlq6k wrote

Aside from what others have mentioned, let's assume the situation isn't symmetric, i.e. that the range of the function we're learning, as well as the domain of the weights and biases, is [0, inf). Then it makes more sense to add the bias than to subtract it, since adding it leads to smaller weights and less chance of overflow or exploding gradients.

In the scenario described above, it makes more sense to subtract the bias if you want a more expressive layer at the cost of numerical stability. A subtractive bias forces the weights to take on greater magnitudes, which in turn gives you more effective range for the weights.
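To make the magnitude argument concrete, here's a minimal numeric sketch (my own illustration, not from the comment above): with the input, weight, bias, and target all non-negative, hitting the same target output with an additive bias requires a smaller weight than with a subtractive one.

```python
# Minimal sketch: single neuron, no activation, everything non-negative.
# Compare the weight needed to reach the same target with an additive
# vs. a subtractive bias.

x, target, b = 2.0, 10.0, 4.0

# Additive bias: target = w_add * x + b  ->  w_add = (target - b) / x
w_add = (target - b) / x   # 3.0: the bias contributes, so the weight stays small

# Subtractive bias: target = w_sub * x - b  ->  w_sub = (target + b) / x
w_sub = (target + b) / x   # 7.0: the weight must compensate for the subtracted bias

print(w_add, w_sub)  # 3.0 7.0
```

The subtractive version reaches the same output only with a larger weight, which is where the overflow and exploding-gradient concern comes from.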

But note that neural networks are not trained with integer weights, and some libraries don't even support autograd for integer tensors.
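As an aside on that last point, here's a small check assuming PyTorch as the library (the comment doesn't name one): floating-point tensors can track gradients, while integer tensors cannot.

```python
import torch

# Floating-point tensors can track gradients.
w_float = torch.tensor([1.0, 2.0], requires_grad=True)  # fine

# Integer tensors cannot: this raises a RuntimeError along the lines of
# "Only Tensors of floating point and complex dtype can require gradients".
try:
    w_int = torch.tensor([1, 2], requires_grad=True)
except RuntimeError as e:
    print(e)
```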
