Submitted by neuralbeans t3_10puvih in deeplearning
I'd like to train a neural network where the softmax output has a minimum possible probability. During training, none of the probabilities should go below this minimum. Basically I want to keep the logits from becoming too different from each other, so that none of the output categories are ever completely excluded from a prediction, a sort of smoothing. What's the best way to do this during training?
FastestLearner t1_j6mhjd2 wrote
Use a composite loss, i.e. add extra terms to the loss function so that the optimizer forces the logits to stay within a fixed range.
For example, if the current minimum logit is `m`, the allowed minimum is `u`, the current maximum logit is `n`, and the allowed maximum is `v`, then the following loss function should help:

Overall loss = CrossEntropy loss + lambda1 * max(u - m, 0) + lambda2 * max(n - v, 0)
The `max` terms ensure that no loss is added when the logits are all within the allowed range. Use `lambda1` and `lambda2` to scale each term so that they roughly match the CE loss in strength.
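
A minimal PyTorch sketch of this composite loss, assuming the bounds `u`, `v` and the weights `lambda1`, `lambda2` are hyperparameters you tune yourself (none of these values come from the original post):

```python
import torch
import torch.nn.functional as F

def composite_loss(logits, targets, u=-5.0, v=5.0, lambda1=1.0, lambda2=1.0):
    # Standard cross-entropy term
    ce = F.cross_entropy(logits, targets)

    m = logits.min()  # current minimum logit
    n = logits.max()  # current maximum logit

    # Hinge-style penalties: zero whenever all logits stay inside [u, v]
    low_penalty = torch.clamp(u - m, min=0.0)   # fires only if the min logit drops below u
    high_penalty = torch.clamp(n - v, min=0.0)  # fires only if the max logit rises above v

    return ce + lambda1 * low_penalty + lambda2 * high_penalty
```

The clamped terms are differentiable almost everywhere, so the gradient only pushes the offending logits back toward the allowed range and leaves the loss untouched otherwise.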