Submitted by neuralbeans t3_10puvih in deeplearning
FastestLearner t1_j6mhjd2 wrote
Use a composite loss, i.e. add extra penalty terms to the loss function so that the optimizer pushes the logits to stay within a fixed range.
For example, if the current minimum logit is m, the allowed minimum is u, the current maximum logit is n, and the allowed maximum is v, then the following loss function should help:
Overall loss = CrossEntropy loss + lambda1 * max(u - m, 0) + lambda2 * max(n - v, 0)
The max terms ensure that no loss is added when the logits are all within the allowed range. Use lambda1 and lambda2 to scale each penalty so that it roughly matches the CE loss in strength.
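For concreteness, here is a minimal PyTorch sketch of this composite loss. The function name, the default bounds (u = -10, v = 10), and the lambda defaults are illustrative placeholders, not values from the comment; tune them for your setup.

```python
import torch
import torch.nn.functional as F

def composite_loss(logits, targets, u=-10.0, v=10.0, lambda1=1.0, lambda2=1.0):
    """Cross-entropy plus hinge penalties that push logits into [u, v].

    u / v are the allowed minimum / maximum; lambda1 / lambda2 are
    illustrative weights -- scale them so each penalty roughly matches
    the CE loss in strength.
    """
    ce = F.cross_entropy(logits, targets)
    m = logits.min()  # current minimum logit
    n = logits.max()  # current maximum logit
    # Each clamped term is zero while the logits stay within [u, v],
    # so the penalty only activates when the range is violated.
    range_penalty = lambda1 * torch.clamp(u - m, min=0) \
                  + lambda2 * torch.clamp(n - v, min=0)
    return ce + range_penalty

# Usage (assuming a model producing logits of shape (batch, num_classes)):
# logits = model(inputs)
# loss = composite_loss(logits, labels)
# loss.backward()
```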