Submitted by neuralbeans t3_10puvih in deeplearning
FastestLearner t1_j6mhjd2 wrote
Use a composite loss, i.e. add extra penalty terms to the loss function so that the optimizer pushes the logits to stay within a fixed range.
For example, if the current minimum logit is m, the allowed minimum is u, the current maximum logit is n, and the allowed maximum is v, then the following loss function should help:
Overall loss = CrossEntropy loss + lambda1 * max(u - m, 0) + lambda2 * max(n - v, 0)
The max terms ensure that no loss is added when the logits are all within the allowed range. Use lambda1 and lambda2 to scale each penalty so that it roughly matches the CE loss in strength.
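For concreteness, here is a minimal PyTorch sketch of this composite loss. The function name, the default bounds (u = -10, v = 10), and the lambda defaults are illustrative placeholders, not values from the comment; tune them for your setup.

```python
import torch
import torch.nn.functional as F

def composite_loss(logits, targets, u=-10.0, v=10.0, lambda1=1.0, lambda2=1.0):
    """Cross-entropy plus hinge penalties that push logits into [u, v].

    u / v are the allowed minimum / maximum; lambda1 / lambda2 are
    illustrative weights -- scale them so each penalty roughly matches
    the CE loss in strength.
    """
    ce = F.cross_entropy(logits, targets)
    m = logits.min()  # current minimum logit
    n = logits.max()  # current maximum logit
    # Each clamped term is zero while the logits stay within [u, v],
    # so the penalty only activates when the range is violated.
    range_penalty = lambda1 * torch.clamp(u - m, min=0) \
                  + lambda2 * torch.clamp(n - v, min=0)
    return ce + range_penalty

# Usage (assuming a model producing logits of shape (batch, num_classes)):
# logits = model(inputs)
# loss = composite_loss(logits, labels)
# loss.backward()
```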