Submitted by neuralbeans t3_10puvih in deeplearning
chatterbox272 t1_j6myph4 wrote
If the goal is to keep all predictions above a floor, the easiest way is to make the activation `floor + (1 - floor * num_logits) * softmax(logits)`. This has no material impact on the model itself, but it does impose the floor.
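A minimal sketch of that activation, assuming PyTorch (the thread doesn't name a framework); the function name `floored_softmax` and its arguments are just illustrative:

```python
import torch
import torch.nn.functional as F

def floored_softmax(logits: torch.Tensor, floor: float) -> torch.Tensor:
    """Softmax rescaled so every class probability is at least `floor`.

    Returns floor + (1 - floor * num_classes) * softmax(logits), which
    still sums to 1 as long as floor * num_classes <= 1.
    """
    num_classes = logits.shape[-1]
    assert floor * num_classes <= 1.0, "floor too large for this many classes"
    return floor + (1.0 - floor * num_classes) * F.softmax(logits, dim=-1)

# e.g. floored_softmax(torch.randn(4, 10), floor=0.01)
# keeps every class probability >= 0.01 while each row still sums to 1.
```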
If the goal is to actually change how the predictions are made, then simply adding a floor isn't the solution. You could modify the activation function in some other way (e.g. by scaling or normalising the logits), or you could impose a loss penalty on the spread between the logits or between the final predictions.
neuralbeans OP t1_j6n0ima wrote
I want the output to remain a proper distribution.
chatterbox272 t1_j6n3vx6 wrote
My proposed function does that. Say you have two outputs and don't want either to go below 0.25. The floors already add up to 0.5 (0.25 × 2), so you rescale the softmax to contribute the remaining 0.5, giving a total of 1 and a valid distribution.
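Working that two-output example through numerically (again assuming PyTorch; the logit values are arbitrary and just for illustration):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, -1.0])   # arbitrary two-class logits
floor, num_classes = 0.25, 2

probs = floor + (1 - floor * num_classes) * F.softmax(logits, dim=-1)
print(probs)        # ≈ tensor([0.7263, 0.2737]) — both values stay >= 0.25
print(probs.sum())  # tensor(1.) — still a valid probability distribution
```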