Best practice for capping a softmax
Submitted by neuralbeans (t3_10puvih) on January 31, 2023 at 9:41 AM in deeplearning · 16 comments · 6 points
nutpeabutter (t1_j6n2eaf) wrote on January 31, 2023 at 2:32 PM · 1 point

Taking a leaf out of RL, you can add an additional entropy loss. Alternatively, clip the logits but apply a straight-through estimator (STE, i.e. copy the gradients unchanged) on the backward pass.
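A minimal PyTorch sketch of both ideas, assuming a standard classification setup; the `entropy_weight` and `clip_value` values are arbitrary placeholders, not something given in the comment:

```python
import torch
import torch.nn.functional as F

# Option 1: entropy regularization (as used in RL policy losses).
# Penalizing low entropy discourages overconfident softmax outputs.
def entropy_regularized_loss(logits, targets, entropy_weight=0.01):
    ce = F.cross_entropy(logits, targets)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1).mean()
    # Subtracting entropy rewards softer (higher-entropy) distributions.
    return ce - entropy_weight * entropy

# Option 2: clip the logits, but use a straight-through estimator so
# gradients flow through the clip as if it were the identity.
def clip_logits_ste(logits, clip_value=10.0):
    clipped = logits.clamp(-clip_value, clip_value)
    # Forward pass uses the clipped values; backward pass copies the
    # gradient straight to `logits` because the difference is detached.
    return logits + (clipped - logits).detach()
```

A rough usage example under the same assumptions: `loss = entropy_regularized_loss(clip_logits_ste(model(x)), y)`, then backpropagate as usual.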