emilrocks888 t1_j6mjf7m wrote
Reply to Best practice for capping a softmax by neuralbeans
I would scale the logits before the softmax, the way it's done in self-attention. In fact, that scaling in self-attention exists precisely to keep the final distribution of the attention weights smooth (i.e., to stop it from saturating).
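A minimal sketch of the idea, assuming a simple temperature-style divisor (the `temperature` name and the example values are illustrative, not from the original post):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])

# Plain softmax: sharply peaked on the largest logit (~[0.93, 0.05, 0.03])
print(softmax(logits))

# Scaled logits: dividing by a constant flattens the distribution,
# effectively capping how close the max probability can get to 1
# (~[0.53, 0.25, 0.22] here)
temperature = 4.0
print(softmax(logits / temperature))
```

In scaled dot-product attention the same trick appears as softmax(QK^T / sqrt(d_k)), where the sqrt(d_k) divisor keeps the attention weights from collapsing onto a single position.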
emilrocks888 t1_j6mjnk7 wrote
Reply to comment by neuralbeans in Best practice for capping a softmax by neuralbeans
Sorry, autocorrect issue. I meant self-attention (I've edited my previous answer).