_vb__ (t1_jdiwjqk) wrote on March 24, 2023 at 6:38 PM, replying to Rishh3112 in "Cuda out of memory error" by Rishh3112:

Are you calling the zero_grad method on your optimizer in every step of your training loop?
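As a minimal sketch of why this matters (hypothetical model and data, just to illustrate): PyTorch accumulates gradients across `backward()` calls, so a loop that skips `optimizer.zero_grad()` keeps adding stale gradients, which corrupts the updates and can contribute to growing GPU memory use.

```python
import torch

# Tiny hypothetical model and batch, purely for illustration.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(8, 4), torch.randn(8, 1)

# Without zero_grad: the second backward() ADDS to the first gradient.
loss_fn(model(x), y).backward()
g1 = model.weight.grad.clone()
loss_fn(model(x), y).backward()
g2 = model.weight.grad.clone()
print(torch.allclose(g2, 2 * g1))  # gradients accumulated, not replaced

# Correct loop: clear gradients at every step before backward().
for _ in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```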
_vb__ (t1_j6ocec9) wrote on January 31, 2023 at 7:21 PM, replying to neuralbeans in "Best practice for capping a softmax" by neuralbeans:

No, it would bring the logits closer to one another and make the overall model a bit less confident in its probabilities.
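A small sketch of the effect being described (the temperature value T=2 here is an arbitrary choice for illustration): scaling logits down before the softmax pulls them closer together, which flattens the output distribution and lowers the top probability.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.0]

p = softmax(logits)                           # standard softmax
p_scaled = softmax([z / 2.0 for z in logits])  # logits divided by T=2

# The scaled distribution is flatter: its maximum probability is lower,
# i.e. the model is less confident.
print(max(p), max(p_scaled))
```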