ResponsibilityNo7189

ResponsibilityNo7189 t1_j1ulsd1 wrote

That is why you have hundreds of millions of parameters in a network. There are so many directions the weights can move in that it's not a zero-sum game: some of those directions will not be detrimental to the other examples. It's precisely for this reason that self-supervised methods tend to work best on very deep networks; see "Scaling Vision Transformers".
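
To loosely illustrate the point about having so many directions available (purely an illustration, not part of the original comment): in high-dimensional weight spaces, two randomly chosen directions are nearly orthogonal, so a step that helps one example tends to interfere very little with another. The dimensions below are arbitrary placeholders.

```python
# Illustrative sketch only: random "gradient-like" directions become nearly
# orthogonal as dimensionality grows, so updates are not a zero-sum game.
import numpy as np

rng = np.random.default_rng(0)
for dim in (10, 1_000, 1_000_000):  # arbitrary sizes, from tiny to large
    g1, g2 = rng.standard_normal(dim), rng.standard_normal(dim)
    cos = g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2))
    print(f"dim={dim:>9,}  cosine similarity = {cos:+.4f}")
```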

7

ResponsibilityNo7189 t1_j02dzwf wrote

Getting your network's output probabilities to be calibrated is an open problem. First, you might want to read up on aleatoric vs. epistemic uncertainty: https://towardsdatascience.com/aleatoric-and-epistemic-uncertainty-in-deep-learning-77e5c51f9423

Monte Carlo sampling and training have been used to get a sense of uncertainty.
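
One common way to do this is Monte Carlo dropout: keep dropout active at inference time and average several stochastic forward passes. A minimal PyTorch sketch, assuming a toy model and a placeholder dropout rate:

```python
# Monte Carlo dropout sketch (PyTorch): the spread across stochastic forward
# passes gives a rough estimate of predictive uncertainty. The architecture
# and dropout rate here are placeholders, not recommendations.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)

def mc_dropout_predict(model, x, n_samples=30):
    model.train()  # keep dropout layers stochastic at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    return probs.mean(dim=0), probs.std(dim=0)  # mean prediction, per-class spread

x = torch.randn(4, 128)           # dummy batch
mean_p, std_p = mc_dropout_predict(model, x)
print(mean_p.shape, std_p.shape)  # torch.Size([4, 10]) torch.Size([4, 10])
```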

Also, changing the softmax temperature to get less confident outputs might "help".
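
For concreteness, a quick sketch of what temperature does to the softmax; the temperature value here is illustrative, and in practice it is usually tuned on held-out data rather than picked by hand:

```python
# Dividing logits by a temperature T > 1 softens the output distribution
# (less confident); T < 1 sharpens it. T=2.0 below is arbitrary.
import torch

def softmax_with_temperature(logits, T=2.0):
    return torch.softmax(logits / T, dim=-1)

logits = torch.tensor([[4.0, 1.0, 0.5]])
print(torch.softmax(logits, dim=-1))            # sharp, overconfident-looking
print(softmax_with_temperature(logits, T=2.0))  # softer, less confident
```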

10

ResponsibilityNo7189 t1_iy6ksf8 wrote

It might be good to code one thing completely from scratch. Why? Because it might help you improve others' code, and give you the resilience and skill to open up others' code and tinker with it. I have seen too many students only wanting to download code from GitHub, and I feel that it severely hampers their creativity, and thus their research impact. At some point you will have to produce some genuine code of your own, and having coded from scratch will be helpful then.

2