Submitted by alkaway t3_zkwrix in MachineLearning

I'm training a per-pixel image classification network, which, for each pixel in the image, predicts whether it is a sign for disease A or disease B. Note that a given pixel could be a sign for both disease A and disease B (this is a multi-label problem).

My question is: are the relative probabilities going to be calibrated? In other words, does it make sense to sort the NxNx2 probabilities, or are the probabilities for the two diseases (i.e. channels) not calibrated / comparable, since it is similar to solving two independent problems?

If it matters, I am using a ResNet, some fully-connected layers, and then a convolutional decoder.

Any thoughts will be much appreciated, thanks in advance!

21

Comments

bimtuckboo t1_j02jss1 wrote

The easiest way to find out is to make some calibration plots with your validation set. From there, depending on what the plots look like, there are some things you can do to improve the calibration post-training. Look into temperature scaling and Platt scaling.
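
For example, a per-channel reliability diagram could be produced like this (a rough sketch; `val_probs` and `val_labels` are placeholder names for flattened per-pixel sigmoid outputs and binary masks, not anything from the OP's actual pipeline):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Hypothetical placeholders: (num_pixels, 2) predicted probabilities
# and binary labels, flattened from the validation set.
val_probs = np.random.rand(10000, 2)
val_labels = (np.random.rand(10000, 2) < val_probs).astype(int)

fig, ax = plt.subplots()
for k, name in enumerate(["disease A", "disease B"]):
    # fraction of positives vs. mean predicted probability per bin
    frac_pos, mean_pred = calibration_curve(val_labels[:, k], val_probs[:, k], n_bins=10)
    ax.plot(mean_pred, frac_pos, marker="o", label=name)
ax.plot([0, 1], [0, 1], "k--", label="perfect calibration")
ax.set_xlabel("mean predicted probability")
ax.set_ylabel("empirical frequency of positives")
ax.legend()
plt.show()
```

If the two curves deviate from the diagonal by very different amounts, that is a sign the channels are not directly comparable.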

19

alkaway OP t1_j02phdn wrote

Thanks so much for your response! Does temperature scaling change the relative ordering of the probabilities?

1

bimtuckboo t1_j02qida wrote

No, it does not. It simply scales the probabilities so that they are either all closer to 0.5 or all further from 0.5.

1

Moderatecat t1_j02b4ph wrote

Most modern deep neural nets are not well-calibrated by default. Your model's outputs, even after normalization, cannot be interpreted as probabilities unless the model is well-calibrated.

11

alkaway OP t1_j02oxhi wrote

Thanks so much for your response! Are you aware of any calibration methods I could try? Preferably ones which won't take too long to implement / incorporate :P

2

gosnold t1_j036xfj wrote

Temperature adjustment in the softmax layer is quick and easy
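
A minimal sketch of what that could look like for a multi-label (per-channel sigmoid) head, fitting a single scalar temperature on held-out logits; the tensor names and shapes here are assumptions:

```python
import torch
import torch.nn as nn

def fit_temperature(val_logits, val_labels):
    """Learn one scalar T > 0 that minimizes BCE of sigmoid(logits / T)
    on a held-out set. val_logits / val_labels: (num_pixels, 2) tensors."""
    log_t = torch.zeros(1, requires_grad=True)          # T = exp(log_t) stays positive
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=100)
    bce = nn.BCEWithLogitsLoss()

    def closure():
        opt.zero_grad()
        loss = bce(val_logits / log_t.exp(), val_labels.float())
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

# At inference: calibrated = torch.sigmoid(test_logits / T)
# Dividing all logits by a single T never changes their ordering.
```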

4

ResponsibilityNo7189 t1_j02dzwf wrote

Getting your network's probabilities to be calibrated is an open problem. First, you might want to read up on aleatoric vs. epistemic uncertainty: https://towardsdatascience.com/aleatoric-and-epistemic-uncertainty-in-deep-learning-77e5c51f9423

Monte Carlo sampling and training have been used to get a sense of uncertainty.
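
For instance, MC dropout keeps dropout active at test time and averages several stochastic forward passes; a rough sketch, assuming the model actually contains dropout layers:

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=20):
    """Average sigmoid outputs over stochastic forward passes; the
    per-pixel standard deviation is a crude uncertainty estimate."""
    model.eval()
    for m in model.modules():
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
            m.train()                               # re-enable dropout only
    samples = torch.stack([torch.sigmoid(model(x)) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)
```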

Also, changing the softmax temperature to get less confident outputs might "help".

10

alkaway OP t1_j02oy3b wrote

Thanks so much for your response! Is temperature scaling the go-to calibration method I should try? Does temperature scaling change the relative ordering of the probabilities?

2

ResponsibilityNo7189 t1_j02t093 wrote

It does not change the order. It will make the predictions less "stark": instead of 0.99, 0.0001, 0.002, 0.007, you will get something like 0.75, 0.02, 0.04, 0.19 for instance. It is the easiest thing to do, but remember there isn't any "go-to" technique.

3

pm_me_your_ensembles t1_j01xzcw wrote

The two are not comparable. In a multi-class, single-label problem you do K distinct projections, one for each class, but they are then combined via a softmax to give you something that resembles probabilities. Since no such joint normalization is applied in your multi-label setup, the two channels don't influence each other in any way, so their outputs aren't directly comparable.
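
To make the contrast concrete, here is a toy illustration with made-up logits for one pixel, assuming the multi-label head uses per-channel sigmoids:

```python
import torch

logits = torch.tensor([[2.0, -1.0]])        # one pixel, channels = (disease A, disease B)

# Multi-label head: independent sigmoids, no coupling between channels.
torch.sigmoid(logits)                       # tensor([[0.8808, 0.2689]]) -- does not sum to 1

# Single-label head: softmax ties the channels into one distribution.
torch.softmax(logits, dim=-1)               # tensor([[0.9526, 0.0474]]) -- sums to 1
```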

However, you shouldn't treat whatever an NN outputs as a probability, even if it's within [0, 1], as NNs are known to be overconfident.

7

alkaway OP t1_j01zhl7 wrote

Thanks so much for your response!

This makes sense. Are you aware of any techniques that can be used to make these probabilities comparable?

I understand that the outputs shouldn't necessarily be treated as probabilities. I simply want a relative ordering of the pixels in terms of "likelihood."

3

trajo123 t1_j023qfb wrote

You could reformulate your problem to output 4 channels: "only disease A", "only disease B", "both disease A and disease B", and "no disease". This way a softmax can be applied to these outputs, with their probabilities summing to 1.

[EDIT] corrected number of classes
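
A sketch of the label encoding and loss this would imply (the mask names and shapes are assumptions about the OP's data, not taken from the post):

```python
import torch
import torch.nn.functional as F

# Hypothetical per-pixel binary masks for one image, shape (N, N).
a_mask = torch.randint(0, 2, (4, 4))
b_mask = torch.randint(0, 2, (4, 4))

# Joint class per pixel: 0 = neither, 1 = only A, 2 = only B, 3 = both.
joint = a_mask + 2 * b_mask

# The network would output (batch, 4, N, N) logits; softmax over dim=1
# gives a proper per-pixel distribution that sums to 1.
logits = torch.randn(1, 4, 4, 4)
probs = F.softmax(logits, dim=1)
loss = F.cross_entropy(logits, joint.unsqueeze(0))
```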

7

alkaway OP t1_j024u31 wrote

Thanks for your response -- This is an interesting idea! Unfortunately, I am actually training my network to predict 1000+ classes, for which such an idea would be computationally intractable...

2

trajo123 t1_j029y2r wrote

Ah, yes, it doesn't really make sense for more than a couple of classes. So if you can't turn your problem into a single-label multi-class one, have you tried any probability calibration on the model outputs? This should make them "more comparable"; I think that's the best you can do with a deep learning model.

But why do you want to rank the outputs per pixel? Wouldn't some per-image aggregate over the channels make more sense?

3

alkaway OP t1_j02owfb wrote

Thanks so much for your response! Are you aware of any calibration methods I could try? Preferably ones which won't take long to implement / incorporate :P

2

trajo123 t1_j031wsx wrote

Perhaps scikit-learn's "Probability calibration" section would be a good place to start. Good luck!
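
As a rough sketch of the two classical options covered there, fit a calibrator per channel on held-out scores (the arrays below are hypothetical stand-ins for one flattened disease channel):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

# Hypothetical held-out data for ONE channel: raw logits and binary labels.
val_logits = np.random.randn(5000)
val_labels = (np.random.rand(5000) < 1 / (1 + np.exp(-val_logits))).astype(int)

# Platt scaling: a logistic regression on the raw score.
platt = LogisticRegression().fit(val_logits.reshape(-1, 1), val_labels)
platt_probs = platt.predict_proba(val_logits.reshape(-1, 1))[:, 1]

# Isotonic regression: a non-parametric monotone mapping (needs more data).
iso = IsotonicRegression(out_of_bounds="clip").fit(val_logits, val_labels)
iso_probs = iso.predict(val_logits)
```

Both mappings are monotone in the score, so the ranking within a single channel is essentially unchanged; the point of calibrating is to make scores across channels more comparable.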

2

[deleted] t1_j023o61 wrote

[deleted]

1

alkaway OP t1_j02675d wrote

I'm not sure I understand. Are you suggesting I normalize each pixel in each NxN label-map to be mean 0 and std of 1? And then use this normalized label-map during training?

1

pm_me_your_ensembles t1_j02eijz wrote

Never mind my previous comment.

You could normalize both channels, i.e. for label 1, normalize the NxN tensor of per-pixel scores, and do the same for label 2.
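
Something like this per-channel z-scoring, if that is what's meant (a sketch with assumed shapes; it only puts the two channels on a shared scale, it does not calibrate them):

```python
import torch

scores = torch.randn(2, 64, 64)        # (channels, N, N) raw outputs, one channel per disease

# Standardize each channel independently so both live on a comparable scale.
mean = scores.mean(dim=(1, 2), keepdim=True)
std = scores.std(dim=(1, 2), keepdim=True)
normalized = (scores - mean) / (std + 1e-8)

# Joint ranking of all 2*N*N pixel/disease scores.
ranking = normalized.flatten().argsort(descending=True)
```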

1

SlowFourierT198 t1_j02nin5 wrote

Depending on the problem, you may use Bayesian neural networks, where you fit a distribution over the weights; they are better calibrated but also expensive. There is some work on lower-cost ways to make the model better calibrated / uncertainty-aware. One direction is using Gaussian process approximations; another is, for example, PostNet. The overall topic you can search for is uncertainty quantification.
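
For a sense of what "a distribution over the weights" means in practice, here is a minimal mean-field variational (Bayes-by-Backprop-style) linear layer; this is only an illustrative sketch, not PostNet or a GP approximation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over its weights."""
    def __init__(self, in_features, out_features, prior_std=1.0):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_rho = nn.Parameter(torch.full((out_features,), -5.0))
        self.prior_std = prior_std

    def forward(self, x):
        w_std = F.softplus(self.w_rho)
        b_std = F.softplus(self.b_rho)
        w = self.w_mu + w_std * torch.randn_like(w_std)    # sample weights each pass
        b = self.b_mu + b_std * torch.randn_like(b_std)
        self.kl = self._kl(self.w_mu, w_std) + self._kl(self.b_mu, b_std)
        return F.linear(x, w, b)

    def _kl(self, mu, std):
        # KL( N(mu, std^2) || N(0, prior_std^2) ), summed over parameters.
        var, p_var = std ** 2, self.prior_std ** 2
        return 0.5 * torch.sum(var / p_var + mu ** 2 / p_var - 1.0 - torch.log(var / p_var))
```

Training adds the accumulated `kl` terms (suitably weighted) to the usual loss, and predictions average several forward passes.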

4

alkaway OP t1_j02pj3l wrote

Thanks so much for your response! Will take a look.

2

Red-Portal t1_j02wvcs wrote

With deep neural networks, I would say conformal prediction is the best way to get uncertainty estimates.
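
A minimal split-conformal sketch for this multi-label setting (the array names and the choice of nonconformity score are assumptions):

```python
import numpy as np

def conformal_qhat(cal_probs, cal_labels, alpha=0.1):
    """Split conformal: nonconformity = 1 - predicted probability of a true label.
    cal_probs, cal_labels: (num_pixels, 2) held-out sigmoid outputs and binary labels."""
    nonconf = (1.0 - cal_probs)[cal_labels.astype(bool)]     # scores of the true labels only
    n = nonconf.shape[0]
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(nonconf, level, method="higher")

# Prediction sets: include a disease for a pixel whenever its probability
# clears the calibrated threshold, aiming for roughly 1 - alpha coverage.
# pred_sets = test_probs >= 1 - qhat
```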

1

CommunismDoesntWork t1_j03qv7i wrote

Why do you need probabilities? You'd be better off spending more time making your model more accurate, period, even if it is confidently wrong sometimes.

0