KingsmanVince t1_it6fskg wrote

Can you explain why the statement is not true? It may be trivial to you, but it isn't to some others.


hostilereplicator t1_it6n0dw wrote

The ROC curve plots true positive rate (TPR) against false positive rate (FPR) as you vary the decision threshold. TPR is measured on data labeled with the positive label, while FPR is measured on data labeled with the negative label. These numbers can therefore be measured independently: to measure TPR you only need positive samples and to measure FPR you only need negative samples. So it doesn't matter if you have an unbalanced number of positives and negatives to measure on.

The absolute number of samples of each type is more important, because this affects the uncertainty in your FPR and TPR measurements at each threshold setting. But the balance between number of positives and negatives is not relevant.

The opposite is true for precision-recall curves: recall is measured using only positive samples, but precision requires both positive and negative samples to measure. So the measurement of precision is dependent on the ratio of positives:negatives in your data.
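
A minimal sketch of this, using made-up scores and a hypothetical `rates` helper (neither comes from the linked post): duplicating the negatives ten times changes the class ratio but leaves TPR and FPR at a fixed threshold untouched, while precision drops.

```python
def rates(pos_scores, neg_scores, threshold):
    """Return (TPR, FPR, precision), predicting positive when score >= threshold."""
    tp = sum(s >= threshold for s in pos_scores)  # positives correctly flagged
    fp = sum(s >= threshold for s in neg_scores)  # negatives wrongly flagged
    tpr = tp / len(pos_scores)                    # uses positive samples only
    fpr = fp / len(neg_scores)                    # uses negative samples only
    precision = tp / (tp + fp) if tp + fp else float("nan")  # mixes both classes
    return tpr, fpr, precision


# Illustrative scores only, not real model output.
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.3, 0.2, 0.1]

balanced = rates(pos, neg, 0.5)         # 4 positives vs 4 negatives
imbalanced = rates(pos, neg * 10, 0.5)  # 4 positives vs 40 negatives

print("balanced   TPR, FPR, precision:", balanced)
print("imbalanced TPR, FPR, precision:", imbalanced)
```

Duplicating samples is only a stand-in for drawing more negatives from the same distribution; with real resampling the rates would agree up to sampling noise rather than exactly.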

The linked blog post references this paper in arguing for the use of precision-recall curves for imbalanced data, but this paper is about visual interpretation of the plots rather than what is "appropriate" in all cases, or whether the curves depend on the ratio or not.


respeckKnuckles t1_it89f8r wrote

Something I never quite understood---TPR and FPR are independent of each other, right? So then how is the plot of the AUC-ROC curve created? What if there are multiple parameters for which the FPR is the same value, but the TPR differs?


DigThatData t1_it8hvbj wrote

each point on the curve represents a decision threshold. given a particular decision threshold, your model will classify points a certain way. as you increment the threshold, it will hit the score of one or more observations, creating a step function as observations are moved from one bin to another as the decision threshold moves across their score.
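
a rough sketch of that sweep, assuming binary labels (1 = positive, 0 = negative) and a hypothetical `roc_points` helper with toy data. the threshold moves from high to low, and each distinct score it crosses adds one step to the curve:

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping the threshold from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Move every observation tied at this score across the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1  # a positive crossed: step up
            else:
                fp += 1  # a negative crossed: step right
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy data for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)

for fpr, tpr in curve:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```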


respeckKnuckles t1_it8ovi2 wrote

Is there a reason then that it's not common to see what the actual threshold is on graphs of AUC-ROC curves? It seems like it would be very helpful to have a little mark on the curve itself for when the threshold is 0.5, for example.


Professional_Pay_806 t1_ita6elf wrote

The threshold isn't important for what the ROC curve is trying to show. You can think about the ROC curve as representing a range of thresholds, from the point where all samples are classified as negative (TPR of 0 and FPR of 0) to the point where all samples are classified as positive (TPR of 1 and FPR of 1). The space between is what matters. For a robust classifier, the true positive rate will rise significantly faster than the false positive rate. So a steep slope at the beginning, approaching a TPR of 1 while FPR is still low (which tends to an AUC of 1), means the classifier is robust. The closer the AUC is to 1/2 (represented by the diagonal connecting bottom left to top right), the closer the classifier is to effectively tossing a coin and guessing positive if you get heads. It's not about what the specific threshold is, it's about how well-separated the data clusters are in the feature space where the threshold is being used. Thinking about a threshold as typically being 0.5 (because you're just looking for a maximum likelihood of correct classification in a softmax layer or something) is thinking about one very specific type of classifier. The ROC curve is meant to be showing something more generally applicable to any classifier in any feature space.


Professional_Pay_806 t1_ita6rxs wrote

Note you could always perform a linear transformation on your classification layer that shifts your threshold to another arbitrary value with the exact same results, but the ROC curve will remain the same as it was before.
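
A small sketch of this, with made-up scores and an arbitrary choice of transformation (`4 * s - 2`, any positive slope would do): the threshold moves from 0.5 to 0.0, but every sample gets the same prediction and the ranking of the scores is unchanged, so the ROC curve is too.

```python
def predict(scores, threshold):
    """Predict positive (True) when the score reaches the threshold."""
    return [s >= threshold for s in scores]


def transform(s):
    """Arbitrary increasing linear map, chosen only for illustration."""
    return 4 * s - 2


# Illustrative scores only.
scores = [0.95, 0.70, 0.55, 0.50, 0.45, 0.10]

original = predict(scores, 0.5)
shifted = predict([transform(s) for s in scores], transform(0.5))  # threshold is now 0.0

# Ranking by score is also preserved, which is all the ROC curve depends on.
order_before = sorted(range(len(scores)), key=lambda i: scores[i])
order_after = sorted(range(len(scores)), key=lambda i: transform(scores[i]))

print(original == shifted, order_before == order_after)
```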


DigThatData t1_it8rc38 wrote

that's a variant that people definitely do sometimes. If you think adding score annotations a particular way should be an out-of-the-box feature in a particular tool you use, you should create an issue on their gh to recommend it or implement it yourself and submit a PR.


hostilereplicator t1_itb8ghh wrote

TPR and FPR of a model are not independent given the decision threshold, which is what you’re varying to produce the ROC curve. As DigThatData said, you get a step function where each step is the point at which a sample crosses the threshold. If you get multiple threshold values where your FPR is the same but TPR changes, leading to a tall vertical step in the curve, it means you haven’t got enough negative samples with scores in that range to measure FPR precisely in that region of the curve.


likeamanyfacedgod OP t1_it6gqnz wrote

It's not trivial to me at all. I've seen a few blog posts that make this statement, but from my own experience, it's not true, you can even test it yourself by balancing and unbalancing your model. Look at how it is calculated, the TPR and FPR are both fractions, so it won't matter if one is a larger class than the other. What does matter though is if you care more about predicting one class than the other.


PassionatePossum t1_it6ms0v wrote

>the TPR and FPR are both fractions, so it won't matter if one is a larger class than the other.

In most cases that is a desirable property. You don't want to have excellent results just because one class makes up 99% of your dataset and the classifier just predicts the most common class without learning anything. Precision and Recall are also fractions.

The difference between ROC and Precision/Recall is that ROC needs the concept of a "negative class". That can be problematic for multi-class problems. Even if your data is perfectly balanced across all of your classes, the negative class (i.e. all classes that aren't the class you are examining) is bound to be overrepresented.

Since a precision/recall plot never uses the number of true negatives, you don't have that problem.

So, I don't have a problem with the statement that ROC is appropriate for a balanced dataset (provided that we have a binary classification problem or the number of different classes is at least low).


madrury83 t1_it7hw31 wrote

I think the more rigorous way to get at the OP's point is to observe that the AUC is the probability that a randomly selected positive sample is scored higher (by your fixed model) than a randomly chosen negative sample. Being a probability, it is independent (at a population level) of the number of samples you have from your positive and negative populations (of course, smaller samples get you more sampling variance). I believe this is the OP's point with "they are fractions".
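
Here is a minimal sketch of that pairwise view, with made-up scores and a hypothetical `pairwise_auc` helper that counts ties as half a win (a common convention, assumed here). Repeating the negative samples fifty times changes the class ratio but not the fraction of correctly ordered pairs:

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0  # ties count as half
        for p, n in product(pos_scores, neg_scores)
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative scores only.
pos = [0.9, 0.8, 0.6]
neg = [0.7, 0.4, 0.3]

auc_balanced = pairwise_auc(pos, neg)         # 3 positives vs 3 negatives
auc_imbalanced = pairwise_auc(pos, neg * 50)  # 3 positives vs 150 negatives

print(auc_balanced, auc_imbalanced)
```

Exact duplication stands in for the population-level statement; with independently drawn samples the two estimates would differ by sampling variance only.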

In any case, can we at least all agree that blogs/articles throwing around this kind of advice without justification are less than helpful?


rehrev t1_it6lcll wrote

So you just don't think it's true and don't have an actual reason or explanation?


likeamanyfacedgod OP t1_it6ln21 wrote

you can even test it yourself by balancing and unbalancing your model.
Look at how it is calculated, the TPR and FPR are both fractions, so it
won't matter if one is a larger class than the other. What does matter
though is if you care more about predicting one class than the other.


rehrev t1_it6m93w wrote



likeamanyfacedgod OP t1_ithrjsb wrote

I gave you one, do you have an "actual" reason to back up why it is true or do you just troll posts without having anything intelligent to contribute?