hostilereplicator t1_it6n0dw wrote

The ROC curve plots true positive rate (TPR) against false positive rate (FPR) as you vary the decision threshold. TPR is measured on data labeled with the positive label, while FPR is measured on data labeled with the negative label. These numbers can therefore be measured independently: to measure TPR you only need positive samples and to measure FPR you only need negative samples. So it doesn't matter if you have an unbalanced number of positives and negatives to measure on.
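A minimal sketch of that point, using made-up scores and a hypothetical helper `tpr_fpr` (neither is from the thread): copying the negative set ten times changes the class balance but leaves both rates at a given threshold unchanged.

```python
def tpr_fpr(pos_scores, neg_scores, threshold):
    """Return (TPR, FPR) when predicting positive for score >= threshold."""
    # TPR only looks at positive samples, FPR only at negative samples.
    tpr = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    fpr = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return tpr, fpr


# Illustrative scores only (assumed for this example).
pos_scores = [0.9, 0.8, 0.6, 0.4]
neg_scores = [0.7, 0.5, 0.3, 0.2, 0.1]

balanced = tpr_fpr(pos_scores, neg_scores, threshold=0.5)
# Same negatives repeated 10x: 4 positives vs 50 negatives.
imbalanced = tpr_fpr(pos_scores, neg_scores * 10, threshold=0.5)

print(balanced, imbalanced)
```

Both calls give the same pair of rates, because the fraction of negatives above the threshold does not depend on how many negatives there are.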

The absolute number of samples of each type is more important, because this affects the uncertainty in your FPR and TPR measurements at each threshold setting. But the balance between number of positives and negatives is not relevant.
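As a rough illustration of why the absolute count matters, here is a sketch that treats each rate as a binomial proportion, which is an assumption on my part (it presumes independent samples). The helper name `rate_standard_error` is hypothetical.

```python
import math


def rate_standard_error(rate, n_samples):
    """Binomial standard error of a rate estimated from n_samples examples."""
    return math.sqrt(rate * (1.0 - rate) / n_samples)


# The same measured FPR of 0.1, estimated from 100 vs 10,000 negatives.
se_small = rate_standard_error(0.1, 100)
se_large = rate_standard_error(0.1, 10_000)

print(se_small, se_large)
```

Under that assumption, 100 times more negatives shrinks the uncertainty on FPR by a factor of 10, regardless of how many positives you have.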

The opposite is true for precision-recall curves: recall is measured using only positive samples (it is the same quantity as TPR), but precision requires both positive and negative samples to measure. So the measurement of precision depends on the ratio of positives to negatives in your data.
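To see the dependence on class ratio, here is a small sketch with made-up scores and a hypothetical helper `precision_recall`: repeating the negatives ten times leaves recall alone but drags precision down.

```python
def precision_recall(pos_scores, neg_scores, threshold):
    """Return (precision, recall) when predicting positive for score >= threshold."""
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    precision = tp / (tp + fp)      # mixes positives and negatives
    recall = tp / len(pos_scores)   # positives only
    return precision, recall


# Illustrative scores only (assumed for this example).
pos_scores = [0.9, 0.8, 0.6, 0.4]
neg_scores = [0.7, 0.5, 0.3, 0.2, 0.1]

p_balanced, r_balanced = precision_recall(pos_scores, neg_scores, 0.5)
# Same negatives repeated 10x: false positives grow, true positives do not.
p_imbalanced, r_imbalanced = precision_recall(pos_scores, neg_scores * 10, 0.5)

print(p_balanced, r_balanced)
print(p_imbalanced, r_imbalanced)
```

Precision drops from 3/5 to 3/23 here while recall stays at 3/4, even though the classifier and threshold are identical.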

The linked blog post references this paper in arguing for the use of precision-recall curves for imbalanced data, but this paper is about visual interpretation of the plots rather than what is "appropriate" in all cases, or whether the curves depend on the ratio or not.


respeckKnuckles t1_it89f8r wrote

Something I never quite understood---TPR and FPR are independent of each other, right? So then how is the plot of the AUC-ROC curve created? What if there are multiple parameters for which the FPR is the same value, but the TPR differs?


DigThatData t1_it8hvbj wrote

each point on the curve represents a decision threshold. given a particular decision threshold, your model will classify points a certain way. as you increment the threshold, it will hit the score of one or more observations, creating a step function as observations are moved from one bin to another as the decision threshold moves across their score.
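a rough sketch of that sweep in plain Python, with made-up scores and a hypothetical function name `roc_points` (this is not how any particular library implements it): sort by score, lower the threshold one distinct score at a time, and record (FPR, TPR) after each step.

```python
def roc_points(scores, labels):
    """Return the list of (FPR, TPR) points traced as the threshold is lowered."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Highest score first: lowering the threshold admits these in order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Observations tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1   # a positive crossed: step up
            else:
                fp += 1   # a negative crossed: step right
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Illustrative data only (assumed for this example).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(fpr, tpr)
```

each positive that crosses the threshold moves the curve up and each negative moves it right, which is where the staircase shape comes from.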


respeckKnuckles t1_it8ovi2 wrote

Is there a reason then that it's not common to see what the actual threshold is on graphs of AUC-ROC curves? It seems like it would be very helpful to have a little mark on the curve itself for when the threshold is 0.5, for example.


Professional_Pay_806 t1_ita6elf wrote

The threshold isn't important for what the ROC curve is trying to show. You can think about the ROC curve as representing a range of thresholds from the point where all samples are classified as negative (TPR of 0 and FPR of 0) to the point where all samples are classified as positive (TPR of 1 and FPR of 1). The space between is what matters. For a robust classifier, the true positive rate will rise significantly faster than the false positive rate. So a steep slope at the beginning, with TPR approaching 1 while FPR is still low (which pushes the AUC toward 1), means the classifier is robust. The closer the AUC is to 1/2 (represented by the diagonal connecting bottom left to top right), the closer the classifier is to effectively tossing a coin and guessing positive if you get heads. It's not about what the specific threshold is, it's about how well-separated the data clusters are in the feature space where the threshold is being used. Thinking about a threshold as typically being 0.5 (because you're just looking for a maximum likelihood of correct classification in a softmax layer or something) is thinking about one very specific type of classifier. The ROC curve is meant to be showing something more generally applicable to any classifier in any feature space.
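One way to make the AUC reading concrete without picking any threshold: AUC equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half. The sketch below uses made-up scores and a hypothetical helper `pairwise_auc`.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative scores only (assumed for this example).
separated = pairwise_auc([0.9, 0.8, 0.6, 0.4], [0.7, 0.5, 0.3, 0.2, 0.1])
# A scorer that gives everything the same score cannot rank at all.
uninformative = pairwise_auc([0.5, 0.5, 0.5], [0.5, 0.5])

print(separated, uninformative)
```

The uninformative scorer lands on 0.5, the coin-toss diagonal, and the better-separated scores land closer to 1.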


Professional_Pay_806 t1_ita6rxs wrote

Note you could always apply an increasing linear transformation (a positive scale plus a shift) to the output of your classification layer, which moves your threshold to another arbitrary value with the exact same results, and the ROC curve will remain the same as it was before.
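A quick sketch of this with made-up scores and an arbitrary transform `2 * s + 3` (both are assumptions for illustration): the threshold of 0.5 becomes 4.0, the predictions are identical, and so is the ranking of the samples, which is all the ROC curve depends on.

```python
def rescale(score):
    """An arbitrary increasing linear transformation of the score."""
    return 2.0 * score + 3.0


# Illustrative data only (assumed for this example).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0]

old_threshold = 0.5
new_threshold = rescale(old_threshold)  # the threshold moves with the scores

preds_before = [s >= old_threshold for s in scores]
preds_after = [rescale(s) >= new_threshold for s in scores]

# Labels listed from highest to lowest score, before and after rescaling.
order_before = [y for _, y in sorted(zip(scores, labels), reverse=True)]
order_after = [y for _, y in sorted(zip(map(rescale, scores), labels), reverse=True)]

print(new_threshold, preds_before == preds_after, order_before == order_after)
```

Since the ordering of samples by score is unchanged, every step of the threshold sweep visits the same (FPR, TPR) points.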


DigThatData t1_it8rc38 wrote

that's a variant that people definitely do sometimes. If you think adding score annotations a particular way should be an out-of-the-box feature in a particular tool you use, you should create an issue on their gh to recommend it or implement it yourself and submit a PR.


hostilereplicator t1_itb8ghh wrote

TPR and FPR of a model are not independent given the decision threshold, which is what you’re varying to produce the ROC curve. As DigThatData said, you get a step function where each step is the point at which a sample crosses the threshold. If you get multiple threshold values where your FPR is the same but TPR changes, leading to a tall vertical step in the curve, it means you haven’t got enough negative samples with scores near those thresholds to measure FPR precisely in that region of the curve.