hostilereplicator t1_itb8ghh wrote

TPR and FPR of a model are not independent: both are determined by the decision threshold, which is what you’re varying to produce the ROC curve. As DigThatData said, you get a step function where each step is the point at which a sample crosses the threshold. If you get multiple threshold values where your FPR is the same but TPR changes, leading to a steep step in the curve, it means you haven’t got enough negative samples near those thresholds to measure FPR precisely in that region of the curve.
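To make the step behaviour concrete, here is a minimal sketch in plain Python. The function name `roc_points` and the toy scores and labels are made up for illustration, and ties between scores are ignored for simplicity.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, lowering the threshold past one sample at a time."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Visit samples from highest to lowest score (assumes no tied scores).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i] == 1:
            tp += 1  # a positive crosses the threshold: vertical step
        else:
            fp += 1  # a negative crosses the threshold: horizontal step
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical toy data: 4 positives, 2 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 1, 0, 1, 0]
points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

With only two negatives, FPR can only take the values 0, 0.5 and 1, so the first three positives produce one tall vertical run at FPR = 0. That is the "not enough negatives in that region" situation.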


hostilereplicator t1_it85f0v wrote

If you use precision, you also implicitly assume the data you're measuring on has the same positive:negative ratio as data you expect to see in the future (assuming you're going to deploy your model, rather than just doing retrospective analysis). FPR and TPR don't have this issue, so you can construct a test dataset with sufficiently large numbers of both positives and negatives to get reliable measurements without worrying about the class imbalance.


hostilereplicator t1_it84dm4 wrote

Not really sure I understand your second paragraph. You can have a high absolute number of false positives with a tiny FPR only if you have a very high volume of negative samples. This isn't an issue with looking at the FPR, it's an issue with not knowing what FPR is acceptable to you for your particular application.

The ROC curve does not assume anything about your positive:negative ratio; the PR curve does, so if the ratio in your test set is different from your ratio in production (and often you don't know what the "true" ratio is in production), your precision measurements will be misleading.
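As a rough illustration of why precision moves with the ratio while the ROC operating point does not, precision can be written in terms of TPR, FPR and the fraction of positives in the data. The sketch below assumes a fixed operating point; the numbers (TPR 0.9, FPR 0.01, and the two prevalences) are invented for the example.

```python
def precision_from_rates(tpr, fpr, prevalence):
    """Precision implied by a fixed (TPR, FPR) point at a given fraction of positives."""
    true_pos = tpr * prevalence           # expected true positives per sample
    false_pos = fpr * (1.0 - prevalence)  # expected false positives per sample
    return true_pos / (true_pos + false_pos)


tpr, fpr = 0.9, 0.01  # hypothetical operating point, same in both settings

balanced = precision_from_rates(tpr, fpr, prevalence=0.5)  # e.g. a balanced test set
rare = precision_from_rates(tpr, fpr, prevalence=0.001)    # e.g. rare positives in production

print(f"precision at 50% positives:  {balanced:.3f}")
print(f"precision at 0.1% positives: {rare:.3f}")
```

The same model at the same threshold goes from roughly 0.99 precision on the balanced set to roughly 0.08 when positives are 1 in 1,000, even though TPR and FPR have not changed.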

A general difficulty with very low FPR or FNR measurement is lack of samples to measure on: e.g. if you have 10_000 negatives and your FPR is 0.1%, you're only estimating your FPR from 10 false positives, so the estimate will have high variance - but I think this issue would affect precision and recall measurements at the extremes as well, right?
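To put a number on "high variance", here is a back-of-the-envelope sketch that treats each negative as an independent Bernoulli trial and uses the binomial standard error. That independence assumption, and the 1,000,000-negative comparison, are mine and not from the thread.

```python
import math


def fpr_standard_error(fpr, n_negatives):
    """Binomial standard error of an FPR estimate from n independent negatives."""
    return math.sqrt(fpr * (1.0 - fpr) / n_negatives)


fpr = 0.001  # 0.1%

se_small = fpr_standard_error(fpr, 10_000)     # about 10 false positives expected
se_large = fpr_standard_error(fpr, 1_000_000)  # about 1,000 false positives expected

print(f"10_000 negatives:    SE = {se_small:.2e} ({se_small / fpr:.0%} of the FPR)")
print(f"1_000_000 negatives: SE = {se_large:.2e} ({se_large / fpr:.0%} of the FPR)")
```

With 10_000 negatives the standard error is around a third of the FPR itself; it takes about 100 times as many negatives to bring it down to around 3%.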


hostilereplicator t1_it6n0dw wrote

The ROC curve plots true positive rate (TPR) against false positive rate (FPR) as you vary the decision threshold. TPR is measured on data labeled with the positive label, while FPR is measured on data labeled with the negative label. These numbers can therefore be measured independently: to measure TPR you only need positive samples and to measure FPR you only need negative samples. So it doesn't matter if you have an unbalanced number of positives and negatives to measure on.
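A small sketch of this point: each rate can be computed from the scores of one class alone. The helper `rate_above`, the threshold and the score lists are hypothetical.

```python
def rate_above(scores, threshold):
    """Fraction of the given scores that are at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical model scores, kept in separate lists per true label.
pos_scores = [0.9, 0.8, 0.7, 0.3]
neg_scores = [0.6, 0.4, 0.2, 0.1, 0.05]
threshold = 0.5

tpr = rate_above(pos_scores, threshold)  # uses positives only
fpr = rate_above(neg_scores, threshold)  # uses negatives only

# Making the data 100x more imbalanced (same negatives repeated) leaves FPR unchanged.
fpr_many = rate_above(neg_scores * 100, threshold)

print(tpr, fpr, fpr_many)
```

Neither function call sees the other class, so the positive:negative ratio never enters the calculation.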

The absolute number of samples of each type is more important, because this affects the uncertainty in your FPR and TPR measurements at each threshold setting. But the balance between number of positives and negatives is not relevant.

The opposite is true for precision-recall curves: recall is measured using only positive samples, but precision requires both positive and negative samples to measure. So the measurement of precision is dependent on the ratio of positives:negatives in your data.

The linked blog post references this paper in arguing for the use of precision-recall curves for imbalanced data, but this paper is about visual interpretation of the plots rather than what is "appropriate" in all cases, or whether the curves depend on the ratio or not.


hostilereplicator t1_isrzugb wrote

I would echo the others here and say that, depending on the focus of your paper, the big conferences do take maths/theory papers (NeurIPS, ICML, ICLR, COLT, and also AISTATS and UAI depending on your topic), and there is JMLR for longer papers. But all of the conferences are both very competitive and have a large random component in what gets accepted… it may also be worth looking at workshops at these conferences to see if anything fits better. Workshops are less “prestigious” but also easier to get into, and your paper is more likely to be reviewed by a suitable/friendly referee.

What’s the topic of your research?