Submitted by jacobgil t3_11orezx in MachineLearning

https://github.com/jacobgil/confidenceinterval

pip install confidenceinterval

tldr: You don't have an excuse anymore not to use confidence intervals!


In statistics, confidence intervals are commonly reported alongside accuracy metrics to help interpret them.

For example, an AUC metric might be 0.9, but if the 95% confidence interval is [0.7, 0.96], we can't confidently say we didn't just get lucky - we should be really careful making decisions around that result.

More formally, a confidence interval gives us a range in which the true, unknown accuracy metric could lie. A 95% confidence interval means that if we repeated the experiment many times, 95% of the intervals we reported would contain the actual true metric (which is unknown). This property is called coverage.
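To make the coverage idea concrete, here is a small simulation sketch (plain numpy and statsmodels, independent of this package): simulate a test set with a known true accuracy many times, compute a 95% Wilson interval for each, and count how often the interval contains the truth.

    import numpy as np
    from statsmodels.stats.proportion import proportion_confint

    rng = np.random.default_rng(0)
    true_accuracy = 0.8     # the "unknown" quantity we pretend to estimate
    n_samples = 100         # test-set size in each simulated experiment
    n_experiments = 10_000

    covered = 0
    for _ in range(n_experiments):
        # Each prediction is correct with probability true_accuracy.
        n_correct = rng.binomial(n_samples, true_accuracy)
        low, high = proportion_confint(n_correct, n_samples,
                                       alpha=0.05, method="wilson")
        covered += low <= true_accuracy <= high

    print(covered / n_experiments)  # should print something close to 0.95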

Confidence intervals are usually computed either analytically, by making some assumptions about the metric's distribution and using the central limit theorem, or by bootstrapping: resampling the results again and again, computing the metric each time, and reading the interval off the resulting distribution.
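In case the bootstrap route is unfamiliar, a bare-bones percentile bootstrap looks roughly like this (a sketch with plain numpy and scikit-learn, not this package's internals):

    import numpy as np
    from sklearn.metrics import f1_score

    def percentile_bootstrap_ci(y_true, y_pred, metric,
                                n_resamples=2000, confidence_level=0.95, seed=0):
        """Resample (y_true, y_pred) pairs with replacement, recompute the
        metric each time, and take percentiles of the resulting scores."""
        rng = np.random.default_rng(seed)
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        scores = []
        for _ in range(n_resamples):
            idx = rng.integers(0, len(y_true), size=len(y_true))
            scores.append(metric(y_true[idx], y_pred[idx]))
        alpha = 1 - confidence_level
        return np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])

    # e.g. low, high = percentile_bootstrap_ci(y_true, y_pred, f1_score)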

However, in the Python data science world, I have rarely seen these being used. I guess part of the reason is cultural: many data science practitioners don't come from the statistics world. But I think the main reason is that there aren't easy-to-use libraries that do this. While the R language has fantastic support for confidence intervals, for Python there are mostly scattered pieces of code and blog posts.


The confidenceinterval package keeps the clean and popular scikit-learn metric API, e.g. roc_auc_score(y_true, y_pred), but also returns confidence intervals.

It supports analytical computation for many metrics: AUC with the DeLong method, F1 with macro or micro averaging (following the recent results from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8936911/#APP2), and binary proportions like the TPR using binomial CI methods such as the Wilson interval.
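Usage looks like this - note that the confidence_level keyword and the (value, interval) return format here are from my reading of the README, so treat it as a sketch and check the repo:

    from confidenceinterval import roc_auc_score

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred = [0.1, 0.4, 0.35, 0.8, 0.9, 0.3, 0.6, 0.2]

    # Same signature as the scikit-learn metric, but also returns the
    # 95% interval, computed analytically (DeLong) by default.
    auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95)
    print(auc, ci)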

It can easily be switched to bootstrapping (with several supported bootstrap methods), and it also gives you a way to easily compute a confidence interval for any custom metric with bootstrapping.
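As a point of comparison for the custom-metric case, scipy's generic bootstrap can also produce a CI for an arbitrary scikit-learn style metric (again a sketch, independent of this package):

    import numpy as np
    from scipy.stats import bootstrap
    from sklearn.metrics import f1_score

    y_true = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 0] * 10)
    y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1, 1, 0] * 10)

    # paired=True resamples (y_true, y_pred) pairs together;
    # vectorized=False means the metric is called on plain 1-D samples.
    res = bootstrap((y_true, y_pred), f1_score,
                    paired=True, vectorized=False, n_resamples=2000,
                    confidence_level=0.95, method="percentile",
                    random_state=0)
    print(res.confidence_interval)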

118

Comments


Valuable-Kick7312 t1_jbuwppx wrote

Cool! Does this always assume that the data is drawn i.i.d.?

16

jacobgil OP t1_jc2h94t wrote

Yes, I think confidence intervals assume i.i.d. data. If the data are not i.i.d., then the CI could be too short.

1

Valuable-Kick7312 t1_jc2ziwy wrote

Thank you for the answer!

Just a few notes: in general, confidence intervals do not assume i.i.d. data. Moreover, in theory, if the data is not drawn i.i.d., the CI can also be smaller. However, I have not encountered this in practice yet.

2

jonnyyen t1_jbvhdvn wrote

Nice to see a Python implementation of DeLong's method - I've had to use pROC (in R) for that in the past. For binary event analysis (among other things) there's also https://github.com/drsteve/PyForecastTools, which also has bootstrapped confidence intervals, or analytic CIs using Wald or Agresti-Coull intervals. The terminology is from the weather literature, but it covers a lot of the same ground.

5

francozzz t1_jbvhf9n wrote

I’ve just been asked to use confidence intervals for a project I’m working on - this comes as a godsend! Thanks!

4

Balance- t1_jc16bi6 wrote

Looks awesome!

I would also post this to r/Python and/or r/DataScience

2

jacobgil OP t1_jc2heig wrote

Thanks! Following your suggestion, I posted to r/DataScience

1