https://github.com/jacobgil/confidenceinterval

pip install confidenceinterval

tl;dr: You don't have an excuse anymore not to use confidence intervals!

In statistics, confidence intervals are commonly reported alongside accuracy metrics to help interpret them.

For example, an AUC might be 0.9, but if its 95% confidence interval is [0.7, 0.96], we can't confidently say we didn't just get lucky, so we should be really careful about making decisions based on that result.

More formally, a confidence interval gives a range in which the true (unknown) value of the metric could lie. A 95% confidence interval means that if we repeated the experiment many times, about 95% of the confidence intervals we reported would contain the true metric. This property is called coverage.
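To make coverage concrete, here is a small stand-alone simulation (not part of the package). It assumes a hypothetical classifier whose true accuracy is known to be 0.8, repeats a 500-sample evaluation many times, builds a normal-approximation (Wald) 95% interval each time, and counts how often that interval contains 0.8. The helper names `wald_interval` and `estimate_coverage` are made up for illustration.

```python
import math
import random
from statistics import NormalDist
from typing import Tuple


def wald_interval(successes: int, n: int, confidence_level: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation (Wald) interval for a proportion such as accuracy."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)  # about 1.96 for a 95% level
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def estimate_coverage(true_accuracy: float, n: int, n_experiments: int, seed: int = 0) -> float:
    """Repeat a simulated evaluation many times; return the fraction of intervals containing the truth."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # One simulated experiment: n test samples, each correct with probability true_accuracy.
        successes = sum(rng.random() < true_accuracy for _ in range(n))
        lower, upper = wald_interval(successes, n)
        hits += lower <= true_accuracy <= upper
    return hits / n_experiments


if __name__ == "__main__":
    coverage = estimate_coverage(true_accuracy=0.8, n=500, n_experiments=2000)
    print(f"empirical coverage: {coverage:.3f}")
```

With these settings the empirical coverage should come out close to, but not exactly at, 0.95. The Wald interval is known to under-cover for small samples or for proportions near 0 or 1, which is one reason score intervals such as Wilson's are often preferred.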

Confidence intervals are usually computed either analytically, by making some assumptions about the metric's distribution and applying the central limit theorem, or by bootstrapping: resampling the results again and again, computing the metric on each resample, and examining the resulting distribution.
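As a worked example of the analytical route, the central limit theorem gives the familiar normal-approximation interval for a proportion-type metric such as accuracy. The numbers below are hypothetical, chosen only to show the arithmetic:

```latex
% Normal-approximation (Wald) interval for a proportion-type metric \hat{p} measured on n samples
\hat{p} \pm z_{1-\alpha/2}\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}

% Hypothetical numbers: \hat{p} = 0.9 accuracy on n = 100 samples, 95% level (z_{0.975} \approx 1.96)
\mathrm{SE} = \sqrt{\frac{0.9 \times 0.1}{100}} = 0.03,
\qquad 0.9 \pm 1.96 \times 0.03 = [0.841,\; 0.959]
```

For metrics that are not simple proportions (AUC, F1), the standard error needs its own derivation, which is where methods like DeLong's come in.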

However, in the Python data science world, I rarely saw these being used. I guess part of the reason is the culture, since many data science practitioners don't come from a statistics background. But I think the main reason is that there aren't easy-to-use libraries that do this. While R has fantastic support for confidence intervals, in Python there are mostly scattered pieces of code and blog posts.

The confidenceinterval package keeps the clean and popular scikit-learn metric API, e.g. roc_auc_score(y_true, y_pred), but also returns confidence intervals.
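A rough sketch of that call shape, assuming nothing about the package's internals: a hypothetical `tpr_with_ci` function takes `y_true` and `y_pred` like a scikit-learn metric and returns the point estimate together with a `(lower, upper)` pair. Here the interval comes from the Wilson score interval, written out by hand with the standard library; the package's README is the place to check its actual function names and return types.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def wilson_interval(successes: int, n: int, confidence_level: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("need at least one trial")
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half_width, center + half_width


def tpr_with_ci(y_true: Sequence[int], y_pred: Sequence[int],
                confidence_level: float = 0.95) -> Tuple[float, Tuple[float, float]]:
    """Hypothetical scikit-learn-style metric: returns (TPR, (lower, upper))."""
    # TPR = TP / (TP + FN): the fraction of actual positives that were predicted positive.
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    n_pos = len(preds_on_positives)
    if n_pos == 0:
        raise ValueError("TPR is undefined without positive samples")
    tp = sum(1 for p in preds_on_positives if p == 1)
    return tp / n_pos, wilson_interval(tp, n_pos, confidence_level)


if __name__ == "__main__":
    tpr, (lower, upper) = tpr_with_ci([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
    print(f"TPR = {tpr:.2f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

With only four positives the interval is very wide, which is exactly the situation where a bare point estimate is misleading.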

It supports analytical computations for many metrics, including AUC with the DeLong method, F1 with macro and micro averaging (following the recent results from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8936911/#APP2), and binary proportions like the TPR using binomial CI methods such as the Wilson interval.
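For reference, DeLong's variance estimate for a single AUC fits in a few lines. This is a plain O(m·n) sketch for illustration, not the package's implementation: `delong_auc_ci` is a hypothetical name, ties count as half a win, and the normal interval is clipped to [0, 1], which is one common convention among several (a logit transform is another).

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on ties, else 0."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def delong_auc_ci(y_true: Sequence[int], y_score: Sequence[float],
                  confidence_level: float = 0.95) -> Tuple[float, Tuple[float, float]]:
    """AUC with a DeLong-style normal confidence interval (single-model case)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two positives and two negatives")

    # Placement values: each positive's win rate against all negatives, and vice versa.
    v10 = [sum(_psi(x, y) for y in neg) / n for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / m for y in neg]
    auc = sum(v10) / m  # the Mann-Whitney estimate of AUC

    # DeLong variance: sample variances of the placement values, scaled by group sizes.
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)

    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return auc, (max(0.0, auc - z * se), min(1.0, auc + z * se))


if __name__ == "__main__":
    print(delong_auc_ci([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Faster implementations compute the same placement values from midranks instead of the double loop, which matters once the test set has thousands of samples.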

It can easily be switched to bootstrapping (several bootstrapping methods are supported), and it also gives you an easy way to compute the confidence interval for any metric with bootstrapping.
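The "any metric" idea can be sketched with a percentile bootstrap: resample (true, predicted) pairs with replacement, recompute the metric on each resample, and read off the empirical quantiles. The function `percentile_bootstrap_ci` below is a hypothetical stand-in using only the standard library; the package's own bootstrapping may use a different method (for example a bias-corrected variant), so treat this as an illustration of the technique rather than of its API.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Metric = Callable[[Sequence, Sequence], float]


def percentile_bootstrap_ci(
    y_true: Sequence, y_pred: Sequence, metric: Metric,
    confidence_level: float = 0.95, n_resamples: int = 2000, seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Percentile-bootstrap CI for any metric(y_true, y_pred) -> float (illustrative sketch)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    rng = random.Random(seed)
    n = len(y_true)
    point = metric(y_true, y_pred)

    stats = []
    for _ in range(n_resamples):
        # Resample (true, pred) pairs with replacement so each pair stays together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()

    # Take the alpha/2 and 1 - alpha/2 empirical quantiles (simple index rule, no interpolation).
    alpha = 1 - confidence_level
    lo_idx = math.floor(alpha / 2 * (n_resamples - 1))
    hi_idx = math.ceil((1 - alpha / 2) * (n_resamples - 1))
    return point, (stats[lo_idx], stats[hi_idx])


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] * 10  # hypothetical labels
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1] * 10  # hypothetical predictions
    print(percentile_bootstrap_ci(y_true, y_pred, accuracy))
```

Any function with the `metric(y_true, y_pred)` shape can be passed in, for example `sklearn.metrics.f1_score` if scikit-learn is installed. Metrics such as ROC AUC raise an error when a resample happens to contain only one class, so small or very imbalanced test sets need extra care.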

Comments

fastglow t1_jbupkr9 wrote on March 11, 2023 at 9:30 PM

#2,210,525

Very cool. Thanks for making this.

Kaleidophon t1_jbus2rm wrote on March 11, 2023 at 9:48 PM

#2,210,619

Very neat! I will add this to https://github.com/Kaleidophon/experimental-standards-deep-learning-research :-) Maybe you can also add citation info in case people want to refer to the package in their publication?

Valuable-Kick7312 t1_jbuwppx wrote on March 11, 2023 at 10:24 PM

#2,210,814

Cool! Does this always assume that the data is drawn iid?

jonnyyen t1_jbvhdvn wrote on March 12, 2023 at 1:05 AM

#2,211,619

Nice to see a Python implementation of DeLong's method; I've had to use pROC (in R) for that in the past. For binary event analysis (among other things) there's also https://github.com/drsteve/PyForecastTools, which also has bootstrapped confidence intervals, or analytic CIs using Wald or Agresti-Coull. The terminology is from the weather literature, but it covers a lot of the same ground.

francozzz t1_jbvhf9n wrote on March 12, 2023 at 1:05 AM

#2,211,621

I’ve just been asked to use confidence intervals for a project I’m working on, so this comes as a godsend! Thanks!

mfarahmand98 t1_jbx1hgt wrote on March 12, 2023 at 11:16 AM

#2,213,375

Awesome library! Been looking for something like this.

blablanonymous t1_jbxh137 wrote on March 12, 2023 at 2:05 PM

#2,213,926

⭐️

Balance- t1_jc16bi6 wrote on March 13, 2023 at 8:05 AM

#2,219,207

Looks awesome!

I would also post to r/Python and/or r/DataScience

jacobgil OP t1_jc2h94t wrote on March 13, 2023 at 3:47 PM

#2,220,896

Replying to Valuable-Kick7312 (#2,210,814)

Yes. I think confidence intervals assume iid. If the data is not iid, then the CI could be too narrow.

jacobgil OP t1_jc2hal0 wrote on March 13, 2023 at 3:48 PM

#2,220,900

Replying to jonnyyen (#2,211,619)

Cool!

jacobgil OP t1_jc2hb5f wrote on March 13, 2023 at 3:48 PM

#2,220,901

Replying to fastglow (#2,210,525)

Thanks!

jacobgil OP t1_jc2hboz wrote on March 13, 2023 at 3:48 PM

#2,220,903

Replying to francozzz (#2,211,621)

Thanks!

jacobgil OP t1_jc2hc8h wrote on March 13, 2023 at 3:48 PM

#2,220,904

Replying to mfarahmand98 (#2,213,375)

Thanks!

jacobgil OP t1_jc2hcov wrote on March 13, 2023 at 3:48 PM

#2,220,905

Replying to blablanonymous (#2,213,926)

Thanks!

jacobgil OP t1_jc2heig wrote on March 13, 2023 at 3:48 PM

#2,220,909

Replying to Balance- (#2,219,207)

Thanks! Following your suggestion I posted to r/DataScience

jacobgil OP t1_jc2hgbr wrote on March 13, 2023 at 3:49 PM

#2,220,913

Replying to Kaleidophon (#2,210,619)

Thanks! Will add citation info there!

Valuable-Kick7312 t1_jc2ziwy wrote on March 13, 2023 at 5:47 PM

#2,221,639

Replying to jacobgil (#2,220,896)

Thank you for the answer!

Just a few notes: in general, confidence intervals do not assume iid. Moreover, in theory, if the data is not drawn iid, the CI can also be smaller. However, I have not encountered this in practice yet.