Submitted by likeamanyfacedgod t3_y9n120 in MachineLearning

Are there any good blogs on machine learning that are actually accurate? It's amazing how the top Google hits can be total trash. Take this article for example:

This is full of nonsense - "ROC curves are appropriate when the observations are balanced between each class, whereas precision-recall curves are appropriate for imbalanced datasets"

Anyone who has actually worked with ML will know that the above statement is blatantly untrue! It's amazing that such a blog manages to come up so often in Google hits...

Is there anything better out there?




acdjent t1_it6fkab wrote

I like this one: But it really depends on what you want to read


Chefbook t1_it7ya1z wrote

If you’re a fan of Lilian Weng's blog, also check out Brian Keng’s blog and Eric Jang’s blog. All of them helped me understand probabilistic generative models when there wasn’t much written about them.


BrotherAmazing t1_it935fx wrote

The Stanford YT-posted lectures from TAs who helped Fei-Fei like Justin Johnson and Serena Yeung are great too, but those aren’t “blogs” so much as lectures. Justin’s UMich web page has some cool stuff on it.


BrotherAmazing t1_it92qjb wrote

Lulz I just posted that as an example of a good blog without having scrolled down to see your response until now. 😆


just__uncreative t1_it713u8 wrote

Disagree. The above statement from the blog is true.

When you have a large class imbalance skewed negative, the FPR is not very informative because it is not sensitive enough to false positives.

The definition of FPR is FP/(FP+TN). When TN is massive because of class imbalance, your model can be predicting many false positives while the FPR stays tiny, giving you an overly rosy view of your performance: ROC curves/AUC that look great when in reality your model is wildly over-predicting the positive class.

Precision doesn’t have this problem, and so PR is better.

I have worked on real applications where this has come into play and made a huge difference, because in these class-imbalanced problems the positive class is usually what you’re looking for. So if you use ROC for model selection, you end up flooding your predictions with FPs and it noises up the application significantly.
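A quick sketch with made-up confusion-matrix counts (hypothetical numbers, purely to illustrate the effect) shows how a tiny FPR can hide a flood of false positives:

```python
# Toy confusion-matrix counts for a heavily imbalanced binary problem
# (numbers invented for illustration only).
tp, fn = 80, 20          # 100 positives in total
fp, tn = 900, 99_100     # 100,000 negatives in total

fpr = fp / (fp + tn)          # 0.009 - looks tiny
precision = tp / (tp + fp)    # ~0.082 - most predicted positives are wrong
recall = tp / (tp + fn)       # 0.8

print(f"FPR = {fpr:.3f}, precision = {precision:.3f}, recall = {recall:.2f}")
```

With 99,100 true negatives, 900 false positives barely move the FPR, yet nine out of ten positive predictions are wrong.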


BobDope t1_it9cym1 wrote

You are correct. Agree there are tons of trash blogs but the machine learning mastery dude is legit.


hostilereplicator t1_it84dm4 wrote

Not really sure I understand your second paragraph. You can have a high absolute number of false positives with a tiny FPR only if you have a very high volume of negative samples. This isn't an issue with looking at the FPR, it's an issue with not knowing what FPR is acceptable to you for your particular application.

The ROC curve does not assume anything about your positive:negative ratio; the PR curve does, so if the ratio in your test set is different from your ratio in production (and often you don't know what the "true" ratio is in production), your precision measurements will be misleading.

A general difficulty with measuring very low FPR or FNR is lack of samples to measure on, e.g. if you have 10,000 negatives and your FPR is 0.1%, you're estimating your FPR from only 10 false positives, so the estimate will have high variance - but I think this issue would affect precision and recall measurements at the extremes as well, right?


robbsc t1_it7zqsh wrote

One of the main reasons to use a ROC curve is for imbalanced (usually binary) datasets. A more intuitive way to look at FPR is FP/N. The curve tells you the fraction of false positives you are going to pass through for any given TPR (recall, sensitivity). If the fpr you care about is tiny, you can focus on the left side of the curve and ignore the right side.

It's also useful to sample the roc curve at recalls you care about. e.g., how many false positives am i passing through for a TPR of 95%?

Lastly, in my experience, AUC correlates highly with an improved model, because most of the right side of the curve doesn't tend to change much and sits close to 1 in situations where you're just trying to improve the left side of the curve. If it doesn't, then you probably just need to change the number of thresholds you're sampling when computing AUC.

Whether to use roc or precision-recall depends more on the type of problem you're working on. Obviously precision-recall is better for information retrieval, because you care about what fraction of the information retrieved at a given threshold is useful. Roc is better if you care highly about the raw number of false positives you're letting through.


hostilereplicator t1_it85f0v wrote

If you use precision, you also implicitly assume the data you're measuring on has the same positive:negative ratio as data you expect to see in the future (assuming you're going to deploy your model, rather than just doing retrospective analysis). FPR and TPR don't have this issue, so you can construct a test dataset with sufficiently large numbers of both positives and negatives to get reliable measurements without worrying about the class imbalance.


robbsc t1_it87etm wrote

Good point. The only valid criticism of roc curves that i can think of is that you can't always visually compare 2 full ROC curves without "zooming in" to the part you care about.


rehrev t1_itihgn8 wrote

I am having trouble understanding this. How is your model overpredicting positive class but your true negative is huge compared to your false positive?

What do you mean by overpredicting positive class if you don't mean high FP compared to TN?


KingsmanVince t1_it6fskg wrote

Can you explain why the statement is not true? It may be trivial to you, but it isn't to some others.


hostilereplicator t1_it6n0dw wrote

The ROC curve plots true positive rate (TPR) against false positive rate (FPR) as you vary the decision threshold. TPR is measured on data labeled with the positive label, while FPR is measured on data labeled with the negative label. These numbers can therefore be measured independently: to measure TPR you only need positive samples and to measure FPR you only need negative samples. So it doesn't matter if you have an unbalanced number of positives and negatives to measure on.

The absolute number of samples of each type is more important, because this affects the uncertainty in your FPR and TPR measurements at each threshold setting. But the balance between number of positives and negatives is not relevant.

The opposite is true for precision-recall curves: recall is measured using only positive samples, but precision requires both positive and negative samples to measure. So the measurement of precision depends on the ratio of positives:negatives in your data.

The linked blog post references this paper in arguing for the use of precision-recall curves for imbalanced data, but this paper is about visual interpretation of the plots rather than what is "appropriate" in all cases, or whether the curves depend on the ratio or not.
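A small sketch with made-up rates illustrates the point: hold TPR and FPR fixed at one operating point, and precision swings with the class ratio alone:

```python
# Hypothetical fixed operating point: the per-class rates don't change,
# only the positive:negative ratio of the evaluation set does.
tpr, fpr = 0.9, 0.05

precisions = {}
for n_pos, n_neg in [(1_000, 1_000), (1_000, 100_000)]:
    tp = tpr * n_pos          # expected true positives
    fp = fpr * n_neg          # expected false positives
    precisions[(n_pos, n_neg)] = tp / (tp + fp)

print(precisions)  # balanced ~0.947, imbalanced ~0.153
```

Same model, same threshold, same TPR/FPR - but precision drops from ~0.95 to ~0.15 just because the test set got 100x more negatives.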


respeckKnuckles t1_it89f8r wrote

Something I never quite understood---TPR and FPR are independent of each other, right? So then how is the plot of the AUC-ROC curve created? What if there are multiple parameters for which the FPR is the same value, but the TPR differs?


DigThatData t1_it8hvbj wrote

each point on the curve represents a decision threshold. given a particular decision threshold, your model will classify points a certain way. as you increment the threshold, it will hit the score of one or more observations, creating a step function as observations are moved from one bin to another as the decision threshold moves across their score.
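a minimal sketch of that construction (scores and labels made up), sweeping the threshold down through the scores and recording one (FPR, TPR) point per step:

```python
# Made-up model scores and true binary labels.
scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
labels = [0,   0,   1,    0,   1,   1]

n_pos = sum(labels)
n_neg = len(labels) - n_pos

# Start at threshold above every score: nothing predicted positive.
points = [(0.0, 0.0)]
for t in sorted(set(scores), reverse=True):
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    points.append((fp / n_neg, tp / n_pos))
print(points)
```

each time the threshold crosses one score, exactly one sample flips bins, so the curve moves one step up (a positive) or one step right (a negative).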


respeckKnuckles t1_it8ovi2 wrote

Is there a reason then that it's not common to see what the actual threshold is on graphs of AUC-ROC curves? It seems like it would be very helpful to have a little mark on the curve itself for when the threshold is 0.5, for example.


Professional_Pay_806 t1_ita6elf wrote

The threshold isn't important for what the ROC curve is trying to show. You can think of the ROC curve as representing a range of thresholds, from the point where all samples are classified as negative (TPR of 0 and FPR of 0) to the point where all samples are classified as positive (TPR of 1 and FPR of 1). The space between is what matters.

For a robust classifier, the true positive rate will rise significantly faster than the false positive rate. So a steep slope at the beginning, approaching a TPR of 1 while the FPR is still low (which pushes the AUC toward 1), means the classifier is robust. The closer the AUC is to 1/2 (represented by the diagonal connecting bottom left to top right), the closer the classifier is to effectively tossing a coin and guessing positive if you get heads.

It's not about what the specific threshold is; it's about how well-separated the data clusters are in the feature space where the threshold is being applied. Thinking of a threshold as typically being 0.5 (because you're just looking for a maximum likelihood of correct classification in a softmax layer or something) is thinking about one very specific type of classifier. The ROC curve is meant to show something more generally applicable to any classifier in any feature space.


Professional_Pay_806 t1_ita6rxs wrote

Note you could always perform a linear transformation on your classification layer that shifts your threshold to another arbitrary value with the exact same results, but the ROC curve will remain the same as it was before.


DigThatData t1_it8rc38 wrote

that's a variant that people definitely do sometimes. If you think adding score annotations a particular way should be an out-of-the-box feature in a particular tool you use, you should create an issue on their gh to recommend it or implement it yourself and submit a PR.


hostilereplicator t1_itb8ghh wrote

TPR and FPR of a model are not independent given the decision threshold, which is what you’re varying to produce the ROC curve. As DigThatData said, you get a step function where each step is the point at which a sample crosses the threshold. If you get multiple threshold values where your FPR is the same but your TPR changes, producing a steep step in the curve, it means you don’t have enough negative samples with scores near those thresholds to measure FPR precisely in that region of the curve.


likeamanyfacedgod OP t1_it6gqnz wrote

It's not trivial to me at all. I've seen a few blog posts that make this statement, but from my own experience it's not true; you can even test it yourself by balancing and unbalancing your dataset. Look at how they're calculated: the TPR and FPR are both fractions, so it won't matter if one class is larger than the other. What does matter, though, is if you care more about predicting one class than the other.


PassionatePossum t1_it6ms0v wrote

>the TPR and FPR are both fractions, so it won't matter if one is a larger class than the other.

In most cases that is a desirable property. You don't want to have excellent results just because one class makes up 99% of your dataset and the classifier just predicts the most common class without learning anything. Precision and Recall are also fractions.

The difference between ROC and Precision/Recall is that ROC needs the concept of a "negative class". That can be problematic for multi-class problems. Even if your data is perfectly balanced across all of your classes, the negative class (i.e. all classes that aren't the class you are examining) is bound to be overrepresented.

Since you only need the positive examples for a precision/recall plot you don't have that problem.

So, I don't have a problem with the statement that ROC is appropriate for a balanced dataset (provided that we have a binary classification problem, or the number of different classes is at least low).


madrury83 t1_it7hw31 wrote

I think the more rigorous way to get at the OP's point is to observe that the AUC is the probability that a randomly selected positive sample is scored higher (by your fixed model) than a randomly chosen negative sample. Being probabilities, these are independent (at a population level) of the number of samples you have from your positive and negative populations (of course, smaller samples mean more sampling variance). I believe this is the OP's point with "they are fractions".
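A tiny sketch of that probabilistic reading (scores made up): count the fraction of (positive, negative) pairs where the positive sample gets the higher score, with ties counted as half.

```python
# Made-up model scores for three positives and three negatives.
pos_scores = [0.9, 0.8, 0.35]
neg_scores = [0.6, 0.3, 0.1]

# AUC as the pairwise "win" probability of positives over negatives.
wins = sum(
    (p > n) + 0.5 * (p == n)
    for p in pos_scores
    for n in neg_scores
)
auc = wins / (len(pos_scores) * len(neg_scores))
print(auc)  # 8/9 = 0.888...
```

This matches the area under the step-function ROC curve for the same scores, and makes it clear why the class ratio itself doesn't enter into it.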

In any case, can we at least all agree that blogs/articles throwing around this kind of advice without justification are less than helpful?


rehrev t1_it6lcll wrote

So you just don't think it's true and don't have an actual reason or explanation?


likeamanyfacedgod OP t1_it6ln21 wrote

You can even test it yourself by balancing and unbalancing your dataset. Look at how they're calculated: the TPR and FPR are both fractions, so it won't matter if one class is larger than the other. What does matter, though, is if you care more about predicting one class than the other.


likeamanyfacedgod OP t1_ithrjsb wrote

I gave you one, do you have an "actual" reason to back up why it is true or do you just troll posts without having anything intelligent to contribute?


TiredOldCrow t1_it7kkg2 wrote

Sympathetic to the complaint, but don't know if you've linked a good example here.

I'll go to bat for Jason Brownlee. He's provided some really excellent hands-on tutorials over the years, repeatedly updates his blogs based on feedback, and overall has made the field much more accessible.


philwinder t1_it8zimx wrote

I agree. It's easy to cherry pick issues. But when you create that much content, over such a long period of time, it's impressive.


lqstuart t1_it7qvpv wrote

Jay Alammar -

Papers with code -

Distill although they quit last year -

And Chris Olah although it's pretty much all on Distill -

That said, a lot of the most interesting stuff comes from younger people who start out writing to improve their own understanding, and are still at a stage in their careers where they get to actually be hands-on with interesting stuff. If you limit yourself only to "distinguished" engineers from major tech companies, it's generally going to be boring as shit.

You can also look up research/engineering blogs from Huggingface, FAIR, Google, LinkedIn, Pinterest, Uber, and anywhere else that has a strong open source culture. Just pick a few places that do something you're interested in.


BrotherAmazing t1_it929ik wrote

Yes, don’t look at blogs that are basically .com “journalism for hire” crap where people are simply trying to make a buck as a “side gig” (unless the author is mildly famous/respected). Look at blogs like this one, from an actual researcher with not just a PhD but real job experience in AI/ML:

This is just one example. There are many more out there that are good blogs that are mostly accurate (everyone is entitled to a mistake once in a while).


Alexsander787 t1_it95t8d wrote

I think Jason's blog is great, despite the issue you pointed out. Maybe he just creates so much content that it becomes hard to be right all the time, and this topic in particular has some nuance to it. I think it's an understandable mistake and you shouldn't write off good work on the basis of that --- his posts sure helped me learn a lot and are my go-to source. I'm always happy when googling a particular topic if his website comes up.

That being said, someone did comment on his post a similar point to the one you're making - and in a humble and nice way, even - but he dismissed it without any explanation, and that was weird.


BobDope t1_it9dedn wrote

It wasn’t even particularly an issue


antiquemule t1_it8ybll wrote

I enjoy watching Yannic Kilcher. He has great walk-throughs of key papers as well as a bunch of other stuff.


tfburns t1_it7c7s9 wrote

A lot of popular ML content (like ML papers/marketing) is hot air. It's sort of a systemic issue.


Gere1 t1_itbcbfq wrote

Machinelearningmastery is rather shallow, but he tries to spread and monetize it aggressively. And searches will pick up on this.

But if you believe P-R curves are bad for imbalanced data, then you are just as mistaken. For example, precision and recall are exactly what you need for fraud detection. ML isn't about opinions and hand-wavy reasoning about math, but about getting results that work in the real world.

Now, you are asking for blogs specifically. What type of information would you like to see? Learning the basics like ROC curves is probably better done from books or practice instead of waiting for blog posts. For more research-level information there are many blogs, but it depends on the field (CV, NLP, ...). For a regular overview of what's happening you could look into or


likeamanyfacedgod OP t1_ithsgdh wrote

Thanks! FYI, I believe that PR curves are good for imbalanced data. My beef is that I also believe that ROC curves are good for imbalanced data.


Gere1 t1_ithw25k wrote

I agree. If anything, ROC curves have some "academic" reasons to be good rather than bad for imbalanced data.

I think there are a lot of low quality data science blog posts out there. In the end only something with measurable success (like an ML competition winner) indicates something worth looking into.


RedditRabbitRobot t1_ita9wp3 wrote

Hmm, not exactly a blog, not exclusively ML, and not deep enough for use in the DS industry.

But this is reaaaally nice if you wanna get into the math part.


MLRecipes t1_itbnudj wrote

The problem is using Google to find anything other than stuff for the general public: you will get crap, and only crap. You are better off using the Reddit or StackOverflow search box. Or google "machine learning Quora" or "machine learning reddit".

Feel free to check out my own blog here, and I hope that you can find the high quality you are looking for (at least, things that make sense, and a lot of originality).


beerus1729 t1_itb18as wrote

Towards Data Science. This site is good, but there is a limit on the number of articles you can read in a month without paying.


smurfpiss t1_it82yz2 wrote

>Anyone who has actually worked with ML will know that the above statement is blatantly not true! it's amazing that such a blog manages to come up so often in google hits...

You do not use ROC curves with imbalanced classes. It's concerning that you think that is wrong.

Thank you for coming to my Ted talk.


BobDope t1_it9dhr1 wrote

You can but the utility is limited