Submitted by [deleted] t3_11yccp8 in MachineLearning
PassionatePossum t1_jd7eado wrote
Claims of 100% accuracy always set off alarm bells.
I work in the medical field, and the problem is that there are lots of physicians who want to make easy money: start a startup, collect some data (which is easy for them), download some model they have read about but don't really understand, and start training.
I work for a medical device manufacturer and sometimes have to evaluate startups. The errors they make can be so basic that it becomes clear they don't have the first clue what they are doing.
One of those startups claimed 99% accuracy on ultrasound images. But upon closer inspection their product was worthless. Apparently they knew that they needed to split their data into training/validation/test sets.
So what did they do? They took the videos and randomly assigned frames to one of these sets. And since two consecutive frames are very similar to each other, of course you are going to get 99% accuracy. It just means absolutely nothing.
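The fix is straightforward: split at the video level, not the frame level. A minimal sketch of what I mean, using scikit-learn's GroupShuffleSplit (the features, labels, and video IDs are made-up placeholders, not their data):

```python
# A minimal sketch of a leakage-free split, assuming frame features X,
# labels y, and a video_ids array recording which video each frame came
# from (all placeholders here).
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_frames = 1000
X = rng.normal(size=(n_frames, 128))            # stand-in frame features
y = rng.integers(0, 2, size=n_frames)           # stand-in labels
video_ids = rng.integers(0, 50, size=n_frames)  # which video each frame is from

# Splitting on whole videos keeps near-duplicate consecutive frames
# on the same side of the train/test boundary.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=video_ids))

# Sanity check: no video contributes frames to both sets.
assert set(video_ids[train_idx]).isdisjoint(video_ids[test_idx])
```

Do the per-frame split instead and the 99% evaporates the moment you evaluate on a video the model has never seen.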
Alternative_iggy t1_jd7fwg8 wrote
So true - I also always think of the skin cancer detection model that turned out to predict that anything with an arrow pointing at it was cancer, because all of the cancerous lesions in its training set had arrows drawn on them. (The paper showing this ended up in JAMA.)
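A cheap sanity check for that kind of shortcut is to mask the suspect region and see whether the prediction collapses. A rough sketch, assuming some image classifier `predict_fn` and hypothetical arrow coordinates:

```python
# A rough sketch of an occlusion sanity check for shortcut learning.
# `predict_fn` is any function returning class probabilities for a batch
# of images; the arrow's box coordinates are hypothetical.
import numpy as np

def occlusion_check(predict_fn, image, box, fill=0.0):
    """Compare confidence before and after masking a suspect region.

    box is (y0, y1, x0, x1) in pixel coordinates.
    """
    masked = image.copy()
    y0, y1, x0, x1 = box
    masked[y0:y1, x0:x1] = fill
    return predict_fn(image[None])[0], predict_fn(masked[None])[0]

# If confidence craters when only the arrow is hidden, the model is
# reading the annotation, not the lesion.
```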
PassionatePossum t1_jd7h9dr wrote
:facepalm:
Yeah, that is exactly the level of mistakes I have to deal with.
Another classic that I see repeated over and over again is wildly unbalanced datasets: some diseases are very rare, so for every sample of the disease you are looking for, there are 10000 or more samples that are normal. And often, they just throw the data into a classifier and hope for the best.
And then you can also easily get 99% accuracy, but the only thing the classifier has learned is to say "normal tissue", regardless of the input.
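You can see how hollow that headline number is with a few lines of scikit-learn (toy numbers, purely illustrative):

```python
# A toy illustration of why raw accuracy is meaningless at ~10000:1
# class imbalance; all numbers are made up.
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

y_true = np.array([0] * 10000 + [1])  # one diseased sample per 10000 normal
y_pred = np.zeros_like(y_true)        # a "classifier" that always says normal

print(accuracy_score(y_true, y_pred))           # ~0.9999 -- looks fantastic
print(recall_score(y_true, y_pred))             # 0.0 -- finds zero disease
print(balanced_accuracy_score(y_true, y_pred))  # 0.5 -- no better than chance
```

That is why per-class sensitivity, not overall accuracy, is the number to ask these startups for.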
memberjan6 t1_jd7sa57 wrote
A GPT might be engineered to read papers and report common basic errors in analysis design like the ones you found.
Probability calibration could be added later, via telemetry revealing the accuracy of its own basic-error classifications.
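For the calibration step, the generic post-hoc approach in scikit-learn could be a starting point; this is just a placeholder sketch with made-up data and a stand-in base model, not the telemetry scheme itself:

```python
# A generic post-hoc probability calibration sketch with scikit-learn;
# placeholder data and base model, not the telemetry-driven scheme above.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_fit, X_new, y_fit, y_new = train_test_split(X, y, random_state=0)

base = LogisticRegression(max_iter=1000)
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=5)
calibrated.fit(X_fit, y_fit)
probs = calibrated.predict_proba(X_new)[:, 1]  # calibrated probabilities
```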