Submitted by [deleted] t3_11yccp8 in MachineLearning
Alternative_iggy t1_jd7fwg8 wrote
Reply to comment by PassionatePossum in [D] 100% accuracy of Random Forest Breast Cancer Prediction by [deleted]
So true - I also always think of the skin cancer detection model that turned out to predict that anything with an arrow pointing to it was cancer, because all of the cancerous lesions in its training set had arrows drawn next to them. (A paper demonstrating this ended up in JAMA.)
PassionatePossum t1_jd7h9dr wrote
:facepalm:
Yeah, that is exactly the level of mistakes I have to deal with.
Another classic that I see repeated over and over again is wildly imbalanced datasets: some diseases are very rare, so for every sample of the disease you are looking for, there are 10,000 or more normal samples. And often, people just throw the raw data into a classifier and hope for the best.
You can then easily get 99% accuracy, but the only thing the classifier has learned is to say "normal tissue" regardless of the input.
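To make that concrete, here is a minimal sketch (synthetic data, with scikit-learn's DummyClassifier standing in for the naive model) of how a classifier that always answers "normal" scores near-perfect accuracy on a 1:10000 split while missing every single disease case:

```python
# Illustrative only: a majority-class "classifier" on a 1:10000 dataset
# gets >99.9% accuracy with zero recall on the disease class.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
n_normal, n_disease = 100_000, 10            # ~1:10000, as in the comment above
X = rng.normal(size=(n_normal + n_disease, 5))
y = np.array([0] * n_normal + [1] * n_disease)

# "Model" that has learned nothing except the base rate.
clf = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = clf.predict(X)

print(f"accuracy: {accuracy_score(y, pred):.4f}")          # ~0.9999
print(f"disease recall: {recall_score(y, pred):.4f}")      # 0.0 -- every case missed
```

Which is why recall, precision, or balanced metrics matter far more than raw accuracy in this setting.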
memberjan6 t1_jd7sa57 wrote
A GPT might be engineered to read papers and report common basic errors in analysis design, like the ones you found.
Probability calibration could be added later, via telemetry revealing how accurate its own basic-error classifications actually are.
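One way to read that idea (a sketch only; the telemetry values, labels, and setup below are invented for illustration, not from any real system) is to log the model's stated confidence alongside human review verdicts, then fit a monotone calibrator such as isotonic regression mapping raw confidence to empirical accuracy:

```python
# Hypothetical telemetry: the model's confidence for each error it flagged in a
# paper, and whether a human reviewer confirmed the flag (1) or rejected it (0).
import numpy as np
from sklearn.isotonic import IsotonicRegression

raw_confidence = np.array([0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.50])
confirmed      = np.array([1,    1,    0,    1,    0,    1,    0,    0   ])

# Isotonic regression learns a monotone map from raw confidence to the
# empirical probability that a flag at that confidence level is correct.
calibrator = IsotonicRegression(out_of_bounds="clip").fit(raw_confidence, confirmed)

print(calibrator.predict([0.92, 0.65]))   # calibrated probabilities for new flags
```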