PassionatePossum t1_jd7h9dr wrote

:facepalm:

Yeah, that is exactly the level of mistakes I have to deal with.

Another classic that I see repeated over and over again is wildly unbalanced datasets: some diseases are very rare, so for every sample of the disease you are looking for, there are 10,000 or more normal samples. And often, they just throw the data into a classifier and hope for the best.

You can then easily get 99% accuracy, but the only thing the classifier has learned is to say "normal tissue", regardless of the input.
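A minimal sketch of the failure mode described above, using the 1 : 10,000 ratio from the comment (the numbers and the "always normal" classifier are illustrative, not from any real dataset): a degenerate model that never predicts the disease still scores nearly perfect accuracy, while its recall on the disease class is zero.

```python
# Hypothetical imbalanced dataset: 10,000 normal samples, 1 disease sample.
n_normal = 10_000
n_disease = 1

labels = ["normal"] * n_normal + ["disease"] * n_disease

# Degenerate classifier that has "learned" to always say "normal tissue".
predictions = ["normal"] * len(labels)

# Accuracy looks excellent...
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
print(f"accuracy       = {accuracy:.4%}")  # ~99.99%

# ...but recall on the disease class (the thing we actually care about) is 0.
disease_hits = sum(
    p == "disease" and y == "disease" for p, y in zip(predictions, labels)
)
recall = disease_hits / n_disease
print(f"disease recall = {recall:.0%}")  # 0%
```

This is why class-sensitive metrics (recall, precision, F1 per class) matter far more than raw accuracy on rare-disease data.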

memberjan6 t1_jd7sa57 wrote

A GPT might be engineered to read papers and report findings of common basic errors in analysis design, like the ones you found.

Probability calibration could be added later, via telemetry revealing the accuracy of its own basic-error classifications.
