Alternative_iggy t1_jd7fwg8 wrote on March 22, 2023 at 11:49 AM

Reply to comment by PassionatePossum in [D] 100% accuracy of Random Forest Breast Cancer Prediction by [deleted]

So true - also, I always think of the skin cancer detection model that turned out to classify anything with an arrow pointing to it as cancer, because all of the cancerous lesions in the training set had arrows. (The paper showing this ended up in JAMA.)
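
To make the arrow failure mode concrete, here is a toy sketch (entirely synthetic, not the model or data from the JAMA paper). It assumes two made-up binary features, `size_high` and `has_arrow`, and a hypothetical one-feature classifier that picks whichever feature scores best on the training set:

```python
import random

rng = random.Random(1)

def make_case(arrow_follows_label):
    """Build one synthetic (features, label) pair with two binary features."""
    is_cancer = rng.random() < 0.5
    # Weak genuine signal: lesion size is only slightly higher for cancers.
    size = rng.gauss(1.0 if is_cancer else 0.0, 2.0)
    # Artifact: in the training data the arrow always marks cancers.
    has_arrow = is_cancer if arrow_follows_label else (rng.random() < 0.5)
    return {"size_high": size > 0.5, "has_arrow": has_arrow}, is_cancer

def stump_accuracy(feature, cases):
    """Accuracy of the rule 'predict cancer exactly when this feature is True'."""
    return sum(feats[feature] == label for feats, label in cases) / len(cases)

# Training set: arrows perfectly track the label. Test set: arrows are random.
train = [make_case(arrow_follows_label=True) for _ in range(200)]
test = [make_case(arrow_follows_label=False) for _ in range(200)]

# "Training" = keep the single feature with the best training accuracy.
chosen = max(["size_high", "has_arrow"], key=lambda f: stump_accuracy(f, train))
train_acc = stump_accuracy(chosen, train)
test_acc = stump_accuracy(chosen, test)

print(f"feature chosen: {chosen}")
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

The classifier latches onto `has_arrow` because it is perfect on the training data, then drops to roughly chance once the arrows stop tracking the label.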

Alternative_iggy t1_jd7bgm4 wrote on March 22, 2023 at 11:02 AM

Reply to [D] 100% accuracy of Random Forest Breast Cancer Prediction by [deleted]

I don’t typically deal with breast cancer histopathology models, but I do work with medical imaging as my full-time day job. If I’m reading this correctly, they use the Wisconsin Breast Cancer dataset (originally released in 1995!): https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic)

First question - have breast cancer histopathology evaluation techniques changed since 1995? Checking out a quick lit review - yes: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8642363/#Sec2

So is this dataset likely to be useful today? Well… we don’t know the demographics of the population, and we don’t know the split of tumor severity in the population (this could be all easy cancers and not very generalizable or useful for what someone sees day to day!). The preprocessing would also require someone to take the digital image and extract all these features, which honestly probably takes about as long as the pathologist just looking at the image and evaluating it. Also, it sort of looks like they just used the features that came with the dataset…

They report 100% accuracy on the training set and 99% on the testing set. Great, but in theory any sufficiently flexible model can reach 100% accuracy on the training set, so I almost always ignore that number when papers report it, unless there is a substantial gap between training and testing performance in either direction. But next question - are these results in line with similar published results on this particular dataset? Here’s an arXiv paper from 2019 with similar results: https://arxiv.org/pdf/1902.03825.pdf

So nothing new here… it seems getting 99% accuracy on this dataset is possible and has been published before…

Next question - is Procedia a good journal? It publishes conference proceedings and has an impact factor of 0.8 (kind of low). It’s unlikely this went through a rigorous peer review process, although I don’t like to throw out conference journals, because some of the big cool clinical trial results and huge breakthroughs get dumped in places like that. But in this case it seems like two researchers trying to get a paper out, not necessarily a groundbreaking discovery (people have published on this dataset before and gotten 99% with random forest!).

Final conclusion: meh.
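
On why the 100% training accuracy says so little: a minimal sketch, using synthetic data rather than the Wisconsin dataset and a 1-nearest-neighbor memorizer rather than the paper's random forest. The labels are coin flips, so there is nothing to learn, yet training accuracy is still perfect:

```python
import random

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the closest training point (1-NN, squared Euclidean distance)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]

def accuracy(train_X, train_y, X, y):
    """Fraction of points in (X, y) that the 1-NN model labels correctly."""
    hits = sum(nearest_neighbor_predict(train_X, train_y, x) == label
               for x, label in zip(X, y))
    return hits / len(y)

rng = random.Random(0)
# Synthetic data: 5 random features per sample, labels are pure coin flips.
X = [[rng.random() for _ in range(5)] for _ in range(400)]
y = [rng.randint(0, 1) for _ in range(400)]
train_X, train_y = X[:300], y[:300]
test_X, test_y = X[300:], y[300:]

# Each training point is its own nearest neighbor, so the model just recalls its label.
train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)

print(f"train accuracy: {train_acc:.2f}")  # 1.00 by construction
print(f"test accuracy:  {test_acc:.2f}")   # hovers around chance
```

This is the reason the test number, and how it compares with prior results on the same dataset, is the part worth reading.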

Alternative_iggy t1_jd1n6zz wrote on March 21, 2023 at 4:27 AM

Reply to comment by kau_mad in [D] For those who have worked 5+ years in the field, what are you up to now? by NoSeaweed8543

University! Although I had a fellowship from a more industrial place for funding at first. Finding funding is always the real tricky question!

Alternative_iggy t1_jd1lr16 wrote on March 21, 2023 at 4:12 AM

Reply to comment by eigenham in [D] For those who have worked 5+ years in the field, what are you up to now? by NoSeaweed8543

Yep! I reached out to get a volunteer appointment with a lab I liked and made sure I had it ok’d in my work contract. I also used the company’s mandatory continuing education credits to take some grad classes, and stayed part time when I first hopped back.

Alternative_iggy t1_jcxrabk wrote on March 20, 2023 at 11:27 AM

Reply to [D] For those who have worked 5+ years in the field, what are you up to now? by NoSeaweed8543

I guess I’m at year 10+ now. In the last few years I’ve switched back to academia/research!

I started out in research, where I was happy but made no money. I switched to a startup, where I made a little more money but often got bored of the problems. When the startup got bought by a bigger company, I found myself working on sometimes WAY cooler problems but had to deal with a lot of bureaucracy. I moved up in management roles for a bit, and have now hopped back to straight-up research, where I’m incredibly happy and have a lot more flexibility.