Submitted by groman434 t3_103694n in MachineLearning
junetwentyfirst2020 t1_j2xxhii wrote
Reply to comment by groman434 in [Discussion] If ML is based on data generated by humans, can it truly outperform humans? by groman434
That’s not an easy question to answer, because the 90% of labels that are correct may be super easy to fit, while the 10% that are errors may simply be unfittable and just keep the loss high without impacting the model. On the other hand, since deep models tend to be heavily over-parameterized, that 10% could very well be “fit” and have an outsized impact on the model. It could also be the case that the model ends up with roughly 10% variance in its accuracy.
I’ve never seen a definitive theoretical answer, since deep learning models are over-parameterized, but I have seen models replicate the errors in their training data, especially when it came to keypoint prediction. When I assessed the error in the training data, I showed the team that the model had the same degree of error, and I argued for cleaner training data. I got told no, and to come up with a magic solution to fix the problem instead. I quit 🤣
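The memorization effect described above can be illustrated with a toy sketch (my own hypothetical setup, not from the thread): give an over-parameterized "model" — here, a pure lookup table, the extreme case of enough capacity to memorize every example — training data with 10% flipped labels, and its error against the true labels ends up matching the error rate of the training data.

```python
import random

random.seed(0)

# Toy task: the true label of x is x % 2; 10% of training labels are flipped.
def true_label(x):
    return x % 2

train_x = list(range(1000))
train_y = []
for x in train_x:
    y = true_label(x)
    if random.random() < 0.10:  # inject 10% label noise
        y = 1 - y
    train_y.append(y)

# An extreme over-parameterized "model": a lookup table with enough
# capacity to memorize every training example exactly.
memorizer = dict(zip(train_x, train_y))

# Training accuracy is perfect -- the noise has been fit, not averaged out.
train_acc = sum(memorizer[x] == y for x, y in zip(train_x, train_y)) / len(train_x)

# Accuracy against the *true* labels sits near 90%: the model replicates
# the error rate of its training data.
true_acc = sum(memorizer[x] == true_label(x) for x in train_x) / len(train_x)
print(train_acc, true_acc)
```

A real network with smoother inductive biases may average some noise out instead of memorizing it, which is exactly why the question has no clean theoretical answer.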