Submitted by groman434 t3_103694n in MachineLearning
groman434 OP t1_j2x82ze wrote
Reply to comment by IntelArtiGen in [Discussion] If ML is based on data generated by humans, can it truly outperform humans? by groman434
>But we don't design or train our models to exactly reproduce what a human did, that would be a risk of overfitting, so even by reproducing humans a model can do better and not reproduce some mistakes.
Can you please elaborate on this? Let's say your training data contains 10% errors. Can you train a model that would be more than 90% accurate? If yes, why?
Edit: My guess would be that during the training phase the model can "find out" which features are typical for cats, provided the training set is "good enough". So even if the set contains some errors, they will not significantly impact the predictions the model gives.
IntelArtiGen t1_j2xa49x wrote
I can give another example. Input/Output pairs: 1.7/0, 2/0, 2.2/1, 3.5/0, 4/0, 5/0, 8/0, 9.6/0, 11/1, 13/1, 14/1, 16/1, 18/1, 20/1. There is an error in this dataset: 2.2/1. But you can train a model on this set that predicts 0 for 2.2 (a small/regularized model would do that). You could also train a model to predict 1 for 2.2, but that would probably be overfitting. The same idea applies to any concept in input and any concept in output.
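To make this concrete, here is a minimal sketch (not from the comment itself) that trains an L2-regularized logistic regression on exactly this dataset, using plain gradient descent. The learning rate and regularization strength are assumed values. Because the model is small, it cannot bend its decision boundary around the mislabeled point (2.2, 1), so it ends up predicting class 0 for x = 2.2 despite the wrong label:

```python
import math

# Dataset from the comment; the point (2.2, 1) is a labeling error.
data = [(1.7, 0), (2.0, 0), (2.2, 1), (3.5, 0), (4.0, 0), (5.0, 0),
        (8.0, 0), (9.6, 0), (11.0, 1), (13.0, 1), (14.0, 1), (16.0, 1),
        (18.0, 1), (20.0, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One-feature logistic regression with an L2 penalty on the weight,
# fit by batch gradient descent (hyperparameters are assumptions).
w, b = 0.0, 0.0
lr, lam = 0.05, 0.1
for _ in range(5000):
    gw = gb = 0.0
    for x, y in data:
        err = sigmoid(w * x + b) - y
        gw += err * x
        gb += err
    gw = gw / len(data) + lam * w  # L2 penalty pulls w toward 0
    gb /= len(data)
    w -= lr * gw
    b -= lr * gb

# The fitted model is monotone in x, so it cannot honor the outlier:
print(round(sigmoid(w * 2.2 + b)))  # predicts class 0 for x = 2.2
```

The key point is capacity, not cleverness: a single-feature logistic model is monotone in x, so it physically cannot output 1 at 2.2 while outputting 0 on both sides of it. A large, unregularized model (say, a deep network memorizing each point) could fit the error exactly, which is what overfitting to label noise looks like.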