Submitted by groman434 t3_103694n in MachineLearning
sayoonarachu t1_j2xypar wrote
Reply to comment by groman434 in [Discussion] If ML is based on data generated by humans, can it truly outperform humans? by groman434
Generally, it is a good idea to split your data into a training, validation, and testing set. Something like 80/10/10 or 80/20, depending on how much data you're feeding the neural network (NN).
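A minimal sketch of that split, assuming NumPy and a made-up dataset (the arrays and names here are illustrative, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))        # hypothetical features
y = rng.integers(0, 2, size=1000)     # hypothetical binary labels

# Shuffle once, then slice 80/10/10.
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
n_val = int(0.1 * len(X))

X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]
X_val, y_val = X[idx[n_train:n_train + n_val]], y[idx[n_train:n_train + n_val]]
X_test, y_test = X[idx[n_train + n_val:]], y[idx[n_train + n_val:]]
```

Libraries like scikit-learn offer helpers (e.g. `train_test_split`) that do the shuffling and slicing for you.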
So, 80% of the data, randomly selected, would be used to train the NN, and at regular intervals (say, after every epoch, or every so many batches) it would be evaluated against the validation set to check what it has "learned."
Once you're happy with the model's performance, you can use the test set to see how well it performs on "new" data, in the sense that the 10% you set aside for testing was never shown to the model during training.
Of course, there are many, many other methods to reduce loss and improve performance. But even if your network were "perfect," if the person building it didn't spend the time to "clean" the data, it will always carry a higher degree of error than it needs to.
Or something like that. I'm just a fledgling when it comes to deep learning.