cantfindaname2take t1_iym5ipx wrote
Reply to comment by no_witty_username in [Discussion] - "data sourcing will be more important than model building in the era of foundational model fine-tuning" by fourcornerclub
Is it though? One thing that keeps coming up is the comparison to human learning. Do humans get clean training samples? I like to think not. Instead, humans learn to separate signal from noise much better, and also learn to model hidden causes.
no_witty_username t1_iymgyhr wrote
Humans do get clean data when learning. Here is what bad data looks like for humans: ocular degeneration, deafness, neurological disorders, etc. Children who have sensory deformities, or diseases that damage their sensory organs, all have severe learning difficulties. The same goes for machines when they are presented with shit data. A machine's ability to understand anything depends on many factors, and one of the most important is presenting it with data it was built to process. Showing a machine a badly cropped image of a person, where the top half is missing and only the neck down is visible, and telling it that's what a person is, is bad data, just as much as showing an image of anything to a child with ocular degeneration. The image is severely distorted, and while the child's brain is quite capable of proper learning, its sensors (the eyes) are presenting shit data, so no proper learning will occur.