leondz t1_ix96sfz wrote
Reply to comment by ReasonablyBadass in [R] Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning - Epochai Pablo Villalobos et al - Trend of ever-growing ML models might slow down if data efficiency is not drastically improved! by Singularian2501
Yeah, this gives you an idea of how little of the data is actually worth going through - most of it repeats structures found elsewhere in the data, and isn't very diverse. Going through huge low-curation datasets is inefficient: the data diversity just isn't there.
Viewing a single comment thread. View all comments