Viewing a single comment thread. View all comments

leondz t1_ix96sfz wrote

Yeah, this gives you an idea of how little of the data is actually worth going through - most of it repeats structures found elsewhere in the data, and isn't very diverse. Going through huge low-curation datasets is inefficient: the data diversity just isn't there.

1