
No_Ninja3309_NoNoYes t1_ja2w92l wrote

I haven't read the paper, but my friend Fred says they used a simple model to decide what goes into the training data. That would explain the 10x smaller size. Or one of us misunderstood. I mean, you could in theory download the data and grep for whatever you're interested in, say psychology, then get the code and GPUs in the cloud. You could crowdfund this if there's enough interest. I'd guess the more niche topics would also be the cheapest to do.
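
To show what I mean by the "grep for a topic" step, here's a rough sketch. The paths, keywords, and file layout are all made up for illustration, and I'm assuming the dump is plain text with one document per line; swap in whatever corpus you actually download.

```python
import glob
import re

# Hypothetical topic filter: keep only documents that mention psychology-ish terms.
# The keyword list and corpus paths are placeholders, not from the paper.
KEYWORDS = re.compile(r"\b(psycholog\w*|cognitive|behaviou?r\w*)\b", re.IGNORECASE)

def filter_corpus(input_glob: str, output_path: str) -> int:
    """Write matching documents (one per line) to output_path; return how many were kept."""
    kept = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in glob.glob(input_glob):
            with open(path, encoding="utf-8", errors="ignore") as f:
                for line in f:
                    if KEYWORDS.search(line):
                        out.write(line)
                        kept += 1
    return kept

if __name__ == "__main__":
    n = filter_corpus("corpus/*.txt", "psychology_subset.txt")
    print(f"kept {n} documents")
```

A keyword grep like this is obviously cruder than training a small model to score documents, but it's the zero-cost version you could run before renting any GPUs.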

2