No_Ninja3309_NoNoYes t1_ja2w92l wrote on February 26, 2023 at 1:09 PM

I haven't read the paper, but my friend Fred says they used a simple model to decide what goes into the training data. That would explain the 10x smaller size. Or one of us misunderstood. In theory, you could download the data and grep for whatever you're interested in, say psychology, then get the code and rent GPUs in the cloud. You could crowdfund this if there's enough interest. I guess the more niche topics would also be the cheapest to do.
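
For the grep step, a rough Python sketch of what I mean. It assumes the data is already split into plain-text documents. The `filter_by_topic` helper and the sample strings are made up for illustration and have nothing to do with what the paper actually did.

```python
import re
from typing import Iterable, Iterator


def filter_by_topic(docs: Iterable[str], keywords: Iterable[str]) -> Iterator[str]:
    """Yield documents that contain any keyword (case-insensitive substring match)."""
    # Escape each keyword so it is matched literally, then OR them together.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    for doc in docs:
        if pattern.search(doc):
            yield doc


# Hypothetical sample documents standing in for the real dataset.
sample = [
    "Cognitive psychology studies memory and attention.",
    "The recipe calls for two cups of flour.",
    "A PSYCHOLOGIST reviewed the survey design.",
]

# The stem "psycholog" catches both "psychology" and "psychologist".
matches = list(filter_by_topic(sample, ["psycholog"]))
```

A keyword match like this is crude and will miss on-topic text that doesn't use the obvious words, so it's probably only good as a first pass.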

AylaDoesntLikeYou OP t1_ja2wnih wrote on February 26, 2023 at 1:13 PM

You and Fred must talk a lot. Lol