Submitted by Fun_Country_4193 t3_za7d1t in MachineLearning
jobeta t1_iyl87di wrote
What kind of data is it?
Fun_Country_4193 OP t1_iyl8an7 wrote
It's all text data: data from the Pile plus some other datasets, about 1TB in total. But you can train on randomly pulled subsets of the overall set (about 2GB), which works about as well as training on the whole dataset.
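The random subsampling OP describes can be sketched as follows. This sketch assumes the corpus can be streamed one document at a time. The function name `bernoulli_sample` and the keep-each-document-independently scheme are illustrative assumptions, not OP's actual pipeline. Keeping each document with probability p = 2GB / 1TB = 0.002 (decimal units) gives a subset whose expected size is about 2GB, and it never needs the full 1TB in memory.

```python
import random
from typing import Iterable, Iterator


def bernoulli_sample(docs: Iterable[str], keep_prob: float, seed: int = 0) -> Iterator[str]:
    """Hypothetical helper: stream over docs and keep each one independently with probability keep_prob."""
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the same subset can be reproduced
    for doc in docs:
        # rng.random() is uniform on [0, 1), so each doc is kept with probability keep_prob
        if rng.random() < keep_prob:
            yield doc


# Sizes from the thread: ~2GB sampled out of ~1TB (decimal units assumed)
TOTAL_BYTES = 1_000_000_000_000
TARGET_BYTES = 2_000_000_000
KEEP_PROB = TARGET_BYTES / TOTAL_BYTES  # about 0.002, i.e. keep ~0.2% of documents

if __name__ == "__main__":
    toy_corpus = (f"document {i}" for i in range(100_000))
    subset = list(bernoulli_sample(toy_corpus, KEEP_PROB, seed=42))
    print(f"kept {len(subset)} of 100000 documents (expected about 200)")
```

Because each document is kept independently, the subset size is only approximately the target. Drawing a fresh seed per run gives a different random subset each time, which matches the idea of training on randomly pulled subsets.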
jobeta t1_iyl8mib wrote
"Data from the Pile"? Why don't you organize a Kaggle challenge?
Fun_Country_4193 OP t1_iyl90ge wrote
good idea, thanks!
Fun_Country_4193 OP t1_iyl9s1d wrote
I just checked, and the minimum cost is 50,000. I could probably do something like 20k, but 50k is a lot.
jobeta t1_iym2upg wrote
Oh, OK. I guess they have some costs on their end too. What did you mean by "data from the Pile"? I'm happy to give it a shot if you think ~1 GB of data can be enough.