oilfee t1_j2o8mba wrote

I'm interested in numbers, not "it depends". How much data, in bytes or tokens, would I need for

- text generation

- image generation

- sound generation

- function classes

- protein sequences

- chess games

to achieve some sort of saturation of learnability, i.e. diminishing returns for a given architecture? Is it in the same ballpark for all of them? Have different dataset sizes been compared with different model sizes?
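One published reference point for the diminishing-returns question (text only, not the other domains listed) is the Chinchilla scaling-law fit from Hoffmann et al. (2022), which models loss as a function of parameter count and token count. A minimal sketch using their reported constants, which are an assumption here and were fit only on language data:

```python
# Chinchilla parametric loss fit: L(N, D) = E + A / N**alpha + B / D**beta
# Constants below are the published fit for text; treat them as rough guides only.
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    E, A, B = 1.69, 406.4, 410.7     # irreducible loss and fit coefficients
    alpha, beta = 0.34, 0.28         # parameter and data exponents
    return E + A / n_params**alpha + B / n_tokens**beta

# Fix a 1B-parameter model and sweep the dataset size to watch returns flatten.
for n_tokens in (1e6, 1e7, 1e8, 1e9, 1e10, 1e11):
    print(f"{n_tokens:.0e} tokens -> predicted loss {chinchilla_loss(1e9, n_tokens):.3f}")
```

The curve flattens once the data term B/D^beta falls below the model term A/N^alpha, which is roughly the point where adding more data stops paying off for that model size.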

oilfee t1_j2mki8y wrote

How much data do I need for a transformer model? If I'm not mistaken, GPT-3 was trained on something like 300 billion tokens (on the order of a terabyte of text)? But maybe it gets 'decent' results with much less? I just don't want to fall into the trap of the small-business owner who hires a data scientist and asks her to use deep learning on their 130-entry customer database (which I've encountered before). But like, 1M tokens? 10M?
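Not from the thread, but for back-of-envelope sizing two commonly cited heuristics are roughly 4 bytes of English text per token and the Chinchilla compute-optimal ratio of about 20 training tokens per model parameter. A sketch under those assumptions:

```python
# Rough sizing heuristics (assumptions, not hard rules):
#   ~4 bytes of raw English text per BPE token
#   ~20 training tokens per parameter for compute-optimal training (Chinchilla)
BYTES_PER_TOKEN = 4
TOKENS_PER_PARAM = 20

def tokens_to_bytes(n_tokens: float) -> float:
    return n_tokens * BYTES_PER_TOKEN

def compute_optimal_tokens(n_params: float) -> float:
    return n_params * TOKENS_PER_PARAM

# GPT-3 (175B params) saw ~300B tokens, i.e. on the order of a terabyte of text;
# the compute-optimal budget for that model size would be closer to 3.5T tokens.
print(f"300B tokens ~ {tokens_to_bytes(300e9) / 1e12:.1f} TB of text")
print(f"Compute-optimal for 175B params: {compute_optimal_tokens(175e9):.1e} tokens")
```

By that heuristic, 1M-10M tokens only matches models in the tens to hundreds of thousands of parameters, and a 130-row customer database is far below anything a transformer needs.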
