oilfee t1_j2o8mba wrote

I'm interested in numbers, not "it depends". How much data in bytes or tokens would I need for

- text generation

- image generation

- sound generation

- function classes

- protein sequences

- chess games

to reach some sort of saturation of learnability, i.e. the point of diminishing returns for a given architecture? Is it in the same ballpark for all of these? Have different dataset sizes been compared against different model sizes?


v2thegreat t1_j2oablu wrote

For transformers, that's likely a difficult question to answer without experimentation, but I always recommend starting small. It's generally hard enough to go from 0 to 1 without also worrying about scaling things up.
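
A minimal sketch of what that experimentation could look like, assuming you train the same small model on growing subsets of your data and record the validation loss each time. It also assumes the loss roughly follows a power law in dataset size with no irreducible floor, which is a simplification. The function names and the numbers below are hypothetical placeholders, not real measurements.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss ~= a * size**(-b), done in log-log space.

    Assumes every size and loss is positive and that there is no
    irreducible loss term (a simplifying assumption).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def gain_from_doubling(a, b, size):
    """Predicted drop in loss when going from `size` to 2 * size examples."""
    return a * size ** (-b) * (1 - 2 ** (-b))


# Synthetic placeholder data generated from a known curve, NOT real results.
sizes = [1_000, 2_000, 4_000, 8_000, 16_000]
losses = [5.0 * n ** -0.3 for n in sizes]

a, b = fit_power_law(sizes, losses)
for n in sizes:
    print(n, gain_from_doubling(a, b, n))
```

If the predicted gain from doubling your data falls below whatever improvement you care about, that is a rough signal of diminishing returns for that model size. Repeating the sweep at a few model sizes is one way to compare dataset size against model size.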

Currently, the gains from scaling don't seem to be slowing down: larger and larger models continue to become more powerful.

I'd say this deserves its own post rather than a simple question.

Good luck and please respond when you end up solving it!
