gamerx88 t1_jdmql8y wrote

The answer is probably not. DeepMind's Chinchilla paper shows that many of those 100B+ LLMs are oversized for the amount of data used to pre-train them.
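A rough back-of-envelope sketch of what that means, using Chinchilla's approximate rule of thumb of ~20 training tokens per parameter for compute-optimal training; the token counts below are the publicly reported figures and the ratio is only an approximation from the paper:

```python
# Back-of-envelope check using Chinchilla's approximate rule of thumb of
# ~20 training tokens per parameter for compute-optimal training
# (Hoffmann et al., 2022). Token counts are the publicly reported figures.

TOKENS_PER_PARAM = 20

models = {
    "GPT-3 (175B)":     (175e9, 300e9),   # (parameters, training tokens)
    "Chinchilla (70B)": (70e9, 1.4e12),
}

for name, (params, tokens) in models.items():
    optimal = params * TOKENS_PER_PARAM
    print(f"{name}: trained on {tokens / 1e9:.0f}B tokens, "
          f"compute-optimal would be ~{optimal / 1e12:.1f}T tokens "
          f"({tokens / optimal:.0%} of that)")
```

By that yardstick a 175B model would want a few trillion tokens, an order of magnitude more than it reportedly saw.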

3

currentscurrents t1_jdmzphs wrote

That's true, but only for a given training compute budget.

Right now we're mostly limited by compute, while training data is cheap. Chinchilla and LLaMA intentionally trade model size for data, training smaller models on more tokens. Given the same amount of data, a larger model still performs better than a smaller one.
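As a sketch of that last point, here is the parametric loss fit from the Chinchilla paper, L(N, D) = E + A/N^alpha + B/D^beta, plugged in with roughly the constants they report; treat the exact numbers as illustrative, not authoritative:

```python
# Sketch of the parametric loss fit from the Chinchilla paper,
# L(N, D) = E + A / N**alpha + B / D**beta, using roughly the constants
# reported by Hoffmann et al. (2022). Purely illustrative.

E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training loss for n_params parameters and n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Hold the data fixed at 1.4T tokens and grow the model:
for params in (7e9, 70e9, 175e9):
    print(f"{params / 1e9:.0f}B params: predicted loss ~{predicted_loss(params, 1.4e12):.3f}")
```

At a fixed token count the predicted loss keeps dropping as the model grows, it's just not the best use of a fixed training budget.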

In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.

3

gamerx88 t1_jdn1dd3 wrote

> In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.

I agree, but I think data is already a limiting factor today, with the largest publicly known models at 175B parameters. The data used to train these models supposedly already covers a majority of the open internet.

1

PilotThen t1_jdppmpl wrote

There's also the point that they optimise for compute at training time.

In mass deployment, compute at inference time starts to matter.
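A quick sketch of why that flips the trade-off, using the common approximations of ~6·N·D FLOPs to train and ~2·N FLOPs per generated token at inference; the model size, training tokens and served-token count below are made-up assumptions:

```python
# Rough FLOPs comparison using the common approximations of ~6*N*D FLOPs
# to train and ~2*N FLOPs per token for a forward pass (Kaplan et al., 2020).
# Model size, training tokens and served tokens are made-up assumptions.

def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

def inference_flops(n_params: float, served_tokens: float) -> float:
    return 2 * n_params * served_tokens

n_params = 175e9       # hypothetical model size
train_tokens = 300e9   # hypothetical pre-training token count
served_tokens = 1e13   # hypothetical tokens generated over the deployment's lifetime

print(f"training:  {training_flops(n_params, train_tokens):.2e} FLOPs")
print(f"inference: {inference_flops(n_params, served_tokens):.2e} FLOPs")
# Once enough tokens are served, inference dominates the one-off training cost,
# which is why a smaller model trained on more data (LLaMA-style) can be cheaper overall.
```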

1