
gamerx88 t1_j2vzjfx wrote

"An empirical analysis of compute-optimal large language model training" by Deepmind, suggesting that LLMs are over-parameterized or under-trained (insufficient data used in training).
