
alrunan t1_jdmbv4k wrote

The Chinchilla scaling laws are just used to calculate the compute-optimal model size and dataset size for a particular training budget.
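A minimal sketch of that calculation, assuming the popular rule-of-thumb readings of the Chinchilla paper (training compute C ≈ 6·N·D FLOPs for N parameters and D tokens, and roughly 20 tokens per parameter at the optimum; the paper's actual fitted coefficients differ slightly):

```python
# Rough Chinchilla-style compute-optimal allocation.
# Assumptions (rule-of-thumb readings of Hoffmann et al. 2022):
#   - training compute C ~= 6 * N * D FLOPs (N params, D tokens)
#   - compute-optimal ratio D/N ~= 20 tokens per parameter
# The paper fits exact coefficients; this is only the popular approximation.

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    """Return (params N, tokens D) that roughly spend compute_flops optimally."""
    ratio = 20.0  # D / N at the approximate compute optimum
    # C = 6 * N * D = 6 * ratio * N^2  =>  N = sqrt(C / (6 * ratio))
    n_params = (compute_flops / (6.0 * ratio)) ** 0.5
    n_tokens = ratio * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    for c in (1e21, 1e22, 1e23):
        n, d = chinchilla_optimal(c)
        print(f"C={c:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```

As a sanity check, plugging in Chinchilla's own budget (~5.9e23 FLOPs) gives ~70B parameters and ~1.4T tokens, matching the model in the paper.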

You should read the LLaMA paper.


harharveryfunny t1_jdmd38s wrote

>You should read the LLaMA paper.

OK - will do. What specifically did you find interesting (related to scaling or not)?


alrunan t1_jdmm3lw wrote

The 7B model is trained on 1T tokens and performs really well for its parameter count.
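Back-of-the-envelope, using the ~20 tokens/parameter heuristic from above (rough numbers, not the paper's exact fit):

```python
# LLaMA-7B vs. the Chinchilla rule of thumb.
# Assumes the ~20 tokens/parameter heuristic; figures are approximate.

n_params = 7e9                       # LLaMA-7B parameter count
tokens_trained = 1e12                # tokens LLaMA-7B was actually trained on
chinchilla_tokens = 20 * n_params    # ~1.4e11, i.e. ~140B tokens

print(f"Chinchilla-optimal tokens for 7B params: ~{chinchilla_tokens / 1e9:.0f}B")
print(f"LLaMA-7B actual training tokens:         ~{tokens_trained / 1e9:.0f}B")
print(f"Over-training factor: ~{tokens_trained / chinchilla_tokens:.1f}x")
```

That ~7x over-training is the LLaMA paper's deliberate trade-off: spend more training compute than the Chinchilla optimum to get a smaller model that is cheaper to run at inference time.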
