
harharveryfunny t1_jdmd38s wrote

>You should read the LLaMA paper.

OK - will do. What specifically did you find interesting (related to scaling or not)?

1

alrunan t1_jdmm3lw wrote

The 7B model is trained on 1T tokens and performs really well for its parameter count. That's the scaling-relevant part: it's trained on far more data than the Chinchilla scaling laws would call compute-optimal for a model that size (rough numbers sketched below).
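For a sense of scale, here's a quick back-of-the-envelope sketch. The ~20 tokens/parameter figure is the usual rule of thumb from the Chinchilla paper (Hoffmann et al., 2022), not something stated in this thread; the LLaMA 7B numbers are from the LLaMA paper itself:

```python
# Back-of-the-envelope comparison of LLaMA 7B's token budget
# against the Chinchilla compute-optimal rule of thumb.

CHINCHILLA_TOKENS_PER_PARAM = 20  # approximate compute-optimal ratio

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Rough compute-optimal training-token budget for n_params parameters."""
    return CHINCHILLA_TOKENS_PER_PARAM * n_params

llama_7b_params = 7e9
llama_7b_tokens = 1e12  # 1T tokens, per the LLaMA paper

optimal = chinchilla_optimal_tokens(llama_7b_params)  # ~1.4e11 (140B tokens)
print(f"Chinchilla-optimal budget: {optimal:.2e} tokens")
print(f"LLaMA 7B actually trained on {llama_7b_tokens / optimal:.1f}x that")
# -> roughly 7x past the compute-optimal point: less efficient use of
#    training compute, but a much stronger model per parameter at inference.
```

That trade-off is the paper's point: overtrain a small model so it's cheap to serve, rather than stopping at the compute-optimal token count.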

3