Submitted by Vegetable-Skill-9700 t3_121a8p4 in MachineLearning
alrunan t1_jdmm3lw wrote
Reply to comment by harharveryfunny in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
The 7B model was trained on 1T tokens and performs really well for its parameter count.
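A quick back-of-the-envelope sketch of why that token count matters: the Chinchilla paper (Hoffmann et al., 2022) estimates the compute-optimal budget at roughly 20 training tokens per parameter, so 1T tokens for a 7B model is well past that point. The 7B/1T figures below come from the comment; the ~20:1 ratio is the published estimate, not something stated in this thread.

```python
# Back-of-the-envelope: tokens-per-parameter ratio vs. the
# Chinchilla compute-optimal ratio (~20 tokens/parameter,
# Hoffmann et al., 2022). Model size and token count are taken
# from the comment above.

params = 7e9    # 7B parameters
tokens = 1e12   # 1T training tokens

ratio = tokens / params          # ~143 tokens per parameter
chinchilla_optimal = 20          # approximate optimal tokens/parameter

print(f"tokens per parameter: {ratio:.0f}")
print(f"trained {ratio / chinchilla_optimal:.1f}x past the Chinchilla-optimal point")
```

Training that far beyond the compute-optimal point spends extra compute up front in exchange for a stronger model at a fixed parameter count, which is consistent with the observation that the 7B model punches above its weight.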