Viewing a single comment thread. View all comments

Anjz t1_jc758w9 wrote

Blows my mind, they used a large language model to train a small one.

>Fine-tuning a 7B LLaMA model took 3 hours on 8 80GB A100s, which costs less than $100 on most cloud compute providers.

Now imagine what's possible with GPT-4 training a smaller language model and a bigger instruction sample with corporate backing to use hundreds of A100's at the same time for days at a time?

We're already in reach of exponential growth for low powered devices, it's not going to take years like people have predicted.