Submitted by LesleyFair t3_10fw22o in deeplearning
LesleyFair OP t1_j501xt6 wrote
Reply to comment by --dany-- in GPT-4 Will Be 500x Smaller Than People Think - Here Is Why by LesleyFair
First, thanks a lot for reading and thank you for the good questions:
A1) The current GPT-3 has 175B parameters. If GPT-4 were 100T parameters, that would be a scale-up of roughly 500x.
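As a quick sanity check on that ratio, here is the back-of-the-envelope arithmetic in plain Python (the 100T figure is the rumored size discussed in the post, not a confirmed number):

```python
# Scale-up ratio: rumored 100T-parameter GPT-4 vs. the 175B-parameter GPT-3.
gpt3_params = 175e9     # GPT-3: 175 billion parameters
gpt4_rumored = 100e12   # rumored GPT-4 size: 100 trillion parameters

scale_up = gpt4_rumored / gpt3_params
print(f"Scale-up factor: {scale_up:.0f}x")  # ~571x, i.e. the "roughly 500x" referenced above
```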
A2) I got the calculation from the Turing-NLG paper. The total training time in seconds is estimated by multiplying the number of tokens by the number of model parameters, then dividing by the number of GPUs times each GPU's achieved FLOPs per second.
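A minimal sketch of that estimate is below. Note two assumptions that are mine, not from the comment above: the constant of ~8 FLOPs per parameter per token (the approximation used in the Megatron-LM scaling work) and the illustrative hardware numbers in the example call.

```python
# Back-of-the-envelope training-time estimate in the style described above.
# Assumption: total training compute ~ 8 * tokens * parameters FLOPs
# (approximation from the Megatron-LM scaling paper). Hardware figures below
# are illustrative placeholders, not numbers from the original post.

def training_time_days(tokens, params, num_gpus, flops_per_gpu, flops_per_token_param=8):
    """Estimate end-to-end training time in days."""
    total_flops = flops_per_token_param * tokens * params
    seconds = total_flops / (num_gpus * flops_per_gpu)
    return seconds / 86_400  # seconds per day

# Example: a GPT-3-like run (300B tokens, 175B params) on 1,024 GPUs
# at an assumed ~140 TFLOP/s achieved per GPU.
print(f"{training_time_days(300e9, 175e9, 1024, 140e12):.0f} days")  # ~34 days
```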
adubowski t1_j549298 wrote
- Is your assumption that GPT-4 will stay the same size as GPT-3?