
LesleyFair OP t1_j501xt6 wrote

First, thanks a lot for reading and thank you for the good questions:

A1) The current GPT-3 has 175B parameters. If GPT-4 were 100T parameters, that would be a scale-up of roughly 570x (100T / 175B ≈ 571).

A2) I took the calculation from the paper on the Turing NLG model. The total training time in seconds is obtained by multiplying the number of training tokens by the number of model parameters, then dividing by the number of GPUs times each GPU's sustained FLOPs per second.
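A minimal sketch of that back-of-the-envelope estimate, with all numbers chosen purely for illustration (they are not values from the paper, and the paper's exact formula may include an additional constant factor for FLOPs per parameter per token):

```python
# Back-of-the-envelope training-time estimate as described above.
# All numbers below are illustrative assumptions, not figures from the paper.

tokens = 300e9          # assumed number of training tokens
params = 175e9          # GPT-3-scale parameter count
num_gpus = 1024         # assumed GPU count
flops_per_gpu = 120e12  # assumed sustained FLOPs/s per GPU

# time (s) = (tokens * parameters) / (GPUs * FLOPs per second per GPU)
training_seconds = (tokens * params) / (num_gpus * flops_per_gpu)
training_days = training_seconds / (60 * 60 * 24)

print(f"Estimated training time: {training_days:.1f} days")
```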


adubowski t1_j549298 wrote

1. Is your assumption that GPT-4 will stay the same size as GPT-3?