appenz t1_jbqsu7k wrote on March 11, 2023 at 12:07 AM

Both of the answers above are correct. If you care about the structure of the transformer (e.g. depth, number of layers, etc.), it is complicated.

If you only care about how cost scales with the number of weights, most transformers scale as O(weights), and a generative transformer like GPT needs roughly 2 × weights floating-point operations per generated token (one multiply and one add per weight).
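A rough sketch of the 2 × weights rule of thumb, assuming it refers to forward-pass FLOPs per token. The function names are hypothetical. The optional attention-over-context term (2 · n_layer · n_ctx · d_model) follows the approximation in Kaplan et al. 2020 and is included only to show why the rule is "approximately" 2 × weights.

```python
def forward_flops_per_token(n_params: int, n_layer: int = 0,
                            n_ctx: int = 0, d_model: int = 0) -> int:
    """Approximate forward-pass FLOPs to process one token.

    2 * n_params: each weight takes part in one multiply and one add.
    2 * n_layer * n_ctx * d_model: optional attention-over-context term
    (Kaplan et al. 2020 approximation); zero unless dimensions are given.
    """
    return 2 * n_params + 2 * n_layer * n_ctx * d_model


def generation_flops(n_params: int, n_tokens: int) -> int:
    """Approximate FLOPs to generate n_tokens, ignoring the context term."""
    return n_tokens * forward_flops_per_token(n_params)


if __name__ == "__main__":
    n = 175_000_000_000  # assumed example: a GPT-3-sized model (175B weights)
    print(f"{forward_flops_per_token(n):.3e} FLOPs per token")       # 3.500e+11
    print(f"{generation_flops(n, 1000):.3e} FLOPs for 1000 tokens")  # 3.500e+14
```

With GPT-3-like dimensions (96 layers, d_model = 12288) and a 2048-token context, the extra context term is about 4.8e9 FLOPs, roughly 1.4% of the 3.5e11 from the weights, which is why 2 × weights is a reasonable approximation at that scale. The context term grows with context length, so the approximation gets looser for very long contexts.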