masonw32 t1_jdsyi4v wrote

This is only an issue for insanely large numbers though. GPT-4 already performs a huge number of multiplications and additions in every layer of every forward pass. You can overfit a much smaller network to multiply full numbers represented as single tokens, and a GPT-4-like architecture can learn to multiply full numbers well enough for all practical purposes.
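A minimal sketch of the overfitting idea, using a tiny NumPy network rather than anything GPT-4-like (all sizes, the architecture, and the learning rate are illustrative assumptions, not from any real model): each whole number is one token with a learned embedding, and the network memorizes the full multiplication table by gradient descent.

```python
import numpy as np

# Illustrative sketch: overfit a tiny embedding + MLP regressor on the
# full multiplication table, treating each whole number (not its digits)
# as a single token. All sizes here are arbitrary choices.
rng = np.random.default_rng(0)
V = 20          # vocabulary: tokens 0..19
D = 16          # embedding dimension
H = 64          # hidden width

# Training set: every pair (a, b) with its product as the target.
a, b = np.meshgrid(np.arange(V), np.arange(V), indexing="ij")
a, b = a.ravel(), b.ravel()
y = (a * b).astype(float)
y /= y.max()    # normalize targets to [0, 1]

# Parameters: one embedding table plus a two-layer MLP.
E = rng.normal(0, 0.1, (V, D))
W1 = rng.normal(0, 0.1, (2 * D, H))
W2 = rng.normal(0, 0.1, (H, 1))

def forward(E, W1, W2):
    x = np.concatenate([E[a], E[b]], axis=1)   # (N, 2D) token embeddings
    h = np.tanh(x @ W1)                        # (N, H) hidden activations
    return x, h, (h @ W2).ravel()              # predictions, (N,)

lr = 0.5
_, _, pred = forward(E, W1, W2)
loss0 = np.mean((pred - y) ** 2)               # loss before training
for _ in range(2000):
    x, h, pred = forward(E, W1, W2)
    err = (pred - y)[:, None]                  # (N, 1) residuals
    gW2 = h.T @ (2 * err / len(y))
    gh = (2 * err / len(y)) @ W2.T * (1 - h ** 2)
    gW1 = x.T @ gh
    gx = gh @ W1.T
    # Scatter embedding gradients back into the table.
    gE = np.zeros_like(E)
    np.add.at(gE, a, gx[:, :D])
    np.add.at(gE, b, gx[:, D:])
    W2 -= lr * gW2
    W1 -= lr * gW1
    E -= lr * gE
_, _, pred = forward(E, W1, W2)
loss = np.mean((pred - y) ** 2)                # loss after overfitting
print(f"mse before: {loss0:.4f}  after: {loss:.4f}")
```

The point is only that memorizing full-number multiplication is easy for a tiny model; it says nothing about generalizing to numbers outside the table.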

It's true that GPT-4 only does a constant number of operations per input, though, and asymptotically the number of operations required to compute the product scales as O(n log n), where n is proportional to the input length. But that's not why it's failing here.
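The asymptotic mismatch can be made concrete with a toy calculation (the fixed per-pass budget and the unit constant in the n log n bound are both made-up numbers for illustration): any fixed operation budget is eventually exceeded even by the best known multiplication algorithms, which run in O(n log n) for n-digit inputs.

```python
import math

def mult_ops(n: int) -> float:
    # Best known asymptotic cost of n-digit multiplication is O(n log n);
    # the constant factor is assumed to be 1 here, purely for illustration.
    return n * math.log2(n) if n > 1 else 1.0

FIXED_OPS = 10**12  # illustrative constant budget of a fixed-size forward pass

# Find the first power-of-two digit count whose multiplication cost
# exceeds the fixed budget.
n = 1
while mult_ops(n) <= FIXED_OPS:
    n *= 2
print(f"{n}-digit multiplication exceeds a budget of {FIXED_OPS} ops")
```

So the constant-work-per-pass limit is real, but it only bites at absurdly long inputs, far beyond the few-digit cases where GPT-4 already gets multiplication wrong.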
