
ChuckSeven t1_je55o02 wrote

The Transformer is not a universal function approximator. This is shown simply by the fact that it cannot process arbitrarily long inputs, due to its finite context window.
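A toy sketch of what I mean (PyTorch; the model, sizes, and names are just illustrative, not any particular architecture): with learned positional embeddings there are only `max_len` positions, so a sequence even one token longer simply cannot be fed in without truncation or an architectural change.

```python
import torch
import torch.nn as nn

class ToyTransformer(nn.Module):
    """Toy encoder with a fixed context length, purely to illustrate the point."""
    def __init__(self, vocab_size=100, d_model=32, max_len=16):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Learned positional embeddings: only max_len positions exist.
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.max_len = max_len

    def forward(self, tokens):  # tokens: (batch, seq_len)
        seq_len = tokens.size(1)
        if seq_len > self.max_len:
            # Without this guard, indexing pos_emb would fail anyway.
            raise ValueError(f"sequence length {seq_len} exceeds context {self.max_len}")
        pos = torch.arange(seq_len, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        return self.encoder(x)

model = ToyTransformer()
model(torch.randint(0, 100, (1, 16)))  # fits within the context, works
model(torch.randint(0, 100, (1, 17)))  # one token too long -> error
```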

Your conclusion is not at all obvious or likely given your facts. It seems to be drawn in hindsight, given the strong performance of large models.

It's hard to think of ChatGPT as a very large transformer ... because we don't know how to think about very large transformers.


Haycart t1_je6grih wrote

>The Transformer is not a universal function approximator. This is simply shown by the fact that it cannot process arbitrary long input due to the finite context limitations.

We can be more specific, then: the transformer is a universal function approximator* on the space of sequences that fit within its context. I don't think this distinction is necessarily relevant to the point I'm making, though (see the sketch below the footnote).

*again with caveats regarding continuity etc.
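To make "universal approximation on fixed-length inputs" concrete, here is a toy regression sketch (PyTorch; the target function, pooling choice, and hyperparameters are arbitrary illustrative picks, not from any paper): a small transformer is trained to fit a continuous function of a length-8 sequence on a compact domain, which is the kind of approximation the theorem-style claim is about.

```python
import torch
import torch.nn as nn

# Arbitrary continuous target function on fixed-length sequences (illustrative choice).
def target_fn(x):                        # x: (batch, seq_len, 1)
    return torch.sin(3 * x).prod(dim=1)  # -> (batch, 1)

class SeqRegressor(nn.Module):
    def __init__(self, d_model=32):
        super().__init__()
        self.inp = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, 1)

    def forward(self, x):                # x: (batch, seq_len, 1)
        h = self.encoder(self.inp(x))
        return self.out(h.mean(dim=1))   # pool over positions -> (batch, 1)

model = SeqRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.rand(64, 8, 1)             # inputs drawn from a compact domain
    loss = nn.functional.mse_loss(model(x), target_fn(x))
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())                        # shrinks toward 0 given enough capacity and steps
```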

>Your conclusion is not at all obvious or likely given your facts. They seem to be in hindsight given the strong performance of large models.

Guilty as charged, regarding hindsight. I won't claim to have predicted GPT-3's performance a priori. That said, my point was never that the strong performance we've observed from recent LLMs was obvious or likely, only that it shouldn't be surprising. And, in particular, it should not be surprising that a GPT model (not necessarily GPT-3 or 4) trained on a language modeling task would have the abilities we've seen. Everything we've seen falls well within the bounds of what transformers are theoretically capable of doing.

There are, of course, aspects of the current situation specifically that you can be surprised about. Maybe you're surprised that 100 billion-ish parameters is enough, or that the current volume of training data was sufficient. My argument is mostly aimed at claims along the lines of "GPT-n can't do X because transformers lack capability Y" or "GPT-n can't do X because it is only trained to model language".
