Comments

sdmat t1_je42icd wrote

> So long as it’s a transformer model, GPT-4 will also be a query engine on a giant corpus of text, just with more of the holes patched up, so it’d be harder to see the demonstrative examples of it being that.

This claim has a strong scent of sophistry about it - any and all signs of intelligence can be handwaved away as interpolating to plausible text.

The explanations of the failures are convincing, but the theory needs to go further and explain why larger models like GPT-4 (and in some cases 3.5) are so much more effective at answering out-of-domain queries with explicit reasoning proceeding from information the model does have. E.g. GPT-4 correctly answers the weights question and gives a clear explanation of its reasoning. And that isn't an isolated example.

It's not just an incremental improvement, there is a clear difference in kind.

13

fripperML OP t1_je43ftw wrote

Yes, honestly I don't know what to think. I read this paper with amusement (well, some of the examples, not all of them, since I did not have time to finish it):

https://arxiv.org/abs/2303.12712

It's very optimistic, and aligned with what you say (not just an incremental improvement over previous models).

But then, besides the article I shared, I've read this thread:

https://www.reddit.com/r/MachineLearning/comments/124eyso/n_openai_may_have_benchmarked_gpt4s_coding/

So I don't know... We'll probably see soon, when access to GPT-4 is more widespread.

Thanks for commenting :)

3

Haycart t1_je4923c wrote

>Yes, ChatGPT is doing much more than querying text! It is not just a query engine on a giant corpus of text. … Duh! I do not think you should only think of ChatGPT as a query engine on a giant corpus of text. There can be a lot of value in reasoning about ChatGPT anthropomorphically or in other ways. RLHF also complicates the story, as over time it weighs responses away from the initial training data. But “query engine on a giant corpus of text” should be a non-zero part of your mental model because, without it, you cannot explain many of the things ChatGPT does.

The author seems to present a bizarre dichotomy: either you have to think of ChatGPT as a query engine, or you have to think of it in magical/mystical/anthropomorphic terms.

(They also touch on viewing ChatGPT as a function on the space of "billion dimensional" embeddings. This is closer to the mark but seems to conflate the model's parameter count with the dimensionality of its latent space (GPT-3 has on the order of 175 billion parameters but an embedding dimension of roughly 12,000), which doesn't exactly inspire confidence in the author's level of understanding.)

Why not just think of ChatGPT as what it is--a very large transformer?

The fact that a model like ChatGPT is able to do what it does is not at all surprising, IMO, when you consider the following facts:

  1. Transformers (and neural networks in general) are universal approximators. A sufficiently large neural network can approximate any function to arbitrary precision (with a few minor caveats).
  2. Neural networks trained with stochastic gradient descent benefit from implicit regularization -- SGD naturally tends to seek out simple solutions that generalize well. Furthermore, larger neural networks appear to generalize better than smaller ones. (A toy sketch illustrating points 1 and 2 follows this list.)
  3. The recent GPTs have been trained on a non-trivial fraction of the entire internet's text content.
  4. Text on the internet (and language data in general) arises from human beings interacting with the world--reasoning, thinking, and emoting about those interactions--and attempting to communicate the outcome of this process to one another.
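
As a toy sketch of points 1 and 2 (my own illustration, not something from the article or this thread; the architecture, width, and learning rate are arbitrary choices for the demo), a small fully connected network trained with plain SGD will drive its error on a smooth 1-D target toward zero:

```python
# Toy illustration: a small MLP trained with plain SGD approximates sin(x).
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Target function sampled on [-pi, pi]
x = torch.linspace(-math.pi, math.pi, 512).unsqueeze(1)
y = torch.sin(x)

# Width, depth, and learning rate are arbitrary choices for the demo
model = nn.Sequential(
    nn.Linear(1, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)

opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(f"final MSE: {loss.item():.5f}")  # approaches 0 as the net fits sin(x)
```

That obviously says nothing about GPT-scale models; it just makes the "universal approximator plus SGD finds a simple fit" part of the argument concrete.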

Is it really crazy to imagine that the simplest possible function capable of fitting a dataset as vast as ChatGPT's might resemble the function that produced it? A function that subsumes, among other things, human creativity and reasoning?

In another world, GPT 3 or 4 might have turned out to be incapable of approximating that function to any notable degree of fidelity. But even then, it wouldn't be outlandish to imagine that one of the later members of the GPT family could eventually succeed.

5

sdmat t1_je4bgwh wrote

Exactly. It's bizarre to point to revealing failure cases for a universal approximator and then claim that fixing those failure cases in later versions would be irrelevant.

It's entirely possible that GPT-3 only did interpolation and fails horribly out of domain, and that GPT-5 will infer the laws of nature, language, psychology, logic, etc. and be able to apply them to novel material.

It certainly looks like GPT-4 is somewhere in between.

4

ChuckSeven t1_je55o02 wrote

The Transformer is not a universal function approximator. This is shown simply by the fact that it cannot process arbitrarily long input, due to its finite context limitation.

Your conclusion is not at all obvious or likely given your facts. They seem to be chosen in hindsight, given the strong performance of large models.

It's hard to think of ChatGPT as a very large transformer... because we don't know how to think about very large transformers.

1

Haycart t1_je6grih wrote

>The Transformer is not a universal function approximator. This is shown simply by the fact that it cannot process arbitrarily long input, due to its finite context limitation.

We can be more specific, then: the transformer is a universal function approximator* on the space of sequences that fit within its context. I don't think this distinction is necessarily relevant to the point I'm making, though.

*again with caveats regarding continuity etc.
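
To make that more concrete (notation mine, not the commenter's or the article's): for a fixed context length n and embedding dimension d, the restricted claim is roughly

```latex
% A sketch of the bounded-context universal-approximation statement.
% f is any continuous sequence-to-sequence target on a compact set K of
% inputs that fit within the context; g_theta is some transformer.
\[
  f : K \subset \mathbb{R}^{n \times d} \to \mathbb{R}^{n \times d}
  \quad \text{continuous},
\]
\[
  \forall \varepsilon > 0 \;\, \exists \theta : \;
  \sup_{X \in K} \lVert f(X) - g_\theta(X) \rVert < \varepsilon .
\]
```

Nothing in that statement covers inputs longer than the context, which is exactly the limitation you point out -- the guarantee only ever applied to the bounded-length domain.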

>Your conclusion is not at all obvious or likely given your facts. They seem to be chosen in hindsight, given the strong performance of large models.

Guilty as charged, regarding hindsight. I won't claim to have predicted GPT-3's performance a priori. That said, my point was never that the strong performance we've observed from recent LLMs was obvious or likely--only that it shouldn't be surprising. And in particular, it should not be surprising that a GPT model (not necessarily GPT-3 or 4) trained on a language modeling task would have the abilities we've seen. Everything we've seen falls well within the bounds of what transformers are theoretically capable of doing.

There are, of course, aspects of the current situation specifically that you can be surprised about. Maybe you're surprised that 100 billion-ish parameters is enough, or that the current volume of training data was sufficient. My argument is mostly aimed at claims along the lines of "GPT-n can't do X because transformers lack capability Y" or "GPT-n can't do X because it is only trained to model language".

1

NoLifeGamer2 t1_je4bxa1 wrote

I love how there are so many GPT models now that we have taken to calling them GPT-n lol

0