
SejaGentil OP t1_irstl4q wrote

Thanks for this overview, it makes a lot of sense. Do you have any ideas as to why GPT-3, DALL-E and the like are so bad at generating new insights and logical reasoning? My feeling is that these networks are very good at recalling, like a very dumb human that compensated it with a wikipedia-size memory. For example, if I attempt to prompt something like this on GPT-3:

This is a logical question. Answer it using exact, mathematical reasoning.

There are 3 boxes, A, B, C.
I take the following actions, in order:
- I put 3 balls in box A.
- I move 1 ball from box A to box C.
- I swap the contents of box A and box B.
How many balls are in each box?
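(For reference, tracing the steps mechanically gives A = 0, B = 2, C = 1. A few lines of Python that just mirror the three actions confirm it:)

```python
# Simulate the box puzzle step by step to verify the answer.
boxes = {"A": 0, "B": 0, "C": 0}

boxes["A"] += 3                                   # put 3 balls in box A
boxes["A"] -= 1; boxes["C"] += 1                  # move 1 ball from A to C
boxes["A"], boxes["B"] = boxes["B"], boxes["A"]   # swap contents of A and B

print(boxes)  # {'A': 0, 'B': 2, 'C': 1}
```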

It will fail miserably. Trying to teach it any kind of programming logic is a complete failure; it can't get very basic questions right, and asking it to reason step by step doesn't help. For me, the main goal of AGI is to be able to teach a computer how to prove theorems in a proof assistant like Agda, and have it be as capable as I am. But GPT-3 is as incapable as every other AI, and it seems like scaling won't change that. That's why, to me, it feels like AI as a whole is making zero progress towards (my concept of) AGI, even though it is doing amazing feats in other realms, and that's quite depressing. I use GPT-3 Codex a lot when coding, but only for repetitive, trivial work like converting formats. Anything that needs any sort of reasoning is out of its reach. Similarly, DALL-E is completely unable to generate new image concepts (like a cowboy riding an ostrich, or a cow with a duck beak...).

1

harharveryfunny t1_irtbnwz wrote

GPT-3 isn't an attempt at AI. It's literally just a (very large) language model. The only thing that it is designed to do is "predict next word", and it's doing that in a very dumb way via the mechanism of a transformer - just using attention (tuned via the massive training set) to weight the recently seen words to make that prediction. GPT-3 was really just an exercise in scaling up to see how much better (if at all) a "predict next word" language model could get if the capacity of the model and size of the training set were scaled up.
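The attention step described above can be sketched in a few lines of numpy. This is purely illustrative, nothing like GPT-3's scale or its learned weights; the dimensions and names here are made up for the example:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy "attend over recent words" step: each recently seen word is a
# vector, and the most recent word's vector (the query) is compared
# against all of them (the keys) to produce attention weights, which
# mix the context into one summary vector used to predict the next word.
rng = np.random.default_rng(0)
d = 8                                  # embedding dimension (tiny, for illustration)
words = rng.standard_normal((5, d))    # 5 recently seen word vectors

query = words[-1]                      # predict what follows the last word
scores = words @ query / np.sqrt(d)    # scaled dot-product similarity
weights = softmax(scores)              # attention weights over the context
context = weights @ words              # weighted mix of the context words

print(weights.round(2), context.shape)
```

In the real model the queries, keys, and values come from learned projections, and this step is stacked across many heads and layers, but the weighted-mixing idea is the same.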

We would expect GPT-3 to do a good job of predicting the next word in a plausible way (e.g. "the cat sat on the" => mat), since that is literally all it was trained to do, but the amazing, and rather unexpected, thing is that it can do so much more... Feed it "There once was a unicorn", and it'll start writing a whole fairy tale about unicorns. Feed Codex "Reverse the list order" and it'll generate code to perform that task, etc. These are all emergent capabilities - not things it was designed to do, but things it needed to learn to do (and evidently was capable of learning, via its transformer architecture) in order to get REALLY good at its "predict next word" goal.

Perhaps the most mind-blowing Codex capability was the original release demo video from OpenAI, where it had been fed the Microsoft Word API documentation and was then able to USE that information to write code to perform a requested task ("capitalize the first letter of each word", if I remember correctly)... So think about it - it was only designed/trained to "predict next word", yet it is capable of "reading API documentation" to write code to perform a requested task !!!

Now, this is just a language model, not claiming to be an AI or anything else, but it does show you the power of modern neural networks, and perhaps gives some insight into the relationship between intelligence and prediction.

DALL-E isn't claiming to be an AI either, and has a simple flow-through architecture. It basically just learns a text embedding, maps it to an image embedding, and decodes that to the image. To me it's more surprising that something so simple works as well as it does, rather than disappointing that it only works for fairly simple types of compositional requests. It certainly will do its best to render things it was never trained on, but you can't expect it to do very well with things like "two cats wrestling", since it has no knowledge of cats' anatomy, 3-D structure, or how their joints move. What you get is about what you'd expect given what the model consists of. Again, it's a pretty simple flow-through text-to-image model, not an AI.
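The text -> text embedding -> image embedding -> image flow can be sketched as below. This is a toy stand-in only: the real model uses large learned networks (text encoder, prior, decoder) at each stage, whereas here each stage is just a fixed random linear map so the data flow is visible; all dimensions are invented for the example:

```python
import numpy as np

# Toy stand-in for the flow-through text-to-image pipeline described above.
rng = np.random.default_rng(1)
text_dim, embed_dim, img_pixels = 16, 8, 64

encode_text  = rng.standard_normal((embed_dim, text_dim))    # text -> text embedding
prior        = rng.standard_normal((embed_dim, embed_dim))   # text emb -> image emb
decode_image = rng.standard_normal((img_pixels, embed_dim))  # image emb -> pixels

tokens    = rng.standard_normal(text_dim)   # pretend-tokenized prompt
text_emb  = encode_text @ tokens            # stage 1: embed the text
image_emb = prior @ text_emb                # stage 2: map to image space
image     = decode_image @ image_emb        # stage 3: decode to a flat 8x8 "image"

print(image.reshape(8, 8).shape)
```

The point is just that it's a single forward pass through fixed stages, with no search, planning, or world model anywhere in the loop.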

For any model to begin to meet your expectations of something "intelligent", it's going to have to be designed with that goal in the first place, and that is still in the future. So, GPT-3 is perhaps a taste of what is to come... if a dumb language model is capable of writing code(!!!), then imagine what a model that is actually designed to be intelligent should be capable of...

2

SejaGentil OP t1_iruye6s wrote

Just replying to thank you for all the info - I don't have any more questions for now.

2