Kafke t1_iwp95fv wrote

All it takes is an understanding of how AI currently works to realize that the current approach won't ever reach AGI. There are inherent limitations to the design, and so that design needs to be reworked before certain things can be achieved.

1

ECEngineeringBE t1_iwpov33 wrote

Current approach as in autoregressive next token text prediction? Any next token prediction in general, even multimodal? Or current approach as in the entire field of deep learning?

Could you please first specify what you mean by "current approach" and "rework" exactly? In my mind, it doesn't particularly matter if some approach needs a rework if that rework is easily implementable. So I think that you should first kind of expand on the point you're making so that we can discuss it.

1

Kafke t1_iwppsn1 wrote

Ah sorry. I'm referring to the entire field of deep learning. Every model I've witnessed so far has just been static input->output machines with the output adjusted per weights that are trained. This approach, while good for mapping inputs and outputs, is notoriously bad at a variety of cognitive tasks that require something other than a single static link. For example, having an AI that learns over time is impossible. Likewise any sort of memory task (instead, it must be "hacked" or cheated by simply providing the "memories" as yet another input). Likewise there's no way for the AI to actually "think" or perform other cognitive tasks.

This is why current approaches require massive datasets and models, because they're just trying to map every single possible input to a related output. Which.... simply doesn't work for a variety of cognitive tasks.

No amount of cramming data or expanding the models will ever result in an AI that can learn new tasks given some simple instructions and then immediately perform them competently like a human would. Likewise, no amount of cramming data or expanding models will ever result in an AI that can actually coherently understand, recognize, and respond to you.

LLMs no matter their size suffer from the exact same problem and it's clear as soon as you "ask" it something that's outside of the dataset. The AI has no way of recognizing that it is wrong, because all it's doing is providing the closest output to your input, not actually understanding what you're saying or prompting.

This approach is pretty good at extension tools like what we see with current LLMs, along with things like text2image, captioning, etc. which is obviously where we see AI shining best. But ask it literally anything that can't be a mapped I/O, and you'll see it's no better than AI 20-30 years ago.

1

ECEngineeringBE t1_iwq1ju9 wrote

At first, I was going to write a comment that went through and addressed every single one of your points. A couple of them are factually wrong, some are confused, but a lot of the other ones boil down to pointing out how current systems are bad at X, therefore deep learning is never going to be able to do X.

This is why I decided to take a somewhat more general approach and not stray too far from the original purpose of my comment. My purpose is not to convince you that deep learning will achieve AGI, but rather that you can't claim with certainty that it won't.

We have already seen that larger models end up with certain emergent capabilities not present in smaller models, so finding faults in current ones is not sufficient for dismissing the method entirely. Especially because our largest models are still way too tiny in comparison to the human brain - a brain has ~150T synapses (I know that parameters aren't the same as biological synapses, but I'm pointing out the order of magnitude).

Additionally, matrix multiplications with nonlinear activations are Turing complete (given recurrence and unbounded precision). This means that there exists a set of weights that would create an AGI. The question then becomes not whether you could build an AGI with NNs, but rather whether backprop, as a program search algorithm, is capable of finding that set of weights. And claiming that you know for certain is the same as claiming that you intuitively understand what a 100T dimensional search space looks like, and what backprop with regularization is actually doing. Considering the number of papers that keep coming out and pointing out some unexpected behavior of backprop, it is safe to say that nobody fully understands what it's actually doing.

My point, more generally, can be summarized like this:

In any field, if a certain percentage of experts (say 10% or more) hold an opinion X, and you can't prove, either formally or empirically, that X is not true, then you can't claim with complete certainty that X is not true.

Now, some of the confused or factually incorrect statements from your comment:

>For example, having an AI that learns over time is impossible.

Not true, there are various approaches to doing continual learning, such as this one:

https://arxiv.org/abs/2108.06325

>Every model I've witnessed so far has just been static input->output machines

Every system can be expressed as an input->output system - that's what Turing machines are for.
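To illustrate, here is a minimal Python sketch (a toy of my own, not any particular model) of a system with memory written as a plain input->output function: the memory is passed in as part of the input and handed back as part of the output. The names `step` and `run` are made up for this example.

```python
def step(state: int, x: int) -> tuple[int, int]:
    """Pure transition function: (old state, input) -> (new state, output).

    The "memory" here is a running total, chosen only for illustration.
    """
    new_state = state + x
    return new_state, new_state


def run(inputs: list[int]) -> list[int]:
    """Feed inputs one at a time, threading the state through each call."""
    state = 0
    outputs = []
    for x in inputs:
        state, out = step(state, x)
        outputs.append(out)
    return outputs
```

Calling `run([1, 2, 3])` yields a different output for each step even though `step` itself never changes, because the state travels with the input.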

>No amount of cramming data or expanding the models will ever result in an AI that can learn new tasks given some simple instructions and then immediately perform them competently like a human would

I've actually done this. You can do this via prompt engineering. For example, I created a prompt where I add two 8 digit numbers together (written in a particular way) in a stepwise, digit by digit fashion, and explain my every step to the model in plain language. I then ask it to add two different numbers together, and it begins generating the same explanation of digit by digit addition, finally arriving at the correct answer.
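A rough sketch of how such a prompt could be put together, in Python. The exact wording and number formatting I used isn't reproduced here, so the step phrasing below and the helper names `explain_addition` and `build_prompt` are assumptions for illustration only. The code builds the worked example and the follow-up question; it doesn't call any model.

```python
def explain_addition(a: int, b: int) -> str:
    """Write out digit by digit addition of two non-negative integers in plain language."""
    # Reverse the digit strings so index 0 is the rightmost (least significant) digit.
    xs, ys = str(a)[::-1], str(b)[::-1]
    width = max(len(xs), len(ys))
    xs, ys = xs.ljust(width, "0"), ys.ljust(width, "0")

    lines = [f"Add {a} and {b}, working from the rightmost digit."]
    carry = 0
    digits = []
    for i in range(width):
        total = int(xs[i]) + int(ys[i]) + carry
        digit, new_carry = total % 10, total // 10
        lines.append(
            f"Step {i + 1}: {xs[i]} + {ys[i]} + carry {carry} = {total}; "
            f"write {digit}, carry {new_carry}."
        )
        digits.append(str(digit))
        carry = new_carry
    if carry:
        lines.append(f"Final carry {carry} becomes the leading digit.")
        digits.append(str(carry))

    lines.append(f"Answer: {''.join(reversed(digits))}")
    return "\n".join(lines)


def build_prompt(example: tuple[int, int], query: tuple[int, int]) -> str:
    """One fully worked example, then the opening line of a new problem for the model to continue."""
    worked = explain_addition(*example)
    return f"{worked}\n\nAdd {query[0]} and {query[1]}, working from the rightmost digit.\n"
```

For instance, `build_prompt((12345678, 87654321), (11111111, 22222222))` gives one full explanation followed by the first line of the new problem, which is where the model picks up.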

>LLMs no matter their size suffer from the exact same problem and it's clear as soon as you "ask" it something that's outside of the dataset

You do realize that test sets don't contain data from the training set, and that the accuracy on them is not zero?

1

Kafke t1_iwq3sbf wrote

You wrote a lot but ultimately didn't resolve the problem I put forward. Let me just ask: has such an AI ever prompted you? Has it ever asked you a question?

The answer, of course, is no. Such a thing is simply impossible. It cannot do such a thing due to the architecture of the design, and it will never be able to do such a thing, until that design is changed.

> I've actually done this.

You've misunderstood what I meant. If I ask it to go find a particular YouTube video meeting XYZ criteria, could it do it? How about if I hook it up to some new input sensor and then ask it to figure out how the incoming data is formatted and explain it in plain English? Of course, the answer is no. It'll never be able to do these things.

As I said, you're looking at strict "I provide X input and get Y output". Static. Deterministic. Unchanging. Such a thing can never be an agent, and thus can never be a true AGI. Unless, of course, you loosen the term "AGI" to just refer to a regular AI that can do a variety of tasks.

Cramming more text data into a model won't resolve these issues. Because they aren't problems having to do with knowledge, but rather ability.

> For example, I created a prompt where I add two 8 digit numbers together (written in a particular way) in a stepwise, digit by digit fashion, and explain my every step to the model in plain language. I then ask it to add two different numbers together, and it begins generating the same explanation of digit by digit addition, finally arriving at the correct answer.

Cool. Now tell it to do it without giving it the instructions, and wait for it to ask for clarification on how to do the task. This will never happen. Instead it'll just spit out whatever the closest output is to your prompt. It can't stop to ask for clarification, because of how such a system is designed. And no amount of increasing the size of the model will ever fix that.

1

ECEngineeringBE t1_iwq8f8j wrote

>Static. Deterministic. Unchanging. Such a thing can never be an agent, and thus can never be a true AGI

It can deterministically output probability distributions, which you can then sample, making it nondeterministic. You also say that such a system can never be an agent. A chess engine is an agent. Anything that has a goal and acts in an environment to achieve it is an agent, whether deterministic or not.
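As a rough sketch of the sampling point, in Python: `model` below is a hypothetical stand-in for a trained network (the fixed logits are made up), and it always returns the same distribution for the same prompt. The randomness only comes in when you draw from that distribution.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def model(prompt: str) -> list[float]:
    """Hypothetical stand-in for a network: same prompt in, same logits out."""
    return [2.0, 1.0, 0.1]  # made-up scores for three possible next tokens


def sample(probs: list[float], rng: random.Random) -> int:
    """Draw one token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# The model's output is deterministic: two calls give identical distributions.
probs_a = softmax(model("hello"))
probs_b = softmax(model("hello"))

# Sampling from that fixed distribution is what varies from run to run.
rng = random.Random()  # unseeded, so draws differ between runs
draws = [sample(probs_a, rng) for _ in range(1000)]
```

Seeding `rng` would make the whole pipeline reproducible again, which is why determinism by itself says little about what the system can or can't do.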

But even a fully deterministic program can be an AGI. If you deny this, then this turns into a philosophical debate on determinism, which I'd rather avoid.

As for "static" and "unchanging" points - you can address those by continual learning, although that's not the only way you can do it.

There are some other points you make, but those are again simply doing the whole "current models are bad at X, therefore current methods can't achieve X".

I also think that you might be pattern matching a lot to GPT specifically. There are other interesting DL approaches that look nothing like the next token prediction.

Now, I think we ought to narrow down our disagreements here, so as to avoid pointless arguments. So let me ask a concrete question:

Do you believe that a computer program - code being run on a computer - can be generally intelligent?

1

Kafke t1_iws9po9 wrote

Again, you completely miss what I'm saying. I'll admit that the current approach to ML/DL could result in AGI when, of its own volition and unprompted, the AI asks the user a question, without that question being preprogrammed in. I.e., the AI doing something on its own, and not simply responding to a prompt.

> A chess engine is an agent

Ironically, a chess program has a better chance of becoming an AGI than the current approach used for AI.

> As for "static" and "unchanging" points - you can address those by continual learning, although that's not the only way you can do it.

Continual learning won't solve that. At best, you'll have a model that updates with use. That's still static.

> There are some other points you make, but those are again simply doing the whole "current models are bad at X, therefore current methods can't achieve X".

It's not that they're "bad at X" it's that their architecture is fundamentally incompatible with X.

> There are other interesting DL approaches that look nothing like the next token prediction.

Care to share one that isn't just a matter of a static machine accepting input and providing an output? I try to watch the field of AI pretty closely and I can't say I've ever seen such a thing.

> Do you believe that a computer program - code being run on a computer - can be generally intelligent?

Sure. In theory I think it's definitely possible. I just don't think that the current approach will ever get there. Though I would like to note that "general intelligence" and an AGI are kinda different, despite the similar names. Current AI is "narrow" in that it works on one specific field or domain. The current approach is to take this I/O narrow AI and broaden the domains it can function in. This will achieve a more "general" ability and thus "general intelligence", however it will not ever achieve an AGI, as an AGI has features other than "narrow AI but more fields". For example, such I/O machines will never be able to truly think, they'll never be able to plan, act out, and initiate their goals, they'll never be able to interact with the world in a way that is unlike current machines.

As it stands, my computer, or any computer, does nothing until I explicitly tell it to. Until an AI can overcome this fundamental problem, it will never be an AGI, simply due to architectural design.

Such an AI will never be able to properly answer "what have you been up to lately?". Such an AI will never be able to browse through movies, watch one of its own volition, and then prompt a user about what it has just done. Such an AI will never be able to have you plug a completely new hardware device into your computer, figure out what it does, and interact with it.

The current approach will never be able to accomplish such tasks, because of how the architecture is designed. They are reactive, and not active. A true AGI will need to be active, and be able to set out and accomplish tasks without being prompted. It'll need to be able to actually think, and not just respond to particular inputs with particular outputs.

1