ReasonableObjection t1_jed8z42 wrote on March 31, 2023 at 4:08 AM

Reply to comment by grotundeek_apocolyps in [D] AI Explainability and Alignment through Natural Language Internal Interfaces by jackfaker

The one part where you are wrong is about an "AI agent harming humans despite having been designed not to do so". We don't currently know how to code a model that does not devolve into harming humans even when we are explicitly trying to code it not to do so... Keep in mind that a lot of the academic and theoretical concerns that have been discussed over the last 30 years, and that sounded like science fiction, have become very real in the last 6 months (ahead of most serious researchers' timelines and assumed degree of difficulty) and are currently being demonstrated by existing models like ChatGPT.

I don't understand how it is a made-up concern when every way we have to train these models leads to these negative end states and we don't currently have a solution to the problem... and none of this is surprising, considering it is exactly what we observe in the real world when it comes to emergent intelligent behavior (artificial or biological).

I also don't think a lot of people understand that these are not "coding" problems... we cannot solve the very basic problems that arise from intelligence itself, even before we get to coding solutions into a model.

Even if we could solve these problems (and I'm not one to bet against human ingenuity over a long enough timeline), it is important to understand that we can't currently code the solutions into our models. We don't code these models, we train them, and we can only observe their external outputs and apparent alignment and infer that everything is fine. We have no way to verify even this... And that's before we get into the bigger problem: we have no idea what is going on inside that makes them produce those outputs and, even more importantly, no way to probe their inner alignment (a whole other problem).

I agree with you that until we develop an AGI that is superior to a human agent none of that matters, but the danger is that we don't understand how these models work any better than we understand how a human brain works, and by definition we won't know when the point of no return has been crossed because of how we design these things...

The danger is running out of runway before somebody accidentally crosses the threshold... If that happens, the whole "bad actor uses AI to do bad things" scenario will be the least of our worries. I would argue we already live in that world if you look at things like social media, the Facebook News Feed, etc...

It is really important for people to understand it can get a lot worse... way worse than they can imagine, because by definition we humans would not be able to imagine it (we would not be as intelligent).

Edit - that's the only part where I would argue you are wrong... I absolutely agree with you that less-than-AGI capabilities can, and unfortunately will, do huge amounts of damage before an AGI ever becomes a threat... hell, if we're lucky we will use those to kill ourselves before an AGI does, because it could be worse if it does...

grotundeek_apocolyps t1_jedbi44 wrote on March 31, 2023 at 4:34 AM

There is indeed a community of people who think in the way that you have described. They don't know what they're talking about and their research agenda is fundamentally unsound. Nothing that you just wrote is based on an accurate understanding of the science or mathematics of real machine learning.

I'd like to give you a better response than that but I'm honestly not sure how to. What do you say to someone who is interested in math and very enthusiastic about the problem of counting the angels on the head of a pin? I say that not to be insulting but to illustrate the magnitude of the divide between how you perceive the field of machine learning and the reality of it.

ReasonableObjection t1_jedg8ki wrote on March 31, 2023 at 5:25 AM

You could point me to some fundamental research on intelligence in general (forget the math or the computers), because we have already demonstrated that it does not matter whether intelligence emerges biologically or artificially, or whether it is coded in silicon or some disgusting wetware... what matters is the emergent behavior that results from it.

That's the part I don't think people understand: you can remove the computers, remove the humans, remove the coding, and just think about how an intelligent agent would work in any environment, and you arrive at the same dangerous conclusions. We don't currently know how to solve for them.

You observe them in any intelligence; after all, anybody can see that not everything humans do serves what mother nature programmed us to do (make more babies).

Again, this is not about math or coding... we just haven't solved some basic questions...
For example, can you give me a definition of intelligence under which an inferior generally intelligent agent (biological or not) would be able to control a superior one over a long enough timeline? Because all of our current definitions lead us to conclude the answer is no.

Also, if you have done any looking into this, you would realize that even if we could solve these problems, we currently lack the ability to code the solutions into the models to make sure they are safe.

I'm not trying to overwhelm you with doomer arguments... I'm genuinely curious and looking for opposing views that are actually researched and thought out, as opposed to some hand-waving about how we will fix it in prod... or "hahaha you so dumb cause you think Skynet is coming" (this tech will be able to kill us long before it is as cool as Skynet)... I'm asking for evidence because the people who have actually thought about this for 30+ years still haven't been able to solve these very basic problems, which have nothing to do with math or coding.
Any serious researcher who wants to continue despite the danger is assuming we will solve the problems before we run out of runway, not that we have already solved any of them...

I would absolutely bet on human ingenuity to solve these problems given enough time, based on its track record... the danger is that we will run out of time, not that we can't solve the problem... unfortunately, unlike other problems, this one doesn't give us a do-over...

grotundeek_apocolyps t1_jefl7kd wrote on March 31, 2023 at 5:28 PM

The crux of the matter is that there are fundamental limitations to the power of computation. It is physically impossible to create an AI, or any other kind of intelligent agent, that can overpower everything else in the physical world by virtue of sheer smartness.

Depending on where you're coming from this is not an easy thing to understand, it usually requires a lot of education. The simplest metaphor that I've thought of is the speed of light: it seems intuitively plausible that a powerful enough rocket ship should be able to fly faster than the speed of light, but actually the laws of physics prohibit it.

Similarly, it seems intuitively plausible that a smart enough agent should be able to solve any problem arbitrarily quickly, thereby enabling it to (for example) conquer the world or destroy humanity, but that too is physically impossible.

There are a lot of ways to understand why this is true. I'll give you a few places to start.

Landauer's principle: infinite computation would require infinite resources
Solomonoff induction is uncomputable: the optimal general method of Bayesian induction is literally impossible to compute, even in principle
Chaotic dynamics cannot be predicted: control requires prediction, but the finite precision of measurement and the aforementioned limits on computation mean that our control over the world is fundamentally limited, and intelligence can never overcome this fact
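To make the chaos point concrete, here is a minimal sketch using the logistic map x → r·x·(1−x) with r = 4, a standard textbook chaotic system (my choice of example, not something from the thread). Two runs that start a tiny "measurement error" apart are iterated until they differ by more than 0.1, a hypothetical cutoff for "the forecast is useless". The function names are also hypothetical. For r = 4 the separation roughly doubles each step on average, so a 1000-fold better measurement buys only about ten more steps of useful prediction.

```python
from typing import List, Optional


def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 100) -> List[float]:
    """Iterate the logistic map x -> r*x*(1-x), returning x_0 .. x_steps."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1.0 - x))
    return xs


def steps_until_divergence(x0: float, eps: float, tol: float = 0.1,
                           steps: int = 100) -> Optional[int]:
    """First step at which two runs started eps apart differ by more than tol (None if never)."""
    a = logistic_trajectory(x0, steps=steps)
    b = logistic_trajectory(x0 + eps, steps=steps)
    for i, (x, y) in enumerate(zip(a, b)):
        if abs(x - y) > tol:
            return i
    return None


if __name__ == "__main__":
    # Shrinking the initial error 1000x at a time delays divergence only by a few steps each time.
    for eps in (1e-7, 1e-10, 1e-13):
        print(f"initial error {eps:.0e}: forecast useless after step "
              f"{steps_until_divergence(0.3, eps)}")
```

The exponential growth of error is the design point here: better sensors or more compute push the forecast horizon out only logarithmically.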

The people who have thought about this "for 30+ years" and come to a different conclusion are charlatans. I don't know of a gentler way of putting it. What do you tell someone when they ask you to explain why someone who has been running a cult for 30 years isn't really talking directly to god?

Something to note on the more psychological end of things is that a person's ability to understand things is fundamentally limited by their understanding of their own emotions. The consequence of this is that you should also be thinking about how you're feeling when you're reading hysterical nonsense about the robot apocalypse, because that's going to affect how likely you are to believe things that aren't true. People often fixate on things that have a strong emotional valence, irrespective of their accuracy.

ReasonableObjection t1_jefpe2n wrote on March 31, 2023 at 5:55 PM

Thank you so much for the thoughtful reply!
Will read into these and may reach out to you with other questions.
Edit - as far as how I'm feeling... at the moment just curious, been asking lots of questions about this the last few days and reading any resources people are kind enough to share :-)

WikiSummarizerBot t1_jefl95t wrote on March 31, 2023 at 5:28 PM

Landauer's principle

>Landauer's principle is a physical principle pertaining to the lower theoretical limit of energy consumption of computation. It holds that an irreversible change in information stored in a computer, such as merging two computational paths, dissipates a minimum amount of heat to its surroundings.
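The "minimum amount of heat" in the summary is k_B·T·ln 2 per irreversibly erased bit. A quick worked example, assuming a room-temperature operating point of 300 K (my choice) and hypothetical function names: at 300 K the bound is about 2.87 × 10⁻²¹ J per bit, so any finite energy budget caps the number of irreversible operations, which is the sense in which infinite computation would require infinite resources.

```python
import math

# Boltzmann constant in J/K (exact value under the 2019 SI redefinition).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum heat dissipated when one bit is irreversibly erased: k_B * T * ln 2."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


def max_bit_erasures(energy_j: float, temperature_k: float) -> float:
    """Upper bound on how many irreversible bit erasures a given energy budget can pay for."""
    return energy_j / landauer_limit_joules(temperature_k)


if __name__ == "__main__":
    t = 300.0  # assumed operating temperature, roughly room temperature
    print(f"Landauer limit at {t:.0f} K: {landauer_limit_joules(t):.3e} J per bit")
    print(f"Max erasures per joule: {max_bit_erasures(1.0, t):.3e}")
```

Real hardware dissipates far more than this per operation, so the bound is a floor, not an estimate of actual cost.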

Solomonoff's theory of inductive inference

Solomonoff's uncomputability

>Unfortunately, Solomonoff also proved that Solomonoff's induction is uncomputable. In fact, he showed that computability and completeness are mutually exclusive: any complete theory must be uncomputable. The proof of this is derived from a game between the induction and the environment. Essentially, any computable induction can be tricked by a computable environment, by choosing the computable environment that negates the computable induction's prediction.
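A toy version of the game described above, restricted (my simplification) to deterministic predictors over binary sequences: the environment simply runs the predictor on the history so far and emits the opposite bit, so the predictor is wrong at every step. This illustrates only the "computable environment negates the computable induction's prediction" step, not Solomonoff's full proof; the function names and the three sample predictors are hypothetical.

```python
from typing import Callable, List

# A deterministic predictor maps the bits seen so far to a guess (0 or 1) for the next bit.
Predictor = Callable[[List[int]], int]


def adversarial_environment(predictor: Predictor, steps: int) -> List[int]:
    """Build a sequence that, at every step, emits the opposite of the predictor's guess."""
    history: List[int] = []
    for _ in range(steps):
        guess = predictor(list(history))  # pass a copy so the predictor cannot alter history
        history.append(1 - guess)
    return history


def correct_predictions(predictor: Predictor, sequence: List[int]) -> int:
    """Count how many bits the predictor gets right when shown each prefix of the sequence."""
    return sum(predictor(sequence[:i]) == bit for i, bit in enumerate(sequence))


# Sample predictors (hypothetical illustrations).
def always_zero(history: List[int]) -> int:
    return 0


def repeat_last(history: List[int]) -> int:
    return history[-1] if history else 0


def majority(history: List[int]) -> int:
    return 1 if 2 * sum(history) > len(history) else 0


if __name__ == "__main__":
    for predictor in (always_zero, repeat_last, majority):
        seq = adversarial_environment(predictor, 20)
        print(predictor.__name__, "".join(map(str, seq)),
              "correct:", correct_predictions(predictor, seq))
```

Because the environment is itself just a program that calls the predictor, it is computable whenever the predictor is, which is why no computable predictor can be right about every computable environment.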


dansmonrer t1_jedya4g wrote on March 31, 2023 at 9:33 AM

I don't think intelligence in general is something machine learning people even want to define. Psychologists do, with different schools of thought, including behaviorism (which heavily influenced reinforcement learning, and of which B. F. Skinner was one of the main figures), cognitivism, theory of mind... The few attempts I have seen at the intersection of psychological science and ML have met with heavy backlash from both sides, for reasons both justified and unjustified. The truth is that some people will probably have to go against the tide at some point, but they will also need to ground their approach very well in existing frameworks. Conclusion: try to be excellent in both psychology and ML; the field you are describing has yet to become scientific.

ReasonableObjection t1_jedzrt7 wrote on March 31, 2023 at 9:54 AM

Thank you for your detailed response. So to be clear, you are saying that things like emergent goals in independent agents, or those agents having convergent instrumental goals, are made up or not a problem? Do you have any resources that describe intelligence or solving the alignment problem in ways that are not dangerous? I'm aware of some research that looks promising but am curious if you have others.

dansmonrer t1_jeg67bc wrote on March 31, 2023 at 7:47 PM

Not at all made up, in my opinion! There just doesn't seem to be any consensus framework for the moment, and various people are scrambling to put relevant concepts together and often disagree on what makes sense. It's particularly hard for AI alignment because it requires you to define which dangers you want to talk about, and therefore to have a model of the open environment in which the agent is supposed to operate, something we currently have neither a notion nor an example of. This makes the examples people in AI alignment bring up very speculative and poorly grounded, which makes them easy to criticize. I'm curious, though, if you have interesting research examples in mind!