Submitted by jackfaker t3_126wg0o in MachineLearning

Has there been much recent literature on leveraging natural language internal interfaces in AI systems, or what are your thoughts in this space? Perhaps not yet implementations, but at least discussion of how this pertains to consciousness and AI alignment?

Examples of potential implementations:

  • LLM with a final gate that chooses whether to output the next word to the external reader or to an internal stream. LLM output is a function of both the internal stream and the external prompt (a rough sketch follows this list).
  • Collection of LLMs that engage in a dialogue based on external prompt. External output is a function of a final LLM that parses this dialogue + the external prompt.
  • Loss function includes a critic that evaluates the grammatical structure of a hidden state (after it is passed through a decoder) at specific points in the network, encouraging the network to propagate information via natural language.
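A minimal sketch of the first bullet, assuming a hypothetical `llm` object with `next_token`/`eos_token` and a hypothetical `gate` with `is_internal`; these names are placeholders, not any particular library's API:

```python
def respond(llm, gate, prompt, max_tokens=256):
    """Generate a reply while routing some tokens to a hidden internal stream."""
    external = []  # tokens the reader sees
    internal = []  # hidden "scratchpad" stream, never shown externally
    for _ in range(max_tokens):
        # Each next token is conditioned on the prompt plus both streams.
        context = prompt + "".join(internal) + "".join(external)
        token = llm.next_token(context)
        if token == llm.eos_token:
            break
        # The gate decides whether this token is spoken aloud or kept internal.
        if gate.is_internal(context, token):
            internal.append(token)
        else:
            external.append(token)
    return "".join(external), "".join(internal)
```

The key design point is that the internal stream is plain natural language, so it can be logged and inspected even though the reader never sees it.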

Link to consciousness:

  1. Simple non-conscious organisms evolve to communicate.
  2. Internal feedback loops sporadically develop in the brains of these organisms.
  3. Feedback loops that encode information via primitive natural language proliferate, as the brain already has substantial evolutionary pressure to interpret and emit information in natural language.
  4. The memory-writing function of the brain could plausibly be similar regardless of whether the natural language was originally sourced from an external stimulus or an internal feedback loop.
  5. Hypothesize that consciousness is natural-language feedback loops in the brain combined with writing to memory.

Link to AI Safety/Alignment:

Consider a human being who had their stream of consciousness monitored for the entire duration of their life. If an AI system's intelligence is heavily derived from a natural language internal interface, then it becomes possible to observe its thought process in real time and shut it off when that thought process becomes substantially unaligned with humans. Of course this process would be incredibly nuanced in practice, but at surface level this approach appears infinitely more tenable than hoping to diagnose the alignment of an AI system from only an analysis of its inputs and outputs.
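A rough sketch of what that monitoring could look like, reusing the hypothetical `llm`/`gate` interface from above plus an assumed `classifier` with a `misalignment_score` method (all placeholder names, not an existing API):

```python
def monitored_respond(llm, gate, classifier, prompt, threshold=0.5, max_tokens=256):
    """Generate a reply, but halt if the internal stream is flagged as unaligned."""
    external, internal = [], []
    for _ in range(max_tokens):
        context = prompt + "".join(internal) + "".join(external)
        token = llm.next_token(context)
        if token == llm.eos_token:
            break
        (internal if gate.is_internal(context, token) else external).append(token)
        # Because the "thought process" is plain text, it can be scored directly.
        if classifier.misalignment_score("".join(internal)) > threshold:
            raise RuntimeError("internal stream flagged as unaligned; halting generation")
    return "".join(external)
```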

4

Comments


grotundeek_apocolyps t1_jecdbtm wrote

"AI alignment" / "AI safety" are not credible fields of study.

3

ReasonableObjection t1_jed3b66 wrote

What do you mean?

2

grotundeek_apocolyps t1_jed6u66 wrote

There are real concerns about the impacts of AI on the world, and they all pertain to the ways in which humans choose to use it. That is not the subject matter under consideration in "AI alignment" or "AI safety"; the term that is used for this is usually "AI ethics".

"AI alignment" /"safety" is about trying to prevent AIs from autonomously deciding to harm humans despite having been designed to not do so. This is a made up concern about a type of machine that doesn't exist yet that is predicated entirely on ideas from science fiction.

6

ReasonableObjection t1_jed8z42 wrote

The one part where you are wrong is about an "AI agent harming humans despite having been designed not to do so". We don't currently know how to code a model that does not devolve into harming humans, even when we are trying to code it not to do so... Keep in mind that a lot of the academic and theoretical concerns discussed over the last 30 years, which sounded like science fiction, have become very real in the last 6 months (ahead of most serious researchers' timelines and assumed degree of difficulty) and are currently being demonstrated by existing models like ChatGPT.

I don't understand how it is a made-up concern when all the ways we have to train these models lead to these negative end states and we don't currently have a solution to the problem... and none of this is a surprise, considering this is exactly what we observe happening in the real world when it comes to emergent intelligent behavior (artificial or biological).

I also don't think a lot of people understand that these are not "coding" problems... we cannot solve the very basic problems that arise from intelligence itself, let alone code solutions to them into a model.

Even if we could solve these problems (and I'm not one to bet against human ingenuity over a long enough timeline), it is important to understand that we can't currently code the solutions into our models. We don't code these models, we train them, and we can only observe their external output and alignment and infer that everything is fine. We have no solution for verifying even this... We haven't even gotten into the bigger problem that we have no idea what is going on inside to make them spit out those outputs, and, even more importantly, no way to probe their inner alignment (a whole other problem).

I agree with you that until we develop an AGI that is superior to a human agent none of that matters, but the danger is that we don't understand how these models work any better than we understand how a human brain works, and by definition we won't know when the point of no return has been crossed because of how we design these things...

The danger is running out of runway before somebody accidentally crosses the threshold... If that happens, the whole "bad actor uses AI to do bad things" will be the least of our worries. I would argue we already live in that world if you look at things like social media, the Facebook news feed, etc...

It is really important for people to understand that it can get a lot worse... way worse than they can imagine, because by definition we humans would not be able to imagine it (we would not be as intelligent).

Edit - the only part where I would argue you are wrong... I absolutely agree with you that less-than-AGI capabilities can, and unfortunately will, do huge amounts of damage before an AGI ever becomes a threat... hell, if we're lucky we will use those to kill ourselves before an AGI does, because it could be worse if they do it...

2

grotundeek_apocolyps t1_jedbi44 wrote

There is indeed a community of people who think in the way that you have described. They don't know what they're talking about and their research agenda is fundamentally unsound. Nothing that you just wrote is based on an accurate understanding of the science or mathematics of real machine learning.

I'd like to give you a better response than that, but I'm honestly not sure how to. What do you say to someone who is interested in math and is very enthusiastic about the problem of counting the angels on the head of a pin? I say that not to be insulting but to illustrate the magnitude of the divide between how you perceive the field of machine learning and the reality of it.

3

ReasonableObjection t1_jedg8ki wrote

You could point me to some fundamental research around intelligence in general (forget the math or the computers), because we have already demonstrated that it does not matter whether intelligence emerges biologically or artificially, or whether it was coded in silicon or some disgusting wetware... what matters is the emergent behaviors that result from it.

That's the part I don't think people understand: you can remove the computers, remove the humans, remove the coding and just think about how an intelligent agent would work in any environment, and you arrive at the same dangerous conclusions. We don't currently know how to solve for them.

You observe them in any intelligence; after all, anybody can argue that not everything humans do is beneficial to what mother nature programmed us to do (make more babies).

Again, this is not about math or coding... we just haven't solved some basic questions...
For example, can you give me a definition of intelligence under which an inferior general intelligent agent (biological or not) would be able to control a superior one over a long enough timeline? Because all of our current definitions lead us to conclude the answer is no.

Also, if you have done any looking into this, you would realize that even if we could solve these problems, we currently lack the capability to code the solutions into the models to make sure they are safe.

I'm not trying to overwhelm you with doomer arguments... I'm genuinely curious and searching for some opposing views that are actually researched and thought out, vs. some hand-waving about how we will fix it in prod... or "hahaha you so dumb cause you think Skynet is coming" (this tech will be able to kill us long before it is as cool as Skynet)... I'm asking for evidence because the people who have actually thought about this for 30+ years still haven't been able to solve these very basic, non-math, non-coding problems.
Any serious researcher who wants to continue despite the danger assumes we will solve these problems before we run out of runway, not that we have already solved any of them...

I would absolutely bet on human ingenuity to solve these problems given enough time, based on the history of human ingenuity... the danger is that we will run out of time, not that we can't solve the problem... unfortunately, for this particular problem we don't get a do-over like we do with others...

2

grotundeek_apocolyps t1_jefl7kd wrote

The crux of the matter is that there are fundamental limitations to the power of computation. It is physically impossible to create an AI, or any other kind of intelligent agent, that can overpower everything else in the physical world by virtue of sheer smartness.

Depending on where you're coming from, this is not an easy thing to understand; it usually requires a lot of education. The simplest metaphor that I've thought of is the speed of light: it seems intuitively plausible that a powerful enough rocket ship should be able to fly faster than the speed of light, but in fact the laws of physics prohibit it.

Similarly, it seems intuitively plausible that a smart enough agent should be able to solve any problem arbitrarily quickly, thereby enabling it to (for example) conquer the world or destroy humanity, but that too is physically impossible.

There are a lot of ways to understand why this is true. I'll give you a few places to start.

  • Landauer's principle: infinite computation would require infinite resources (see the bound sketched below)
  • Solomonoff induction is uncomputable: the optimal general method of Bayesian induction is literally impossible to compute, even in principle
  • Chaotic dynamics cannot be predicted: control requires prediction, but the finite precision of measurement and the aforementioned limits on computation mean that our control over the world is fundamentally limited, and intelligence can never overcome this fact
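For reference, a compact statement of the bound the first bullet appeals to (standard thermodynamics, not anything specific to this thread):

```latex
% Landauer's bound: minimum energy dissipated to erase one bit of
% information at absolute temperature T (k_B is Boltzmann's constant).
E_{\min} = k_B T \ln 2
% At room temperature (T \approx 300\,\mathrm{K}) this is roughly
% 2.9 \times 10^{-21}\,\mathrm{J} per erased bit -- tiny but nonzero,
% so unbounded computation implies unbounded energy and heat dissipation.
```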

The people who have thought about this "for 30+ years" and come to a different conclusion are charlatans. I don't know of a gentler way of putting it. What do you tell someone when they ask you to explain why someone who has been running a cult for 30 years isn't really talking directly to god?

Something to note on the more psychological end of things is that a person's ability to understand things is fundamentally limited by their understanding of their own emotions. The consequence of this is that you should also be thinking about how you're feeling when you're reading hysterical nonsense about the robot apocalypse, because that's going to affect how likely you are to believe things that aren't true. People often fixate on things that have a strong emotional valence, irrespective of their accuracy.

2

ReasonableObjection t1_jefpe2n wrote

Thank you so much for the thoughtful reply!
Will read into these and may reach out to you with other questions.
Edit - as far as how I'm feeling... at the moment just curious, been asking lots of questions about this the last few days and reading any resources people are kind enough to share :-)

2

WikiSummarizerBot t1_jefl95t wrote

Landauer's principle

>Landauer's principle is a physical principle pertaining to the lower theoretical limit of energy consumption of computation. It holds that an irreversible change in information stored in a computer, such as merging two computational paths, dissipates a minimum amount of heat to its surroundings.

Solomonoff's theory of inductive inference

Solomonoff's uncomputability

>Unfortunately, Solomonoff also proved that Solomonoff's induction is uncomputable. In fact, he showed that computability and completeness are mutually exclusive: any complete theory must be uncomputable. The proof of this is derived from a game between the induction and the environment. Essentially, any computable induction can be tricked by a computable environment, by choosing the computable environment that negates the computable induction's prediction.


1

dansmonrer t1_jedya4g wrote

I don't think intelligence in general is something machine learning people even want to define. Psychologists do, with different schools of thought, including behaviorism (which has heavily influenced reinforcement learning, and of which B.F. Skinner was one of the main figures) and then cognitivism, theory of mind... The few attempts I have seen at the intersection of psychological science and ML have received heavy backlash from both sides, for reasons both justified and unjustified. The truth is some people will probably have to go against the tide at some point, but they will also need to ground their approach very well in existing frameworks. Conclusion: try to be excellent in both psychology and ML; the field you are describing has yet to become scientific.

1

ReasonableObjection t1_jedzrt7 wrote

Thank you for your detailed response. So to be clear you are saying that things like emergent goals in independent agents or those agents having convergent instrumental goals are made up or not a problem? Do you have any resources that would describe intelligence or solving the alignment problem in ways that are not dangerous? I’m aware of some research that looks promising but curious if you have others.

1

dansmonrer t1_jeg67bc wrote

Not at all made up in my opinion! There just doesn't seem to be any consensual framework for the moment, and diverse people are scrambling to put relevant concepts together and often disagree on what makes sense. It's particularly hard for AI alignment because it requires you to define the dangers you want to speak of, and so to have a model of an open environment in which the agent is supposed to operate, something we currently have no notion or example of. This makes the examples that people in AI alignment bring up very speculative and poorly grounded, which allows for easy criticism. I'm curious, though, if you have interesting research examples in mind!

1