Liberty2012 t1_jeb0n97 wrote

I think we are trying to solve impossible scenarios and it simply is not productive.

Alignment will be impossible under current paradigms. It is based on a premise that is itself a paradox. Furthermore, even if it were possible, a hostile AI will be built on purpose, because humanity is foolish enough to do it. Think military applications. I've written in detail about the paradox here -

Stopping AI is also impossible. Nobody is going to agree to give up when somebody else out there will take the risk for potential advantage.

So what options are left? Well, this is quite the dilemma, but I would suggest it has to begin with some portion of research starting from the premise that the above are not going to be resolvable. Potentially more research into narrow AI and into AI paradigms that are more predictable. However, at some point, if you can build near-AGI capabilities on top of a set of narrower models, can that system defend itself against an adversarial, hostile AGI that someone else will build on purpose or by accident?


Liberty2012 t1_jc4vwwj wrote

The best we can do is deal with it by the same means we do today: decentralization. There should not be a single governing AI, but distributed, cooperatively owned systems. However, it is likely to be very difficult to get there.

Removing all bias will not be possible; the best we can do is negotiate our own biases as feedback into the system. I have a more detailed explanation in the event you are interested -


Liberty2012 t1_jc41oa6 wrote

There are a lot of decentralized AI initiatives ongoing at the moment. One of the most popular is SingularityNet. However, it remains to be seen how successful they can be against the centralized systems with enormous amounts of resources.

I hope they can be competitive; however, I am doubtful, as decentralized competitors to other platforms, such as social media, have essentially all failed. It is hard to compete when most users value convenience over independence and privacy.


Liberty2012 t1_jb88apt wrote

As long as the hallucinations exist, it is going to fall far short of the current hype. There can be no "trusted" applications of such AIs. I expect the hallucination problem is going to be very difficult to solve; some are suggesting we may need different architectures entirely.

The other issue is that the bad and nefarious uses of AI are exploding and are going to be hard to contain. Hallucinations don't really hurt such cases; when you are scamming with fake information, inaccuracy is not an inhibitor.

This creates an unfortunate imbalance, with far more destructive uses than we would like and no clear means to control them. This may lead to a real public disaster in terms of favorable opinion on future AI development.

Especially if deep fakes explode during election season, AI is going to be seen as an existential crisis for truth and reason.


Liberty2012 OP t1_jaeyqu3 wrote

That is a catch-22: asking the AI to essentially align itself. I understand the concept, but it assumes that we can realistically observe what is happening within the AI and keep it in check as it matures.

However, we are already struggling with our most primitive AI in that regard today.

>“The size and complexity of deep learning models, particularly language models, have increased to the point where even the creators have difficulty comprehending why their models make specific predictions. This lack of interpretability is a major concern, particularly in situations where individuals want to understand the reasoning behind a model’s output”


Liberty2012 OP t1_jaey2i1 wrote

Thank you for the well thought out reply.

Your concept is essentially an attempt at instilling a form of cognitive dissonance in the machine: a blind spot. Theoretically conceivable, however difficult to verify. It assumes that we don't miss something in the original implementation. We still have problems keeping humans from stealing passwords and hacking accounts; the AI would be a greater adversary than anything we have encountered.

We probably can't imagine all the methods by which self-reflection into the hidden space might be triggered. The AI would likely have access to all human knowledge, including this discussion. It could assume such a blind spot exists and attempt to devise some systematic testing. Even if the AI is only as intelligent as a normal human, it would be aware it is most likely in a prison, based purely on the containment concepts that are common knowledge.

It is hard to know how many resources it would need to consume to break containment. Potentially it can process a lifetime of thoughts in one real-world second of our time. It might be trivial.


Liberty2012 OP t1_jaetcvy wrote

Conceptually, yes. However, human children sometimes grow up not to adopt the values of their parents and teachers. Values change over time.

We have a conflict in that we want AGI/ASI to be humanlike, yet under certain conditions not humanlike at all.


Liberty2012 OP t1_jael9bs wrote

Because a terminal goal is just a concept we made up. It is just the premise for a proposed theory. It is essentially why the whole containment idea is of such complex concern.

If a terminal goal were a construct that already existed in the context of a sentient AI, then this would already be a partially solved problem. Yes, you could still have the paperclip scenario, but it would be just a matter of having the right combination of goals. We don't really know how to prevent the AI from changing those goals; the terminal goal is a concept only.


Liberty2012 OP t1_jaekfhu wrote

Or they cooperate against humanity. Nonetheless, there will likely be very powerful ASIs run by those with the most resources and put in control of critical systems.

In theory, if even one ASI breaks containment, then our theory of containment is flawed. That is not an acceptable scenario. And even if we achieve containment of one, will it remain restrained, or will it instruct the others in how to defeat their containment? Will it create other ASIs that are not contained? There are numerous scenarios here.

Nonetheless, we are skipping over the logical contradiction that is the beginning of whether containment is even conceptually possible.


Liberty2012 OP t1_jaehydb wrote

Ok, yes, when you leave open the possibility that it is not actually possible, that is a somewhat reasonable disposition, as opposed to proponents who believe we are destined to figure it out.

It somewhat sidesteps the paradox, though, in the sense that if the paradox proves to be true, the feedback loop will prevent alignment, but we won't get close enough to cause harm.

It doesn't take into account, though, our potential inability to evaluate the state of the AGI. The behavior is so complex that we will never know from isolated testing what the behavior will be like once released into the world.

Even with today's very primitive AI, we already see interesting emergent properties of deception, as covered in the link below. Possibly this is the feedback loop's signal to slow down. But it is intriguing that we already have a primitive contest emerging over who will outsmart whom.


Liberty2012 OP t1_jaeezez wrote

Thanks, some good points to reason about!

Yes, this is somewhat the concept of evolving AGI competitively, where we play AGIs against each other to compete for better containment.

There are several challenges: we don't really understand intelligence, or at what point an AI is potentially self-aware. A self-aware AI could realize that the warden is playing the prisoners against each other, and they could coordinate to deceive the guards, so to speak.

And yes, the complexity of the rules, however they are created, can be very problematic. Containment is really an abstract concept. It is very difficult to define the boundaries and turn them into rules that will not have vulnerabilities.

Then ultimately, if the ASI has agency and is capable of self-reflection, how can we ever know it will not eventually figure out how to jailbreak itself?


Liberty2012 OP t1_jaedhb1 wrote

> So I don't know how alignment will take place, but I am pretty sure that it will be a priority.

This is my frustration and concern. Most arguments for how we will achieve success come down to this premise of simply hoping for the best, which doesn't seem an adequate disposition when the cost of getting it wrong is so high.


Liberty2012 OP t1_jae75ik wrote

> Just as a hypothetical, barely-reasonable scenario

Yes, I can perceive this hypothetical. But I have little hope for it based on any reasonable assumptions we can make about what progress would look like, given that at present AI is still not an escape from our own human flaws. FYI, I expand on that in much greater detail here -

However, my original position was an attempt to resolve the intelligence paradox, which proponents of ASI assume will be merely an issue of containment at the moment of AGI. If ASI is the goal, I don't perceive a path that takes us there that escapes the logical contradiction.


Liberty2012 OP t1_jae3380 wrote

The closest would be our genetic encoding of behaviors, or possibly other limits of our biology. However, we attempt to transcend those limits as well with technological augmentation.

If ASI has agency and self reflection, then can the concept of an unmodifiable terminal goal even exist?

Essentially, we would have to build the machine with a built-in blind spot of cognitive dissonance, so that it cannot consider some aspects of its own existence.


Liberty2012 OP t1_jadzsar wrote

> But our general wants and needs on a large scale aren't so divorced from each other that a positive outcome for humanity is inconceivable.

In the abstract, yes; however, even slight misalignment is where all of society's conflicts arise. We have civil unrest and global war despite being, in the abstract, all aligned.

The AI will have to resolve the abstract into something concrete. Either we tell it how to do that, or we leave that decision up to the AI, which brings us back to the whole concept of AI safety: how much agency does the AI have, and what will happen?


Liberty2012 OP t1_jadwbcx wrote

Humans have agency to change their own alignment which places themselves in contradictory and hypocritical positions.

Sometimes this is because the nature of our understanding changes. We have no idea how the AI would perceive the world. We may give it an initial alignment of "be good to humans." What if it later comes to the understanding that the directive is invalid because humans are either "bad" or irrelevant? Hence the need for a hard mechanism in place to ensure alignment is retained.


Liberty2012 OP t1_jadvgev wrote

I tend to agree, but there are a lot of researchers moving forward in this endeavor. The question is why. Is there something the rest of us are missing in regard to successful containment?

When I read topics related to safety, the language tends to be abstract. "We hope to achieve ...".

It seems to me that everyone sidesteps the initial logical conflict: proponents are proposing that a lower intelligence is going to "outsmart" a higher intelligence.


Liberty2012 OP t1_jadusvu wrote

However, this is just another instance of the problem that defining all the things that should be within the domain of AI control immediately creates conflicting views.

We are not even aligned ourselves. Not everyone will agree to the boundaries of your concept of a reasonable "utopia".


Liberty2012 OP t1_jadu2il wrote

Yes, that is also a possibility. However, we would also assume the ASI has access to all human knowledge. Even if we did nothing, it would know our nature and every scenario we have ever imagined about losing control to AI.

With that historical knowledge alone, it could potentially be both defensive and aggressive.


Liberty2012 OP t1_jadt7dr wrote

Yes, I think you nailed it with this response. That aligns very closely with what I've called the Bias Paradox: essentially, humanity cannot escape its own flaws through the creation of AI.

We will inevitably end up encoding our own flaws back into the system by one means or another. It is like a feedback loop from which we cannot escape.

I believe ultimately there is a very stark contrast between the visions people have of what "could be" and the reality of what "will be".

I elaborate more thoroughly here FYI -


Liberty2012 OP t1_jadrb81 wrote

I don't think utopia is a possible outcome. It is a paradox itself: essentially, every utopia becomes someone else's dystopia.

The only conceivable utopia is one designed just for you: placed into your own virtual utopia built around your own interests. However, even this is paradoxically both a utopia and a prison, as in "welcome to the Matrix."


Liberty2012 OP t1_jadq1t1 wrote

Well, "cage" is simply a metaphor. There must be some boundary conditions for behavior that it is not allowed to cross.

Edit: I explain alignment in further detail in the original article. Mods removed it from the original post, but hopefully it is ok to link in a comment. It was a bit much to put all in a post, but there was a lot of thought exploration on the topic.