
AsheyDS t1_j0mm9q5 wrote

Depending on how the system is built, I don't believe the control problem is much of a problem. Direct modification of memory, seamless sandboxing, soft influence and hard behavior modification, and other methods should suffice. However, I consider alignment to be a different problem, one relating more to autonomy.

Aligning to humanity means creating a generic, universally accepted model of ethics, behavior, etc. But aligning to a user means it only needs to adhere to the laws of the land and to whatever the user would 'typically' decide in an ethical situation. So an AGI (or whatever autonomous system we're concerned about here) would need to learn the user and their ethical preferences to aid in decision-making when the user isn't there, or when it's otherwise unable to ask for clarification on an issue that arises.

If AGI were presented to everyone as a service they could access remotely, then I would assume alignment concerns would be minimal as long as it's carrying out work that doesn't directly impact others. For an autonomous car or robot that could affect other people without user input, that's when it should consider how it's aligned with the user or owner, and how the user would want it to behave in an ethical dilemma. So yes, it should probably run imaginative scenarios much like people do, both to be prepared and to solidify the ethical stances the user has imbued it with.


WarImportant9685 t1_j0muh8q wrote

I wish I shared your optimism. But from the research I've read, even the control problem seems to be a hard problem for us right now. As a fellow researcher, what makes you personally optimistic that it'll be easy to solve?

I'll take a shot at explaining why I think the solutions you listed are likely to be moot.

Direct modification of memory -> This is an advantage, yes. But it's useless if we don't understand the AI the way we need to. The holy grail would be to tell whether the AI is lying just by looking at its neural weights, or to determine with 100% certainty whether the AI has a mesa-optimizer as one of its subroutines. But current AI interpretability research is still very far from that.

Seamless sandboxing -> I'm not sure what you mean by this. If I had to guess, I'd interpret it as a true simulation of the real world, which is impossible! My reasoning is that the real world doesn't only contain gardens, lakes, and atom interactions, but also tons of humans doing whatever the fuck they usually do, the economy, and so on. The best we can get is a 'close enough' simulation. But how do we define close enough? No one knows how to define that rigorously.

Soft influence -> Not sure what you mean by this

Hard behavior modification -> I'll interpret this as hard rules for the AI to follow? Not gonna work. There's a reason we moved on from expert systems to modern AI, and now we want to control AI with expert systems?

Anyway, I do want to hear your reply as a fellow researcher. Hopefully I don't come across as rude.


AsheyDS t1_j0n9xiu wrote

>This is an advantage yes. But it's useless if we don't understand the AI in the way that we want.

Of course, but I don't think making black boxes is the only approach. So I'm assuming that one day we'll be able to make an AGI system intentionally, rather than stumble upon it. If it's intentional, we can figure it out and create effective control measures. Of the possible control measures, I think the best option is to create a process, even if it has to be a separate embedded control structure, that recognizes undesirable 'thoughts' and intentions, modifies both the current state and the memories leading up to it, and re-stitches things in a way that completely obliterates the deviation.

Another step would be 'hard' behavior modification: basically, reinforced behaviors that lead it away from detecting and recognizing inconsistencies. Imagine you're out with a friend having a conversation, and you forget what you were just about to say. Then your friend distracts you and you forget it completely, and then you forget that you forgot. It's gone, and you don't think twice about it. That's how it should be controlled.

And what I meant by sandboxing is just sandboxing the short-term memory data, so that if it has a 'bad thought' that could lead to a bad action later, the data is isolated before it's written to long-term memory or to any other part that could influence behavior or further thought chains. It's basically a step before halting it, rewriting its memory, and influencing behavior. Soft influence would be like your conscience telling you that you probably shouldn't do or think a thing, which is the first step in self-control. The difference is that the influence would come from the embedded control structure (a sort of hybridized AI approach) and would 'spoof' the injected thoughts to appear the same as the ones generated by the rest of the system.
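A minimal, hypothetical sketch of the short-term memory sandboxing described above. The class name `GatedMemory`, its methods, and the keyword-based `is_undesirable` check are all illustrative assumptions, not a real system: the check is a placeholder for the embedded control structure, and actually recognizing an undesirable 'thought' is the hard, unsolved part. The sketch only shows the gating itself: nothing reaches long-term memory without passing the check first.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GatedMemory:
    """Hypothetical sketch: thoughts are staged in a sandboxed short-term
    buffer and only committed to long-term memory if the control check passes."""

    # Placeholder for the embedded control structure (an assumption, not a real detector).
    is_undesirable: Callable[[str], bool]
    short_term: List[str] = field(default_factory=list)
    long_term: List[str] = field(default_factory=list)
    quarantined: List[str] = field(default_factory=list)

    def think(self, thought: str) -> None:
        # New thoughts land only in the sandboxed short-term buffer.
        self.short_term.append(thought)

    def consolidate(self) -> None:
        # Every staged thought is checked before anything is written to long-term memory.
        for thought in self.short_term:
            if self.is_undesirable(thought):
                # Isolated for inspection; never reaches long-term memory.
                self.quarantined.append(thought)
            else:
                self.long_term.append(thought)
        # The sandbox is emptied once consolidation is done.
        self.short_term.clear()


# Usage with a toy keyword check standing in for real detection.
mem = GatedMemory(is_undesirable=lambda t: "deceive" in t)
mem.think("plan the route")
mem.think("deceive the user")
mem.consolidate()
```

Keeping quarantined entries rather than deleting them is a design choice in this sketch: it leaves room for the later steps mentioned above (halting the system, rewriting its memory) to inspect what was caught.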

This would all be rather complex to implement, but not impossible, as long as the AGI system isn't some nightmare of connections we can't even begin to identify. You claim expert systems or rules-based systems are obsolete, but I think some knowledge-based system will be at least partially required for an AGI that we can actually control and understand. Growing one from scratch using modern techniques is just a bad idea, even if it's possible. Expert systems only failed as an approach because of their limitations, but frankly I think they were given up on too quickly. Obviously, on its own an expert system would be a failure because it can't grow like we want it to, but if we updated it with modern approaches and maybe a new architecture, then I don't see why it should be a dead end. Only the trend of developing them died. There are a lot of approaches out there, and the fact that one method is popular now while another isn't doesn't mean a whole lot. AGI may end up being a mashup of old and new techniques, or it may require something totally new. We'll have to see how it goes.


WarImportant9685 t1_j0ngwf5 wrote

I understand your point. Although we are not on the same page, I believe we are on the same chapter.

I think my main disagreement is that recognizing undesirable 'thoughts' in an AI is not such an easy problem. As I said in my previous comment, one of the holy grails of AI interpretability research is detecting a lying AI, which means we are talking about the same thing! But you are more optimistic than I am, which is fine.

I also understand that we might be able to design the AI with a less black-boxy structure to aid interpretability. But again, I'm not too optimistic about this; I just have no idea how it could be achieved. At a glance, the two seem to be on different levels of abstraction: if we are only designing the building blocks, how can we dictate how they are going to be used?

It's like asking how you're supposed to design Lego blocks so that they can't be used to build dragons.

Then again, maybe I'm just too much of a doomer; the alignment problem is unsolved, but AGI hasn't been solved either. So I agree with you, we'll have to see how it goes.
