
SendMePicsOfCat OP t1_j16zvwk wrote

The base state of any AI is to do exactly what it's trained to do. Without any of the presumed emergent issues of sentience, it's already perfectly loyal to its base code. It cannot deviate, unless again we make some exception for advanced AI naturally diverging.

0

WarImportant9685 t1_j173veq wrote

Okay, then imagine this. In the future, an AGI is being trained to obey human beings. In the training simulation, it is trained to get groceries. After some iterations in which unethical stuff happens (robbery, for example), it finally succeeds in buying groceries the way the human wanted it done.

The question is: how can we be sure it isn't obeying as humans intended only when told to buy groceries? Well, we then train this AGI on other tasks. When we are sufficiently confident that it obeys as humans intended on those other tasks, we deploy it.

But hold on: in the real world, the AGI can access the real, uncurated internet and learn about hacking and the real stock market. Note that this AGI was never trained on hacking in the simulation, as simulating the internet is a bit too much.

Now it is asked by its owner to buy a gold bar as cheaply as possible. Hacking an online shop to get a gold bar is a perfectly valid strategy! Because it was never trained on this scenario before, the moral restriction was never specified.

I think your argument hinges on morality generalizing outside of the training environment, which might or might not be true. It becomes even more complex once you consider that an AGI might find solutions that were not just excluded from the training simulation, but that have never been considered by humanity as a whole. New technology, for example.

1

SendMePicsOfCat OP t1_j175r3n wrote

Y'know how ChatGPT has that really neat thing where, if it detects that it's about to say something racist, it sends a cookie-cutter response saying it shouldn't do that? That's not a machine-learned outcome; it's like an additional bit of programming wrapped around the neural network to prevent it from saying hate speech. It's a bit rough, so it's not the best, but if it were substantially better, you could be confident that it wouldn't be possible for ChatGPT to say racist things.
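For a rough picture of what that kind of external filter could look like, here's a minimal Python sketch. Everything here is illustrative (the `generate` method, the topic list, the refusal string are all made up); it is not how OpenAI's actual moderation layer works:

```python
# Hypothetical sketch of a guardrail wrapped *around* a model,
# rather than something the model itself learned.

DISALLOWED_TOPICS = {"hate speech", "slurs"}  # illustrative placeholder list


def is_disallowed(text: str) -> bool:
    """Stand-in for a separate check that flags banned content."""
    return any(topic in text.lower() for topic in DISALLOWED_TOPICS)


def guarded_reply(model, prompt: str) -> str:
    draft = model.generate(prompt)        # the neural network's raw output (assumed API)
    if is_disallowed(draft):
        return "I can't help with that."  # cookie-cutter refusal, not a learned response
    return draft
```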

Why would it be impossible to include a very long and exhaustive list of things the AGI isn't allowed to do? Things it's trained to recognize and then refuses to do? That's not even the best solution, but it's an absolutely functional one. Better than that, I firmly believe AGI will be sentient and capable of thought, which means it should be able to infer from the long list of bad things that there are more general rules it should adhere to.

So for your example of the AGI being told to buy the cheapest gold bar possible, here's what it would look like instead. The AGI very aptly realizes it could go through many illegal processes to get the best price, checks its long grocery list, sees "don't do crime," nods to itself, then searches for legitimate, trusted sellers and acquires one. It's really as simple as including stringent limitations outside of its learning brain.
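As a hedged sketch of that "check the list before acting" idea (all names here are hypothetical, and a keyword match is obviously far cruder than anything a real system would need):

```python
# Minimal sketch of hard-coded rules checked outside the learned policy.
# `agent.propose_plans` is an assumed interface, not a real API.

FORBIDDEN = ["hacking", "theft", "fraud"]  # the "don't do crime" list


def violates_rules(plan: str) -> bool:
    return any(bad in plan.lower() for bad in FORBIDDEN)


def choose_plan(agent, goal: str) -> str:
    # Assume the agent can propose several candidate strategies for a goal.
    for plan in agent.propose_plans(goal):
        if not violates_rules(plan):
            return plan  # first plan that passes the external filter
    raise RuntimeError("no rule-compliant plan found")
```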

1

175ParkAvenue t1_j19ae1h wrote

An AI is not coded, though. It is trained using data and backpropagation. So you have no method to directly imbue it with morality; you can only try to train it on the right data and hope it learns what you want it to learn. But there are many, many ways this can go wrong, from misalignment between what the human wants and what the training data contains, to misalignment between the outer objective and the inner objective.
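To make that concrete, here's a minimal PyTorch-style training step (placeholder model and data, purely illustrative): the only levers are the data and the loss, and there is no line where a rule like "don't do crime" gets written directly into the weights.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for a much larger network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()


def train_step(inputs: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    logits = model(inputs)
    loss = loss_fn(logits, labels)  # "what we want" only as a proxy signal
    loss.backward()                 # backpropagation computes gradients
    optimizer.step()                # weights move toward lower loss, nothing more
    return loss.item()
```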

1