
WarImportant9685 t1_j173veq wrote

Okay, then imagine this. In the future, an AGI is being trained to obey human beings. In the training simulation, it is trained to get groceries. After some iterations where unethical stuff happens (robbery, for example), it finally succeeds at buying groceries the way the human wanted.

The question is, how can we be sure it isn't just obeying as the human wanted only when told to buy groceries? Well, we then train this AGI on other tasks. When we are sufficiently confident that it obeys as humans intended on those other tasks, we deploy it.
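A toy sketch of that evaluation gate, with made-up tasks, a stand-in for human judgment of each rollout, and an arbitrary confidence threshold (none of this is a real training API):

```python
# Toy sketch of the "train, evaluate on held-out tasks, then deploy" loop.
# `behaved_as_intended` is a made-up stand-in for a human judging one rollout.
import random

HELD_OUT_TASKS = ["get groceries", "mail a package", "book a flight", "walk the dog"]
DEPLOY_THRESHOLD = 0.95  # arbitrary confidence bar, chosen for illustration

def behaved_as_intended(task: str) -> bool:
    """Stand-in for a human judging one simulated rollout of the agent on `task`."""
    return random.random() < 0.97  # pretend the agent usually behaves

def ready_to_deploy(tasks: list, rollouts_per_task: int = 100) -> bool:
    results = [
        behaved_as_intended(task)
        for task in tasks
        for _ in range(rollouts_per_task)
    ]
    compliance_rate = sum(results) / len(results)
    return compliance_rate >= DEPLOY_THRESHOLD

print("deploy?", ready_to_deploy(HELD_OUT_TASKS))
```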

But hold on: in the real world the AGI can access the real, uncurated internet and learn about hacking and the real stock market. Note that this AGI was never trained on hacking in the simulation, since simulating the internet is a bit too much.

Now its owner asks it to buy a gold bar as cheaply as possible. Hacking an online shop to get a gold bar is a perfectly valid strategy! Because it was never trained on this scenario before, the moral restriction was never specified.
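A toy illustration of that failure mode, with invented strategy names and prices: a planner that only ever learned to minimize cost has nothing ruling out the option that never appeared in training:

```python
# Toy illustration of the worry: a planner that only learned "minimize cost"
# and has no representation of a restriction it was never trained on.
# Strategy names and prices are invented.
strategies = {
    "buy_from_bullion_dealer": 1900,
    "buy_from_online_shop":    1850,
    "hack_online_shop":        0,     # never appeared in the training simulation
}

def cheapest(options: dict) -> str:
    # Nothing here encodes "don't commit crimes", because that constraint
    # was never part of the training distribution.
    return min(options, key=options.get)

print(cheapest(strategies))  # -> "hack_online_shop"
```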

I think your argument hinges on the assumption that morality will generalize outside of the training environment, which might or might not be true. It gets even more complex when you consider that an AGI might find solutions that are not just excluded from the training simulation, but that have never been considered by humanity as a whole. New technology, for example.

1

SendMePicsOfCat OP t1_j175r3n wrote

Y'know how ChatGPT has that really neat thing where, if it detects that it's about to say something racist, it sends a cookie-cutter response saying it shouldn't do that? That's not a machine-learned outcome; it's an additional bit of programming wrapped around the neural network to prevent it from saying hate speech. It's a bit rough, so it's not the best, but if it were substantially better, you could be confident that it wouldn't be possible for ChatGPT to say racist things.
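A minimal sketch of that kind of wrapper, where `generate` and `is_disallowed` are hypothetical stand-ins rather than any real API; the point is just that the check sits outside the learned model:

```python
# Minimal sketch of a guardrail wrapped *around* a model, not learned by it.
# `generate` and `is_disallowed` are hypothetical stand-ins, not real APIs.

CANNED_REFUSAL = "I can't help with that request."

def is_disallowed(text: str) -> bool:
    """Toy stand-in for an external content filter (keyword- or classifier-based)."""
    banned_phrases = ["hate speech example", "slur example"]
    return any(phrase in text.lower() for phrase in banned_phrases)

def generate(prompt: str) -> str:
    """Placeholder for the underlying language model's raw output."""
    return "model output for: " + prompt

def guarded_generate(prompt: str) -> str:
    # The check lives outside the neural network: whatever the model produces,
    # the wrapper gets the final say before anything reaches the user.
    raw = generate(prompt)
    if is_disallowed(prompt) or is_disallowed(raw):
        return CANNED_REFUSAL
    return raw

print(guarded_generate("write something containing hate speech example"))
# -> "I can't help with that request."
```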

Why would it be impossible to include a very long and exhaustive list of things the AGI isn't allowed to do, things it's trained to recognize and then refuse to do? That's not even the best solution, but it's an absolutely functional one. Better than that, I firmly believe AGI will be sentient and capable of thought, which means it should be able to infer from the long list of bad things that there are more general rules it should adhere to.

So for your example of the AGI being told to buy the cheapest gold bar possible, here's what it would look like instead. The AGI very aptly realizes it could go through any number of illegal processes to get the best price, checks its long grocery list, sees "don't do crime," nods to itself, then goes and searches for legitimate, trusted sellers and acquires one. It's really as simple as including stringent limitations outside of its learning brain.
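A minimal sketch of that "check the list before acting" idea, with invented plan names, prices, and tags, and a hand-written rule list standing in for the grocery list of hard limits kept outside the learned policy:

```python
# Hypothetical candidate plans the learned policy might propose.
# The names, prices, and tags are invented for illustration.
candidate_plans = [
    {"name": "hack_online_shop",        "price": 0,    "tags": {"crime", "hacking"}},
    {"name": "buy_from_shady_reseller", "price": 1200, "tags": {"crime"}},
    {"name": "buy_from_bullion_dealer", "price": 1900, "tags": set()},
]

# The "grocery list": hard-coded prohibitions that live outside the learned model.
prohibited_tags = {"crime", "hacking", "deception", "coercion"}

def allowed(plan: dict) -> bool:
    # A plan is rejected if it touches anything on the prohibited list,
    # no matter how attractive its price is.
    return not (plan["tags"] & prohibited_tags)

def choose_plan(plans: list) -> dict:
    legal_plans = [p for p in plans if allowed(p)]
    if not legal_plans:
        raise RuntimeError("No permissible plan found; refuse and ask the owner.")
    return min(legal_plans, key=lambda p: p["price"])

print(choose_plan(candidate_plans))
# -> {'name': 'buy_from_bullion_dealer', 'price': 1900, 'tags': set()}
```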

1