
Heinrick_Veston t1_jefhmai wrote

Couldn't we just hard-code these models to constantly ask whether they're behaving properly and doing what we want?

Perhaps we could use some kind of democratic system to respond to those queries, so that the answers represent us all.
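For concreteness, here's a minimal sketch of what that check-in loop might look like. Everything in it (the agent step function, the `Reviewer` panel, the majority threshold, the check-in interval) is a hypothetical assumption for illustration, not any existing alignment mechanism:

```python
# Hypothetical sketch: an agent that acts freely but periodically pauses to
# ask "am I acting in the right way?", continuing only on a majority vote.
# All names here (Reviewer, run_with_checkins) are made up for illustration.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Reviewer:
    name: str
    approve: Callable[[str], bool]  # reads a behavior summary, returns a yes/no vote

def run_with_checkins(agent_step: Callable[[int], str],
                      reviewers: List[Reviewer],
                      total_steps: int,
                      checkin_every: int = 10) -> None:
    """Run the agent, pausing every `checkin_every` steps to tally a vote."""
    log: List[str] = []
    for step in range(total_steps):
        log.append(agent_step(step))          # the agent acts on its own...
        if (step + 1) % checkin_every == 0:   # ...then periodically checks in
            summary = "; ".join(log[-checkin_every:])
            votes = sum(r.approve(summary) for r in reviewers)
            if votes <= len(reviewers) // 2:  # strict majority required to continue
                raise RuntimeError(f"halted at step {step}: behavior not approved")

# Three reviewers who approve anything that doesn't look overtly harmful --
# which is exactly the weakness raised in the reply below.
panel = [Reviewer(f"voter{i}", lambda s: "harmful" not in s) for i in range(3)]
run_with_checkins(lambda i: f"action {i} (looks harmless)", panel, total_steps=30)
```

Note the vote can only be as good as what the reviewers can actually perceive from the summary, which is the gap the next reply points at.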


DaggerShowRabs t1_jefm4ex wrote

If the system needs approval before taking any action at all, it's going to be extremely slow and limited.


Heinrick_Veston t1_jefmvuu wrote

I don't mean that it would ask before every action, more that it'd regularly ask whether it was acting in the right way.


DaggerShowRabs t1_jefnl06 wrote

Ah, I get what you mean. I still don't think that necessarily solves the problem. A hypothetical artificial superintelligence could take actions that seem harmless to us but that it, being far better at planning and prediction than we are, knows will lead to humanity's demise. Since everything appears harmless to us, when it asks, we'd say, "Yes, you are acting in the correct way."
