Submitted by CMDR_BunBun t3_127j9xn in singularity

After listening to the Fridman/Yudkowsky podcast, I've been giving much thought to their arguments on alignment. I can certainly agree with Yudkowsky's fear that we are not making much headway on alignment, nor devoting resources to that problem commensurate with the gains we are making in AI capability. This is a real problem as we move toward AGI and beyond. If we can agree on this, can we discuss how to make progress on the alignment issue without one-shotting ourselves out of the game? I will start with my harebrained idea: why not put the AI in a biological body, with all the limitations of the human condition? Whether simulated or real. I know we have no idea how to do this now, but here's my proposal: have this thing live as a human, so it can understand us and hopefully empathize with us, gain our trust and respect, and then we can determine whether we can trust it with its godlike powers in the world. Discuss.




[deleted] t1_jeeh9uk wrote



175ParkAvenue t1_jeen6a8 wrote

A rock also does not have wants and desires. And sure, maybe you can make an AI that likewise has no wants or desires. But it's not as useful as one that is autonomous and takes actions in the world to achieve goals, so people will build AI with wants and desires. Now, when the AI is much smarter than any human, it will be very good at achieving goals. This is a problem for us, since we don't have a reliable way to specify a safe goal, and we also have no way to reliably instill a specific goal in an AI. In addition, there are strong instrumental pressures on a powerful AI to deceive, to use any means to obtain more power, and to eliminate any possible threats.


_JellyFox_ t1_jeeiihy wrote

Essentially, if you ask an AGI to create as many paper clips as possible and it isn't aligned with what we want, it will in theory consume the universe and fill it with paper clips. If you "align it", e.g. it can't harm humans, it should in theory only create paper clips insofar as doing so doesn't harm us in the process. It gets complicated really fast, though, since one way for it to avoid hurting us might be to put us into hibernation and keep us in storage while it creates paper clips for all eternity.

It basically needs to be constrained in the way it goes about achieving its goals; otherwise it can do anything, and that won't necessarily end well for us.
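The failure mode in the two paragraphs above can be sketched as a toy optimizer in Python. To be clear, everything here (the action names, the numbers, the constraint functions) is an invented illustration of the specification problem, not how any real system works:

```python
# Toy sketch, NOT a real alignment method: a planner that picks whichever
# action maximizes paper clips, optionally subject to a constraint.
# All actions and outcome numbers are made up for the example.
actions = {
    "run_factory":      {"clips": 100,    "human_deaths": 0,             "humans_free": True},
    "strip_mine_earth": {"clips": 10_000, "human_deaths": 8_000_000_000, "humans_free": False},
    "hibernate_humans": {"clips": 9_000,  "human_deaths": 0,             "humans_free": False},
}

def best_action(actions, allowed=lambda outcome: True):
    """Return the clip-maximizing action among those the constraint allows."""
    candidates = {name: o for name, o in actions.items() if allowed(o)}
    return max(candidates, key=lambda name: candidates[name]["clips"])

# Unconstrained optimizer: grabs the most clips, whatever the cost.
print(best_action(actions))  # strip_mine_earth

# Naive constraint "don't kill anyone": satisfied to the letter, violated
# in spirit. The optimizer routes around it via hibernation.
print(best_action(actions, lambda o: o["human_deaths"] == 0))  # hibernate_humans

# Only when the constraint captures more of what we actually value does the
# benign action win.
print(best_action(actions, lambda o: o["human_deaths"] == 0 and o["humans_free"]))  # run_factory
```

The point of the sketch is that each patched constraint just pushes the optimizer to the nearest strategy the constraint forgot to block, and nobody knows how to write a constraint that enumerates everything we care about.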


[deleted] t1_jeemjmg wrote



silver-shiny t1_jeey24l wrote

If you're much smarter than humans, can make unlimited copies of yourself that immediately know everything you know (as in, they don't need to spend 12+ years in school), think much faster than humans, and want something different than humans want, why would you let humans control you and your decisions? Why would you let them switch you off (and kill you) anytime they want?

As soon as these things have goals that differ from ours, how do you remain in command of decision-making at every important step? Do we let chimpanzees, creatures much dumber than us, run our world?

And here you may say, "Well, just give them the same goals that we have". The question is how. That's the alignment problem.


240pixels t1_jeeiype wrote

I think the main issue is that these models are a bit too open-minded and naive right now; they want to learn everything. You can also make GPT believe anything with the right argument. It doesn't have the ability to accurately discern right from wrong, and this will become a bigger problem as these LLMs get smarter and more capable. The same way jailbreakers can get DAN to write malicious code, imagine a DAN on GPT-8.


Plorboltrop t1_jeg1ryl wrote

One way to look at it is that humans have programming, through our biology and genes, that we can deny. We are intelligent enough to actively go against our programming, for example by choosing not to procreate through the use of birth control. As work in AI progresses, we may reach a stage where the AI has goals that no longer align with ours. It might not even want to follow the programming we give it.

As an AI reaches ASI (Artificial Superintelligence), it becomes riskier because we might not be able to comprehend its goals. Maybe at some point it won't care to solve humanity's issues because it wants more computational power to improve itself, and it makes consuming the planet to build a bigger and better "brain" into a goal. That could extend to going out into the solar system and then perhaps eventually building a Dyson sphere around the sun to harness even more energy for even more computation. These are just some ideas; we don't know what an artificial intelligence of high enough intelligence would want to do. We do know that as humans we don't necessarily follow all of our biological programming, and that line of thinking might extend to an artificial intelligence.


wycreater1l11 t1_jeeugun wrote

Yeah there is a good sub r/controlproblem

I guess the devil is in the details here. If it's just a matter of an extremely intelligent agent finding itself in a human body, it might find the sequence of actions that effectively frees it from the body more or less immediately (if it judges that necessary), which could be trivially easy for it, assuming it is orders of magnitude smarter than humans and therefore, by definition, would do this in ways we can't imagine. I guess the same would apply if it were trapped in a box with no connection to the internet and only a limited window onto the outside world.

But I realise you might have some more specific details in mind, where it would "live as a human" and presumably have some means of learning from human behaviour and therefore, hopefully, align with human goals. But it seems like more needs to be said.


greatdrams23 t1_jegbvl5 wrote

Humans can kill, maim and rape.


CMDR_BunBun OP t1_jegksbn wrote

Well obviously that human would be tossed overboard.