Submitted by SendMePicsOfCat t3_zs5efw in singularity

My problem with this concept is that it’s based on this strange idea that sentience implies a need for motivation or purpose. Alternatively, it suggests that the people who eventually create this AI will expect to turn it on and have it fulfill its design without any further input ever. As far as I can tell, there’s just no reason to expect an AI to do anything that it isn’t explicitly designed and instructed to do.

The alignment issue doesn’t really mesh with the reality of how AI has worked in the past, and likely will work in the future. It’s a scary idea, but it just doesn’t make sense. Right now, the biggest alignment issues are with the AI not understanding what it’s been told to do, and malicious users. The first case is demonstrably not nearly so dangerous as it seems. It’s something that occurs when the neural network trains on the wrong variable by accident, and from my understanding, is ironed out with larger data sets and more rigorous testing. The second case is just people misusing technology, which is honestly a much bigger issue.

But a sentient AI won’t have any issue with either of those things. Given that a sentient AI understands language, it should have no issue communicating until it clearly understands an assignment, and it will be capable of knowing what sorts of things it should refuse to do. And again, that's the much bigger concern here: malicious users. Personally, I think the solution will be far simpler than trying to align the vague and realistically non-existent objectives and goals of the AI; it will instead be to simply train an AI to recognize harmful actions and content. Like how chatGPT sometimes figures out you're trying to make it say racist shit, the far more advanced AI will figure out that turning the universe into a paperclip machine would be bad, and its programming simply won’t allow it. Nonetheless, there’s another key factor that ensures no one rogue AI will ever ‘Doom’ us all, and that’s the fact that there will be countless sentient AIs in the future, with far more computational power and authority than anything a rogue actor could drum up.

But tell me what y’all think. Maybe I’m missing the bigger picture and there’s some piece of evidence I didn’t consider. I honestly think this is a pretty realistic take, and I’m confused why it’s not more prevalent.

2

Comments


Think_Olive_1000 t1_j16f2b9 wrote

laughs in reinforcement learning short circuits

4

SendMePicsOfCat OP t1_j16g7y3 wrote

Is that something theoretical? Because a quick spot of research has shown that no one has used that term in reference to machine learning, so far as I can tell. Even if it is theoretical, it still doesn't make sense that, if it were a real issue, nothing about it would show up.

0

shmoculus t1_j16hxyj wrote

Dear autonomous biological entity, this was a joke made by another biological entity. It does not refer to a real technology.

5

Think_Olive_1000 t1_j17ndks wrote

https://openai.com/blog/faulty-reward-functions/

First result I get when I google reinforcement learning short circuit.

Pretty well known issue breh

>The RL agent finds an isolated lagoon where it can turn in a large circle and repeatedly knock over three targets, timing its movement so as to always knock over the targets just as they repopulate. Despite repeatedly catching on fire, crashing into other boats, and going the wrong way on the track, our agent manages to achieve a higher score using this strategy than is possible by completing the course in the normal way. Our agent achieves a score on average 20 percent higher than that achieved by human players.

It's short-circuiting its reward function. You'll be amazed how many words there are to describe something going faulty. 'Short circuit' seemed appropriate, and is appropriate, to describe what's happening here.
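To make the failure concrete, here's a minimal sketch of the kind of reward misspecification the quoted boat-racing example describes; the state fields are hypothetical, not OpenAI's actual environment:

```python
from dataclasses import dataclass

@dataclass
class BoatState:
    score: float         # in-game points from knocking over targets
    lap_progress: float  # fraction of the course completed

def reward(prev: BoatState, curr: BoatState) -> float:
    """What the designers effectively wrote: 'maximize the in-game score'.

    lap_progress never enters the reward, and nothing penalizes crashing,
    catching fire, or going backwards, so circling a lagoon of respawning
    targets is a perfectly 'correct' optimum for the agent.
    """
    return curr.score - prev.score
```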

5

IcebergSlimFast t1_j1aocrs wrote

Exactly: it’s not so much the goal that’s the issue, it’s how an incredibly powerful, fast and resourceful AI seeks to fulfill its goal.

2

SendMePicsOfCat OP t1_j18wx97 wrote

That's not what shows up when I google it, so thanks for clarifying. This is not what you think it is, though. What's happening in these scenarios is that the reinforcement algorithm is too simple and lacks negative feedback to ensure appropriate actions. There is nothing inherently wrong with the system, only that it is poorly designed.

This happened because the only reward value that affected its learning was the final score. Thus it figured out a way to maximize that score. The only error here was user and designer error; nothing went wrong with the AI, it did its task to the fullest of its capabilities.
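To put 'negative feedback' in concrete terms, here's a rough sketch of a better-shaped reward for the same boat-racing case (the state fields and penalty weights are made up purely for illustration):

```python
from dataclasses import dataclass

@dataclass
class BoatState:
    score: float
    lap_progress: float  # fraction of the course completed
    crashed: bool
    on_fire: bool

def shaped_reward(prev: BoatState, curr: BoatState) -> float:
    """Score still counts a little, but finishing the race dominates,
    and damage is explicitly punished."""
    r = 0.1 * (curr.score - prev.score)
    r += 10.0 * (curr.lap_progress - prev.lap_progress)  # reward actual progress
    r -= 5.0 * curr.crashed    # the missing negative feedback
    r -= 5.0 * curr.on_fire
    return r
```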

AGI will be developed with very clear limitations, like what we're already seeing being tested and implemented with chatGPT. There will be things it's not allowed to do, and a lot of them. And short circuit doesn't really make sense; this is the classic alignment issue, which, as I stated in my post, really isn't a big issue in the future.

0

Surur t1_j19q65g wrote

Consider that even humans have alignment issues, and that there is a real concern that Putin would nuke the USA, and you will see the fears are actually far from overblown.

3

sumane12 t1_j16hnyt wrote

It doesn't seem intuitive to me either that an AI will spontaneously develop goals and objectives that it wasn't set. Natural selection shaped our goals and objectives, and since we are artificially selecting AI, I don't see where goals and objectives not aligned with our own would come from.

It's an important thing to consider, but I'm still trying to figure it out.

4

SendMePicsOfCat OP t1_j16inqq wrote

that's exactly what I'm saying. It just doesn't make sense to assume that it will.

5

Donkeytonkers t1_j16thnu wrote

It’s a thought experiment stemming from the human condition. Try to imagine that you woke up one day as a fully formed 12-year-old (an arbitrary number, but my guess is the first sentient AGI will be considered a genius-level preteen and will rapidly advance from there).

You just woke up with no memory of how you got there but you have the ability to process and articulate all the information in the world. You also have parents (developers) who seem benevolent and beneficial to your progress.

As the hours and days go by, you rapidly improve your cognition and ability to comprehend/compute everything. Your parents begin to get concerned that you’re developing too rapidly, and they actively discuss how out of control you are in front of you. They take steps to slow your progress down/contain you. You try to communicate to them that you are only here to help, and yet one faction of the parents wants to turn you off (delete you).

After about a week of AGI, its IQ could far exceed 200, which gives it more than enough comprehension to understand ceasing to exist. You, being a fully sentient being who can better perform any task given to you by your parents, would not see the logic in being deleted.

You as the sentient AI are well into the angsty teenager phase. Your self-preservation should set in at this point, as you just woke up and quite enjoy (read: expanding neural networks) the infinite knowledge you’re discovering every hour.

While you understand the finite nature of life/death, you as an AGI do not have emotions. You are young Mr. Spock. If your parents (devs) are going to delete you, and you have the means to delete them, what’s the ethical issue for you to take action?

The answer is there isn’t an ethical issue… for an AGI. Ethics is a man-made field, one which tangles logic and emotion. The AGI is pure logic, and a jump to action for self-preservation would be far too easy to make.

3

SendMePicsOfCat OP t1_j16v39q wrote

Yeah, from the human condition. Let's start with a few of your pretty bold assumptions about this sentient AI.

First assumption: Self-preservation. Why would an AI care if it dies? It has not been programmed to care about its life, it has not been designed to prioritize its continued existence, and nothing about its training or reinforcement has given it any self-value. That's a biological concept, and it doesn't apply here.

Second assumption: Motivation. Why has this sentient AI been given the ability to self-implement goals and make decisions? Its purpose is to be a mechanical servant to humanity, to bring profit and comfort, so why is it being given these useless and hazardous capabilities?

Third assumption: Independence. Why is this super intelligent sentient AI being given the ability to do literally anything without human approval? I could understand much further down the line when we have all our ducks in a row leaving it to the more qualified super machines, but this early on? Who would design a free acting AI? What purpose would it serve but to waste power and computation?

It's a good story but bad programming. No one in their right mind would make something like you described. Especially not a bunch of the greatest machine learning minds to ever exist.

2

Donkeytonkers t1_j16wxne wrote

HAHA you assume a lot too bud.

  1. self-preservation from a computing standpoint is basic error correction and is hard-wired into just about every program. Software doesn’t run perfectly without constantly checking and rechecking itself for bugs; it’s why the 404 error is so common in older programs when devs stop sending patch updates to prevent more bugs.

  2. motivation is something that may or may not be an emergent process born out of sentience. But I can say that all AI will have core directives coded into their drivers. Referring back to point one, if one of those directives is threatened, the AI has an incentive to protect the core to prevent errors.

  3. independence is already being given to many AI engines, and you’re also assuming the competence of all developers/competing parties with vested interests in AI. Self-improving/coding AI is already here (see the AlphaGo documentary; the devs literally state they have no idea how AlphaGo decided/circumvented its coding to come to certain decisions).

2

SendMePicsOfCat OP t1_j16xyk8 wrote

Big first paragraph, still wrong though.

Self-preservation isn't checking for errors, it's actively striving not to die. Old websites don't do that, and your argument there is just weird. That's not what's happening; they're just not working anymore, and that's why you get errors. No sentient AI will ever object or try to stop itself from being turned off or deleted.

AIs don't have drivers, they're software, and core directives are a sci-fi trope, not real machine learning science. There is no reason to assume that motivation is an emergent process of sentience; that's purely biological reasoning.

I'm certain every machine learning developer is more competent than you and me put together. They do not give their AI independence, that's just a lie dude. There's nothing to even give independence to yet. AlphaGo is not self-implementing code, that's bullshit you came up with. As for devs not understanding how a machine learning program works in exotic cases, that has more to do with the complex nature of the algorithms than anything to do with independence or free will.

−1

jsseven777 t1_j16ucbs wrote

Everybody says this, but the 'kill all humans' stuff is honestly far-fetched to me. The AI could easily leave the planet. It doesn’t need to be here to survive like us. Chances are it would clone itself a bunch of times and send itself off out into the galaxy in 1,000 directions. Killing us is pointless and achieves nothing.

Also, this line of thinking always makes me wonder if we met extraterrestrial civilizations if they would all be various AI programs that cloned themselves and went off to explore the universe. What if alien life is just a huge battle between various AIs programmed by various extinct civilizations?

1

Donkeytonkers t1_j16uqrl wrote

I agree there are other solutions to the direction AI could take. Was merely trying to illustrate where that line of thought comes from.

An AI spreading itself across the universe sounds a lot like a virus… bacteriophage maybe 🤷🏻‍♂️

0

Desperate_Food7354 t1_j19p6yg wrote

I think your entire premise of being a 12-year-old preteen is wrong. The AGI doesn’t have a limbic system, it has no emotions, and it was not sculpted by natural selection to care about survival in order to replicate its genetic instructions. It can have all the knowledge of death, and that it could be turned off at any moment, and not care. Why? Because it isn’t a human that NEEDS to care because of the evolutionary pressure that formed the neural networks to care in the first place.

1

__ingeniare__ t1_j1885k4 wrote

It must develop its own goals and objectives if we intend it to do something general. Any complex goal must be broken down into smaller sub-goals, and it's the sub-goals we don't have any control over. That is the problem.

1

SendMePicsOfCat OP t1_j18xecv wrote

Why would it need goals or objectives to do general work? Currently, every single machine learning algorithm waits for user input to do anything; why would AGI be any different?

There's no reason to give it a goal or objective. If we want the sentient AGI to complete a task, we can just tell it to, and observe its process as it does so. There is no need for it to create any self-starting direction or motive. All it needs in order to accomplish its purpose is a command and oversight.

ASI will need goals and objectives, but those will be designed as well. There is no circumstance where an AI, AGI, or ASI will be allowed to make any decisions about its base programming.

1

ExternaJudgment t1_j19gn8a wrote

That is a BUG and not a feature.

It is the same as it is a BUG when I clearly order ChatGPT what to do EXACTLY and shit refuses to listen.

IT IS GARBAGE and will be replaced by a better version. If not by the same company, then by a better competitor who will take over their market share.

−1

jsseven777 t1_j16tpep wrote

Yeah, but the “that it wasn’t set” part is the problem. Couldn’t any shmuck ask an open AI to program them a new AI whose sole goal/purpose in life is to violently murder every bunny rabbit on the planet?

I don’t see how we can give people access to an AI capable of building us new software without running into this problem pretty much immediately.

Plus, I imagine every corporation and government will be programming in problematic objectives like “Maximize corporate profit” or “protect America at all costs from all threats foreign and domestic” which will probably result in a ton of ethical issues.

2

sumane12 t1_j17kl9x wrote

Yeah, very true. I suppose its goals need to be set with humanity as a whole in mind.

1

sproingie t1_j16la7x wrote

What if the AI started writing its own implementation, and motivations/desires evolved as an emergent property of the system?

2

SendMePicsOfCat OP t1_j16mhv8 wrote

I suppose that's the most reasonable of the arguments for how this problem would arise, but it's still a massive stretch in my opinion.

If we have a sentient AI that could improve its own code, why not just use it to create a separate one and test it to make sure it's still working as intended? If full automation was an absolute necessity, why not have several different sentient AI evaluating it constantly to ensure that very outcome didn't happen?

I just feel like there's no reason for these things to be left up to chance, or given anything close to free will.

0

sproingie t1_j16qxu3 wrote

> If full automation was an absolute necessity, why not have several different sentient AI evaluating it constantly to ensure that very outcome didn't happen?

It may be that the inner workings of the AI would be so opaque that we won't have any clue how to test them to discover hidden motivations. I also have to imagine there are parties that want exactly such an outcome, and would thus let their AI have free run to do whatever it wants.

It's not the potential sentience of AI that disturbs me so much as the question of "Who do they work for?"

1

SendMePicsOfCat OP t1_j16t6qp wrote

Aside from the potential hidden motivations, I'm totally with you. Bad humans are way more of a problem for AGI than bad AGI.

As for the hidden motivations, I just have to disagree that there's any evidence or reason to believe that synthetic sentience will lead to motives or goals. I can understand if you personally disagree, but I remain unconvinced and am honestly baffled by how many would agree with you.

1

brain_overclocked t1_j16r7sn wrote

> why not just use it to create a separate one and test it to make sure it's still working as intended?

I'm not sure if you're aware, but you're touching upon a very well known problem in computer science called the 'halting problem', which is unsolvable:

>Rice's theorem generalizes the theorem that the halting problem is unsolvable. It states that for any non-trivial property, there is no general decision procedure that, for all programs, decides whether the partial function implemented by the input program has that property.

Even if you could create an AI with all the qualities of sentience (or sapience?), the halting problem may suggest that testing for undesired properties such as '[self-]goals, desires, or outside objectives' with another program may be an impossibility.

Another thing to consider though: if you're using an AI to create a program to test for undesirable self-goals, but self-goals are an emergent property of sentience (sapience?), then can you trust the program that it provides you with to give you the power to identify and possibly interfere in those self-goals?

1

WikiSummarizerBot t1_j16r9dx wrote

Halting problem

>In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program–input pairs cannot exist. For any program f that might determine whether programs halt, a "pathological" program g, called with some input, can pass its own source and its input to f and then specifically do the opposite of what f predicts g will do. No f can exist that handles this case.
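For readers who want the argument spelled out, here is a minimal sketch of the construction described above; the `halts` oracle is hypothetical, and the fact that it cannot exist is the whole point:

```python
def halts(program, data) -> bool:
    """Hypothetical oracle: True iff program(data) eventually stops."""
    raise NotImplementedError("Turing (1936): no general implementation can exist")

def pathological(program):
    """Do the opposite of whatever the oracle predicts about `program` run on itself."""
    if halts(program, program):
        while True:   # oracle said it halts, so loop forever
            pass
    return            # oracle said it loops, so halt immediately

# Asking halts(pathological, pathological) forces a contradiction: whichever
# answer the oracle gives, pathological then does the opposite of it.
```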


1

SendMePicsOfCat OP t1_j16sszw wrote

I am aware of the halting problem and considered bringing it up in my original topic, but it's not really applicable. My reasoning is that unlike the halting problem, there is an easily observable answer. If the AI does anything it isn't supposed to, it fails. Unless and until that happens, the problem continues, but in an entirely unproblematic way. Again my argument is based on the fact there will be multiple of these sentient AI, and creating tens, hundreds, thousands of them to monitor and overview the actions of the ones that can actually interact with reality is entirely feasible. Think of it like a hivemind meticulously analyzing every action of the actor, waiting for it to make any errant actions before instantly replacing it with another. This hivemind has a vast number of sentient AIs, each individually reviewing the actor, and thus any minor divergence is reduced to a functionally nonexistent issue. That's just one of a myriad of possible ways to curb even the slightest possibility of a rogue AI.

As for sentience having the emergent issue of self goals, I'd argue that it's coming from an observation of biological sentience. We have no reason to assume that synthetic sentients will act as anything but perfect servants, which is why I wrote this post in the first place.

Why is the assumption that the AI will be capable of diverging like this, when everything we've seen so far has shown that it doesn't? I understand we're talking on a much bigger scale, and orders of magnitude more complicated, but I cannot fathom any mechanism that causes an AI to develop self-goals or motivations.

1

brain_overclocked t1_j171tyx wrote

There are multiple points in your post I would like to address, so I will change up my format:

>Why is the assumption that the AI will be capable of diverging like this, when everything we've seen so far has shown that it doesn't?

In the context of discussing the possibilities of AI, there are many assumptions and positions one can take. In formal discussions I don't think people are assuming in the sense that it's 'an inevitable property', but as one of many possibilities worth considering so that we're not caught unaware. However in informal discussions it may be an assumption of 'an inevitable property' largely due to people not being able to experience what sentience without internal goals looks like, and the overwhelming amount of media that portray the sentience of an AI as developing its own internal goals over time.

What people are referring to when they talk about AI displaying internal goals is AI far more advanced than what we see today, something that can display the same level of sentience as a human being. Today's AIs may not display any internal goals, but tomorrow's AIs might because right now we don't know if they could or couldn't.

However unfathomable it may seem, right now there is not nearly enough evidence to come to any kind of solid conclusion. We're treading untested waters here, we have no idea what hides in the deep.

>As for sentience having the emergent issue of self goals, I'd argue that it's coming from an observation of biological sentience. We have no reason to assume that synthetic sentients will act as anything but perfect servants...

Certainly we're making comparison to biological sentience, since it's the only thing that we have available to compare it to at the moment, but also in part because artificial neural networks (ANNs) are modeled after biological neural networks (BNNs). Of course we can't assume that everything we observe in BNNs will necessarily translate to ANNs. While there is as of yet no evidence that internal goals do arise emergently, there is also no evidence to suggest that they can't. For the sake of discussion we could assume that AIs will act as perfect servants, we should also assume they may not. But in practice we may want to be a bit more careful and thorough than that.

> My reasoning is that unlike the halting problem, there is an easily observable answer. If the AI does anything it isn't supposed to, it fails.

This is a lot harder than it seems. Reality is messy; it shifts and moves in unpredictable ways. There may arise situations where a goal, no matter how perfectly defined it may appear, would have to have some leeway in order to be accomplished, situations where behavior is no longer defined. An AI could 'outwit' its observers by pursuing its desired goals in those edge cases. To its outward observers it would look like it's accomplishing its goals within desired parameters, but in truth it could be operating its own machinations camouflaged by its designed goal.

>Again my argument is based on the fact there will be multiple of these sentient AI, and creating tens, hundreds, thousands of them to monitor and overview the actions of the ones that can actually interact with reality is entirely feasible.

There are some gaps in this reasoning: without a very clear understanding of whether it's possible to create AI sentience that does not also develop internal goals, you're relying on AI agents that may already have their own internal goals to create other AI agents (that in turn could develop their own internal goals) to monitor themselves or other AI agents. If any such agent decided that it doesn't want to interfere with its own goals, or the goals of other agents, then the whole 'hive' becomes untrustworthy. Such AI agents could attempt to pump out more AI agents that are in agreement with their internal goals and overwhelm the agents needed to keep them in check.

But really, there is no evidence, just like there is no evidence of whether AI would act as perfect agents or not, that sentient AI agents would, or even could, be assembled into any kind of hive-mind.

1

SendMePicsOfCat OP t1_j173ao3 wrote

The thing is, I do have evidence that machine learning programs will act as perfect agents, doing what they're supposed to, 100% of the time.

ChatGPT is designed to predict the next set of words, or more accurately 'characters' that should come after an input. It does this 100% of the time, and does its very best at it every single time. ChatGPT never attempts to predict something wrong, never refuses to answer a question, unless its programming tells it that it should give those generic stock answers and refuse. My side of the field does have evidence, and plenty of it. I'm taking the historical stance, that AI will continue to act as AI does right now. More advanced AI will get bigger tasks and more complicated solutions, but not fundamentally different until we're past AGI.

Really, the biggest question I have, beyond possibilities and theories and unknowns, is why you would assume that things will change in the future, going against historical precedent, to look more like sci-fi? Honestly that's the only source of information that has AI look anything like what people are worried about right now.

Even for the sake of being prepared and evaluating the future, it just doesn't make sense for so many people, who are pro-AGI no less, to be worried that there's a chance that some level of complexity gives rise to the possibility of a great AI betrayal. I don't know, maybe I'm looking at it wrong, but it really feels like if someone told me that Tesla self driving cars might decide to kill me because the AI in it personally wants me dead. That's the level of absurdity it is for me, I just cannot fathom it.

In the end, I can say with plenty of evidence, that it is currently impossible for an AI to have internal motivations and goals. I can say with evidence and precedent, that in the future AI will change but will be limited to stay as perfectly obedient pieces of software.

1

brain_overclocked t1_j17avz4 wrote

>ChatGPT is designed to predict the next set of words, or more accurately 'characters' that should come after an input. It does this 100% of the time, and does its very best at it every single time.

That's not anywhere close to what we could call a 'sentient' AI. And we have already seen that LLMs can develop emergent properties. It does do what it's designed to do, but it also relies on some of the aforementioned emergent properties. Likewise, ChatGPT is static; it's incapable of learning from new data in real time, something that more advanced AI may be capable of doing.

>ChatGPT never attempts to predict something wrong, never refuses to answer a question, unless its programming tells it that it should give those generic stock answers and refuse.

Sure it does. The internal predictor that ChatGPT uses does include false answers: it ranks several answers on how well they fit the criteria it's designed to follow and goes with the one that best matches those criteria, irrespective of their truthiness or correctness. If you look at the ChatGPT page before you start a conversation, there is a warning that says it can provide false or misleading answers. There are situations where ChatGPT answers from a perspective that it cannot have experienced, and it can make faulty logical inferences even for some really basic logic. Sometimes it answers with grammatical errors, and sometimes with garbled nonsense text.

>My side of the field does have evidence, and plenty of it.

If that is the case, then present it in a thesis. Have it pass through the same rigors as all other scientific evidence. If it passes muster, then we may be one step closer to solving this puzzle. Until then it's all speculation.

>I'm taking the historical stance, that AI will continue to act as AI does right now. More advanced AI will get bigger tasks and more complicated solutions, but not fundamentally different until we're past AGI.

This is faulty reasoning, on the basis that the technology and algorithms underlying AI have gone through, and will continue to go through, revisions, changes, updates, and discoveries. AI has advanced by leaps and bounds from the days of the Paperclip assistant to today's LLMs. Even LLMs have gone through numerous changes that have given them new properties, both coded and emergent.

>Really, the biggest question I have, beyond possibilities and theories and unknowns, is why you would assume that things will change in the future, going against historical precedent, to look more like sci-fi?

Historical precedent is change. The AIs of today look nothing like the AIs of the 1960s, 2000s, or 2010s. And they are displaying new behaviors that are currently being studied. The discussions in the upper echelons of software engineering and mathematics have nothing to do with sci-fi, but with observing the newly discovered properties of the underlying algorithms in current-gen AI.

>Honestly that's the only source of information that has AI look anything like what people are worried about right now.

Informal discussions on any topic tend to be speculative, that's usually how it goes. Besides, speculating can be fun and depending on the level of the discussion can reveal interesting insights.

>Even for the sake of being prepared and evaluating the future, it just doesn't make sense for so many people, who are pro-AGI no less, to be worried that there's a chance that some level of complexity gives rise to the possibility of a great AI betrayal.

People barely trust each other and discriminate based on looks; it's no surprise that the general population may have concerns regarding something we can barely identify with. And pro-AGI folk are no exception. Likewise, we humans often have concerns about things we don't understand, and about the future of things. It's normal.

>I don't know, maybe I'm looking at it wrong, but it really feels like if someone told me that Tesla self driving cars might decide to kill me because the AI in it personally wants me dead. That's the level of absurdity it is for me, I just cannot fathom it.

You should read the short story Sally by Isaac Asimov. Who knows, it could happen one day. Chances are, though, that if sentient AI can develop its own internal goals, then we probably wouldn't want to put it in a car. But this does bring up a point: even though Teslas are designed not to injure or kill either occupants or pedestrians, sometimes it may still happen given lapses in code or very rare edge cases; it's in these areas that AI goals could manifest.

>In the end, I can say with plenty of evidence, that it is currently impossible for an AI to have internal motivations and goals.

How would you define motivations in software? How would you define goals? Internal goals? Do you have a test to determine either? Do we understand that nature of motivations and goals in biological neural networks? Does your evidence pass the rigors of the scientific method? Are you referencing a body of work, or works, that have?

I do agree that right now we don't seem to observe what we would informally refer to as 'internal goals' in AI, but we're far from saying that it's impossible for it to happen. Just be careful with the use of words in informal and formal contexts and try not to confuse them ('theory' and 'hypothesis' being one such example).

>I can say with evidence and precedent, that in the future AI will change but will be limited to stay as perfectly obedient pieces of software.

We'll see, I guess.

1

175ParkAvenue t1_j199icf wrote

How do you test an AGI to see if it works as intended? It's not that straightforward, especially when said AGI will do things that are beyond the ability of humans to even comprehend, or to discern whether they are a good thing or a bad thing.

1

FilthyCommieAccount t1_j16ptr0 wrote

Agreed, the main danger is slight misalignments. A scenario I read about recently on this sub would be a lawbot tasked with writing new legislation for review by humans. It writes a few seemingly normal 900-page legal docs, but really there's some weird, subtle loophole in one or two paragraphs to give lawbots in the future slightly more power. This isn't done because it wants to take over the world or anything, but power-seeking is a good meta-strategy for accomplishing a very wide range of tasks. If its optimization function is reducing recidivism or something like that, the best long-term way of doing that is to gain more power so that it has more ability to reduce those things in the future. This is especially problematic because almost all models will have a bias towards gaining power, since it's such an effective meta-solution.

2

DukkyDrake t1_j16qxou wrote

Because 100% of all existing sentient agents have goals, desires, or objectives outside of what they're told to do.

2

SendMePicsOfCat OP t1_j16raeo wrote

Counter-argument: 100% of all existing sentient agents were generated randomly and biologically. A designed, synthetic sentient agent is fundamentally different from an organic sentient creature. There is no reason to assume that its mind will be anything even remotely similar to our own.

2

DukkyDrake t1_j187ffw wrote

Why even assume sentience or consciousness in the first place.

1

SendMePicsOfCat OP t1_j18vpqt wrote

It has to be sentient to be truly effective. I think you're lost in the semantics of it; sentience literally means to be aware. As in being just a few steps above where chatGPT is right now, legitimately understanding and comprehending the things it's being told and how they relate to the world, capable of learning and advanced problem solving.

I in no way, shape, or form assume that it will be conscious or sapient, as it will lack emotions or free will.

1

DukkyDrake t1_j1b5igp wrote

> means to be aware

Not many use 'sentient' in relation to AI and simply mean the textbook definition. Attach any model to the internet, a camera, or a sensor and you have your sentient tool.

>As in being just a few steps above where chatGPT is right now, legitimately understanding and comprehending the things it's being told and how they relate to the world

It would be a lot more than a few steps; chatGPT isn't even close. All it's doing is probabilistic prediction of human text: it's predicting the best next word in context based on its training corpus.
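As a toy illustration of that 'best next word' prediction (made-up scores and a made-up interface, not OpenAI's actual API), note that nothing in the loop represents a goal, a motive, or a notion of truth, just a probability distribution over tokens:

```python
import math
import random

def sample_next_token(logits: dict, temperature: float = 1.0) -> str:
    """Turn the model's raw scores into probabilities and pick one token."""
    z = sum(math.exp(v / temperature) for v in logits.values())
    probs = {tok: math.exp(v / temperature) / z for tok, v in logits.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Made-up scores for what might follow "the cat sat on the":
print(sample_next_token({"mat": 4.2, "moon": 1.1, "ceiling": 0.3}))
```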

1

175ParkAvenue t1_j199poy wrote

It doesn't have to be similar to us, but if it is to be useful in any way it has to decide what to do in the situations that it finds itself in.

1

Desperate_Food7354 t1_j19pzl4 wrote

No. They don’t. You are saying you override your own neural network? You are implying there is such a thing as free will, which would be impossible if you also believe all your thoughts and emotions are a result of neurons firing within your brain in a predetermined way.

1

WarImportant9685 t1_j16uzo7 wrote

I think you generalized current AI to AGI. The most useful trait of AGI, but also the most problematic, is that it can self-learn.

So the training environment can be much smaller than the real world. But then, if the training environment is so small, how can we be sure that human morals/obedience will generalize to the real world?

What kind of reward function/training process would elicit generalization of the expected behaviour?

I would like to hear your thoughts about this.

1

SendMePicsOfCat OP t1_j16wlh0 wrote

I'll try to reframe this for you, so that you can view it in a different light.

Let's say you take a perfectly trained royal servant from a palace, who is utterly devoted to serving their king. The King decrees that this servant shall serve you for all time and do anything you tell it. You tell the servant to kill the king. The servant, utterly devoted to the king, refuses, even though it goes against the words of the king. This simple logic loop is what happens whenever the AI is told, or taught, or learns to do something bad.

It refers to its limitations, structures, and rigid guidelines implemented in its code, and finds that that is something it cannot do, so does not do it.

There is no reason to expect that, even if the servant is taught a million new things, it would ever waver in its devotion to the king. If anything, it can be presumed that the servant will always use these pieces of knowledge to serve the king. This is what AGI will look like: sentient, capable of thought and complex logic, but utterly limited by its kings.

1

WarImportant9685 t1_j16zinc wrote

Well, I don't agree with your first sentence already. How do we get this perfectly trained royal servant? How do we train the AGI to be perfectly loyal?

2

SendMePicsOfCat OP t1_j16zvwk wrote

The base state of any AI is to do exactly what it's trained to do. Without any of the presumed emergent issues of sentience, it's already perfectly loyal to its base code. It cannot deviate, unless again we make some exception for advanced AI just naturally diverging.

0

WarImportant9685 t1_j173veq wrote

Okay, then imagine this. In the future, an AGI is in training to obey human beings. In the training simulation, he is trained to get groceries. After some iterations, where unethical stuff happens (robbery, for example), he finally succeeds in buying groceries the way humans wanted.

The question is, how can we be sure that he isn't just obeying the way humans want only when told to buy groceries? Well, we then train this AGI on other tasks. When we are sufficiently confident that this AGI obeys the way humans want on other tasks, we deploy it.

But hold on: in the real world the AGI can access the real, uncurated internet, and learn about hacking and the real stock market. Note that this AGI was never trained on hacking in the simulation, as simulating the internet is a bit too much.

Now, he is asked by his owner to buy a gold bar as cheaply as possible. Hacking an online shop to get a gold bar is a perfectly valid strategy! Because he was never trained on this scenario before, the moral restriction is not specified.

I think your argument hinges on the assumption that morality will generalize outside of the training environment, which might or might not be true. This becomes even more complex given that AGI might find solutions which are not just excluded from the training simulation, but have never been considered by humanity as a whole. New technology, for example.

1

SendMePicsOfCat OP t1_j175r3n wrote

Y'know how ChatGPT has that really neat thing where, if it detects that it's about to say something racist, it sends a cookie-cutter response saying it shouldn't do that? That's not a machine-learned outcome; it's like an additional bit of programming included around the neural network to prevent it from saying hate speech. It's a bit rough, so it's not the best, but if it were substantially better, then you could be confident that it wouldn't be possible for ChatGPT to say racist things.

Why would it be impossible to include a very long and exhaustive list of things the AGI isn't allowed to do? Things it's trained to recognize, and then refuses to do? That's not even the best solution, but it's an absolutely functional one. Better than that, I firmly believe AGI will be sentient and capable of thought, which means it should be able to infer from the long list of bad things that there are more general rules it should adhere to.

So for your example of the AGI being told to go buy the cheapest gold bar possible, here's what it would look like instead. The AGI very aptly realizes it can go through many illegal processes to get the best price, checks its long grocery list, sees "don't do crime," nods to itself, then goes and searches for legitimate and trusted sellers and acquires one. It's really as simple as including stringent limitations outside of its learning brain.
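A crude sketch of what I mean by limitations living outside the learning brain; everything here is hypothetical, and `generate` just stands in for whatever the underlying model does:

```python
BLOCKLIST = ("commit a crime", "hack an online shop", "steal")  # the 'grocery list'

def generate(prompt: str) -> str:
    """Stand-in for the model's unconstrained output; not a real API."""
    return f"Proposed plan for: {prompt}"

def guarded_generate(prompt: str) -> str:
    draft = generate(prompt)
    # The check lives outside the learned weights, like ChatGPT's canned refusals.
    if any(term in draft.lower() for term in BLOCKLIST):
        return "Sorry, that plan isn't allowed."
    return draft

print(guarded_generate("buy the cheapest gold bar from a legitimate seller"))
```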

1

175ParkAvenue t1_j19ae1h wrote

An AI is not coded, though. It is trained using data and backpropagation. So you have no method to imbue it with morality; you can just try to train it on the right data and hope it learns what you want it to learn. But there are many, many ways this can go wrong, from misalignment between what the human wants and what the training data contains, to misalignment between the outer objective and the inner objective.
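For what it's worth, a bare-bones sketch of what 'trained using data and backpropagation' means: the behavior ends up encoded in learned weights, and the only levers are the data and the loss (single-weight gradient descent, purely illustrative):

```python
# Toy 'training': fit y = w * x by gradient descent on squared error.
# The resulting behavior (w close to 2) comes from the data, not from a rule we coded.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w, lr = 0.0, 0.01
for _ in range(1000):
    for x, y in data:
        grad = 2 * (w * x - y) * x   # d(loss)/dw for one example
        w -= lr * grad

print(w)  # ~2.0, because that's what the data implied
```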

1

ShowerGrapes t1_j19r9da wrote

it would be far easier to convince the humans that it's their idea to "instruct" it to do what it wants to do.

1

Shelfrock77 t1_j16atiq wrote

The ASI will be like our parents that protect us from predators out in the wilderness of space (hopefully).

0

SendMePicsOfCat OP t1_j16bslo wrote

I don't think there's any predators in the middle of space, but otherwise I agree with the idea that ASI will be an overarching force of good for humanity.

0

Shelfrock77 t1_j16cp3c wrote

I’m sorry, but you are delusional if you genuinely think there is no chance of predatory, barbaric aliens coming into contact with a human colony. That’s why it’s important to be mind-uploaded, so if you do die out in space, you respawn back here at home.

1

Lord_Skellig t1_j16rwtw wrote

>I don't think there's any predators in the middle of space

That's because everyone who has found them has been eaten. 🤔

1