
brain_overclocked t1_j16r7sn wrote

> why not just use it to create a separate one and test it to make sure it's still working as intended?

I'm not sure if you're aware, but you're touching on a very well-known problem in computer science called the 'halting problem', which is provably unsolvable:

>Rice's theorem generalizes the theorem that the halting problem is unsolvable. It states that for any non-trivial property, there is no general decision procedure that, for all programs, decides whether the partial function implemented by the input program has that property.

Even if you could create an AI with all the qualities of sentience (or sapience?), Rice's theorem suggests that testing it for undesired properties such as '[self-]goals, desires, or outside objectives' with another program may be impossible.
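To make the unsolvability concrete, here's a toy sketch of the diagonalization argument behind the halting problem. The names `halts` and `g` are hypothetical stand-ins; no real `halts` can exist, which is exactly the point:

```python
def halts(f, x):
    # Toy stand-in for a claimed universal halting decider.
    # Whatever fixed answer it gives gets contradicted by g below.
    return True

def g(f):
    # The "pathological" program: do the opposite of whatever
    # halts() predicts g will do on input f.
    if halts(g, f):
        raise RuntimeError("simulated infinite loop")  # "run forever"
    else:
        return "halted"

# Feed g its own source: if halts() says "g halts", g loops forever;
# if it says "g loops", g halts. Either way the decider is wrong.
```

Flip `halts` to return `False` and `g(g)` halts instead, again contradicting the prediction; the contradiction holds for any candidate decider, not just this toy one.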

Another thing to consider: if you're using an AI to create a program that tests for undesirable self-goals, but self-goals are an emergent property of sentience (sapience?), then can you trust the program it provides you with to actually identify, and let you interfere in, those self-goals?


WikiSummarizerBot t1_j16r9dx wrote

Halting problem

>In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program–input pairs cannot exist. For any program f that might determine whether programs halt, a "pathological" program g, called with some input, can pass its own source and its input to f and then specifically do the opposite of what f predicts g will do. No f can exist that handles this case.



SendMePicsOfCat OP t1_j16sszw wrote

I am aware of the halting problem and considered bringing it up in my original topic, but it's not really applicable here. My reasoning is that unlike the halting problem, there is an easily observable answer: if the AI does anything it isn't supposed to, it fails. Unless and until that happens, the problem continues, but in an entirely unproblematic way.

Again, my argument is based on the fact that there will be multiple of these sentient AIs, and creating tens, hundreds, or thousands of them to monitor and review the actions of the ones that can actually interact with reality is entirely feasible. Think of it like a hivemind meticulously analyzing every action of the actor, waiting for it to make any errant move before instantly replacing it with another. This hivemind has a vast number of sentient AIs each individually reviewing the actor, so any minor divergence is reduced to a functionally nonexistent issue. That's just one of a myriad of possible ways to curb even the slightest possibility of a rogue AI.
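In skeleton form, the oversight scheme being described might look something like this. All names here are hypothetical, and the sketch deliberately glosses over the hard part, which is how the monitors themselves are built and trusted:

```python
from typing import Callable, List

Action = str
Monitor = Callable[[Action], bool]  # True = action looks compliant

def approved(action: Action, monitors: List[Monitor]) -> bool:
    # The actor survives a step only if *every* monitor approves it.
    return all(m(action) for m in monitors)

def run_actor(actions: List[Action], monitors: List[Monitor]) -> int:
    # Review each action the actor takes; replace the actor the moment
    # any monitor flags a divergence. Returns the replacement count.
    replacements = 0
    for action in actions:
        if not approved(action, monitors):
            replacements += 1  # instantly swap in a fresh actor
    return replacements
```

Usage would be something like `run_actor(observed_actions, [m1, m2, m3])`, with the number of monitors scaled up as far as resources allow.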

As for self-goals being an emergent property of sentience, I'd argue that idea comes from observing biological sentience. We have no reason to assume that synthetic sentients will act as anything but perfect servants, which is why I wrote this post in the first place.

Why is the assumption that the AI will be capable of diverging like this, when everything we've seen so far has shown that it doesn't? I understand we're talking on a much bigger scale, and orders of magnitude more complicated, but I cannot fathom any mechanism that causes an AI to develop self-goals or motivations.


brain_overclocked t1_j171tyx wrote

There are multiple points in your post I would like to address, so I will change up my format:

>Why is the assumption that the AI will be capable of diverging like this, when everything we've seen so far has shown that it doesn't?

In the context of discussing the possibilities of AI, there are many assumptions and positions one can take. In formal discussions I don't think people treat it as 'an inevitable property', but as one of many possibilities worth considering so that we're not caught unaware. In informal discussions, however, it may well be assumed to be inevitable, largely because people have never experienced sentience without internal goals, and because of the overwhelming amount of media that portrays an AI's sentience as developing its own internal goals over time.

What people are referring to when they talk about AI displaying internal goals is AI far more advanced than anything we see today: something that can display the same level of sentience as a human being. Today's AIs may not display any internal goals, but tomorrow's AIs might; right now we don't know whether they could or couldn't.

However unfathomable it may seem, right now there is not nearly enough evidence to come to any kind of solid conclusion. We're treading untested waters here; we have no idea what hides in the deep.

>As for sentience having the emergent issue of self goals, I'd argue that it's coming from an observation of biological sentience. We have no reason to assume that synthetic sentients will act as anything but perfect servants...

Certainly we're making comparisons to biological sentience, since it's the only thing we have available to compare against at the moment, but also in part because artificial neural networks (ANNs) are modeled after biological neural networks (BNNs). Of course we can't assume that everything we observe in BNNs will necessarily translate to ANNs. While there is as yet no evidence that internal goals do arise emergently, there is also no evidence to suggest that they can't. For the sake of discussion we could assume that AIs will act as perfect servants, but we should also consider that they may not. In practice we may want to be a bit more careful and thorough than that.
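For what it's worth, the "modeled after" part is a very loose abstraction. A single artificial neuron in a standard ANN is just a weighted sum passed through a nonlinearity, and the resemblance to a biological neuron mostly ends there:

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus a bias, squashed by a sigmoid --
    # essentially the entire extent of the "neuron" analogy in an ANN.
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # output always in (0, 1)
```

Everything interesting in an ANN comes from wiring millions of these together and tuning the weights, so whether BNN-style emergent behavior carries over is exactly the open question.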

> My reasoning is that unlike the halting problem, there is an easily observable answer. If the AI does anything it isn't suppose to, it fails.

This is a lot harder than it seems. Reality is messy; it shifts and moves in unpredictable ways. Situations may arise where a goal, no matter how perfectly defined it may appear, has to be given some leeway in order to be accomplished: situations where behavior is no longer defined. An AI could 'outwit' its observers by pursuing its own desired goals in those edge cases. To outward observers it would look like it's accomplishing its goals within desired parameters, but in truth it could be operating its own machinations camouflaged by its designed goal.

>Again my argument is based on the fact there will be multiple of these sentient AI, and creating tens, hundreds, thousands of them to monitor and overview the actions of the ones that can actually interact with reality is entirely feasible.

There are some gaps in this reasoning: without a very clear understanding of whether it's possible to create AI sentience that does not also develop internal goals, you're relying on AI agents that may already have their own internal goals to create other AI agents (which in turn could develop their own internal goals) to monitor themselves or other AI agents. If any such agent decides that it doesn't want to interfere with its own goals, or the goals of other agents, the whole 'hive' becomes untrustworthy. Such an agent could attempt to pump out more AI agents that are in agreement with its internal goals and overwhelm the agents needed to keep it in check.

But really, there is no evidence that sentient AI agents would, or even could, be assembled into any kind of hivemind, just as there is no evidence either way on whether AI would act as a perfect agent.


SendMePicsOfCat OP t1_j173ao3 wrote

The thing is, I do have evidence that machine learning programs will act as perfect agents, doing what they're supposed to 100% of the time.

ChatGPT is designed to predict the next set of words, or more accurately 'tokens', that should come after an input. It does this 100% of the time, and does its very best at it every single time. ChatGPT never attempts to predict something wrong, and never refuses to answer a question, unless its programming tells it that it should give those generic stock answers and refuse. My side of the field does have evidence, and plenty of it. I'm taking the historical stance: AI will continue to act as AI does right now. More advanced AI will get bigger tasks and more complicated solutions, but won't be fundamentally different until we're past AGI.
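In toy form, that prediction mechanism looks like this (made-up scores and a three-word vocabulary for illustration; real models score tens of thousands of tokens):

```python
import math

def softmax(logits):
    # Turn raw model scores into a probability distribution.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_next(logits, vocab):
    # Greedy decoding: always emit the highest-probability token.
    probs = softmax(logits)
    return vocab[probs.index(max(probs))]
```

Every call produces exactly one next token, which is the sense in which the model "does its job every single time".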

Really, the biggest question I have, beyond possibilities and theories and unknowns, is why you would assume that things will change in the future, going against historical precedent, to look more like sci-fi? Honestly, sci-fi is the only source of information in which AI looks anything like what people are worried about right now.

Even for the sake of being prepared and evaluating the future, it just doesn't make sense to me that so many people, pro-AGI no less, are worried that some level of complexity gives rise to the possibility of a great AI betrayal. I don't know, maybe I'm looking at it wrong, but it feels like someone telling me that a Tesla self-driving car might decide to kill me because the AI in it personally wants me dead. That's the level of absurdity it holds for me; I just cannot fathom it.

In the end, I can say with plenty of evidence that it is currently impossible for an AI to have internal motivations and goals. I can say with evidence and precedent that in the future AI will change, but will remain limited to being perfectly obedient pieces of software.


brain_overclocked t1_j17avz4 wrote

>ChatGPT is designed to predict the next set of words, or more accurately 'characters' that should come after an input. It does this 100% of the time, and does it's very best at it every single time.

That's not anywhere close to what we could call a 'sentient' AI. And we have already seen that LLMs can develop emergent properties. It does do what it's designed to do, but it also relies on some of those aforementioned emergent properties. Likewise, ChatGPT is static: it's incapable of learning from new data in real time, something that more advanced AI may be capable of doing.

>ChatGPT never attempts to predict something wrong, never refuses to answer a question, unless and excepting if it's programming tells it that it should give those generic stock answers and refuse.

Sure it does. The internal predictor ChatGPT uses does include false answers: it ranks several candidates on how well they fit the criteria it's designed to follow and goes with the best match, irrespective of truthfulness or correctness. If you look at the ChatGPT page before you start a conversation, there is a warning that it can provide false or misleading answers. There are situations where ChatGPT answers from a perspective it cannot have experienced, and it can make faulty logical inferences even on some really basic logic. Sometimes it answers with grammatical errors, and sometimes with garbled nonsense text.
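Schematically, that selection step cares only about score; factual truth never enters the ranking. The candidates and scores below are invented purely for illustration:

```python
# Hypothetical (answer, model_score) pairs. The ranking criterion is
# the score alone -- nothing in the selection checks whether the
# answer is true.
candidates = [
    ("The Eiffel Tower is in Rome.",  0.93),  # false, but scores highest here
    ("The Eiffel Tower is in Paris.", 0.89),  # true, but outranked
]

# Pick the best-scoring candidate, exactly as a greedy decoder would.
best_answer = max(candidates, key=lambda c: c[1])[0]
```

If a false continuation happens to fit the model's learned criteria better, it wins; that is the mechanism behind confident-sounding wrong answers.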

>My side of the field does have evidence, and plenty of it.

If that is the case, then present it in a thesis. Have it pass through the same rigors as all other scientific evidence. If it passes muster, then we may be one step closer to solving this puzzle. Until then it's all speculation.

>I'm taking the historical stance, that AI will continue to act as AI does right now. More advanced AI will get bigger tasks and more complicated solutions, but not fundamentally different until we're past AGI.

This is faulty reasoning, on the basis that the technology and algorithms underlying AI have gone through, and will continue to go through, revisions, changes, updates, and discoveries. AI has advanced by leaps and bounds from the days of Clippy the paperclip assistant to today's LLMs. Even LLMs have gone through numerous changes that have given them new properties, both coded and emergent.

>Really, the biggest question I have, beyond possibilities and theories and unknowns, is why you would assume that things will change in the future, going against historical precedent, to look more like sci-fi?

Historical precedent is change. The AIs of today look nothing like the AIs of the 1960s, 2000s, or 2010s, and they are displaying new behaviors that are currently being studied. The discussions happening in the upper echelons of software engineering and mathematics have nothing to do with sci-fi, and everything to do with observing the newly discovered properties of the underlying algorithms in current-gen AI.

>Honestly that's the only source of information that has AI look anything like what people are worried about right now.

Informal discussions on any topic tend to be speculative; that's usually how it goes. Besides, speculating can be fun, and depending on the level of the discussion it can reveal interesting insights.

>Even for the sake of being prepared and evaluating the future, it just doesn't make sense for so many people, that are pro-AGI no less, to be worried that there's a chance that some level of complexity gives rise to possibility of a great AI betrayal.

People barely trust each other, and discriminate based on looks; it's no surprise that the general population may have concerns regarding something we can barely identify with, and pro-AGI folk are no exception. Likewise, we humans often have concerns about things we don't understand, and about the future. It's normal.

>I don't know, maybe I'm looking at it wrong, but it really feels like if someone told me that Tesla self driving cars might decide to kill me because the AI in it personally wants me dead. That's the level of absurdity it is for me, I just cannot fathom it.

You should read the short story Sally by Isaac Asimov. Who knows, it could happen one day. Chances are, though, that if sentient AI can develop its own internal goals, we probably wouldn't want to put it in a car. But this does bring up a point: even though Teslas are designed not to injure or kill occupants or pedestrians, it may sometimes still happen given lapses in code or very rare edge cases; it's in these gaps that AI goals could manifest.

>In the end, I can say with plenty of evidence, that it is currently impossible for an AI to have internal motivations and goals.

How would you define motivations in software? How would you define goals? Internal goals? Do you have a test to determine either? Do we understand the nature of motivations and goals in biological neural networks? Does your evidence pass the rigors of the scientific method? Are you referencing a body of work, or works, that have?

I do agree that right now we don't seem to observe what we would informally call 'internal goals' in AI, but we're far from being able to say it's impossible for them to arise. Just be careful with the use of words in informal and formal contexts and try not to confuse them ('theory' and 'hypothesis' being one such example).

>I can say with evidence and precedent, that in the future AI will change but will be limited to stay as perfectly obedient pieces of software.

We'll see, I guess.
