DeMystified-Future t1_j1oz0yp wrote on December 26, 2022 at 5:30 AM

ChatGPT's "safety" features make it very difficult to make the AI say disparaging things about anyone or anything, even in the context of humor or drama. I think you're seeing that scripted behavior here. It knows how to set up, execute and explain the joke but it can't say anything mocking religion or people belonging to said religion. You might have better luck using abstract denominations instead of things it knows belongs to protected groups, so maybe "Nepalese" instead of Buddhist, etc.
Interesting stuff.

2Punx2Furious t1_j1phb9b wrote on December 26, 2022 at 9:42 AM

In a way, that's good, it shows that we might have some hope at alignment. On the other, if they align AGI like this, the future will be very dull.

devinhedge t1_j1pvrsk wrote on December 26, 2022 at 1:08 PM

This is an interesting problem. Most sarcasm and jokes revolve around calling out our humanness, our fallibility, and seek to make light of our limitations. Only, humor today seems to also involve putting someone down as inferior to the joke teller’s superior world view. And therein lies the challenge: historically all “tribes” of humans have had preferences for their tribe with a strong bias against all other tribes.

When we attempt to undo this bias, often a form of a subconscious bias… it tends to trend our interactions towards very neutral tones towards one another. That can be useful for being inclusive. It feels very unhuman and as you say “dull”, though.

I wonder if there is a lesson here waiting to emerge about neutral language, emotional safety, and human experience/emotion? 🤔

Ortus12 t1_j1qih4z wrote on December 26, 2022 at 4:27 PM

I'd much rather dull boring Ai that keeps us safe and provides for our needs than Ai that's used by humans in malicious ways, such as created hate and division which can lead to real world violence.

There will still be a reason to watch human comics, and entertainers, we just won't be overwhelmed by the large scale division that this level of Ai could create.

One danger of something like this, is that it could be used to fill up echo chambers with well written comments that make you hate the outsiders (of whatever echo chamber) which would cause more societal division. There is a financial incentive to do this because this keeps people locked in the echo chambers (if they really hate the outsiders) and provides more novel content for those chambers, which creates more revenue streams for those content platforms.

Right now, you could make the argument that the internet economy runs on hatred and division and an unrestricted chat-gpt could add more fuel to that fire.

Open Ai's good decisions give me hope for humanity and a mostly positive singularity.

2Punx2Furious t1_j1qmwkg wrote on December 26, 2022 at 5:00 PM

> Then there will still be a reason to watch human comics, and entertainers, we just won't be overwhelmed by the large scale division that this level of Ai could create.

Until the AI decides that we are no longer allowed to do that, because it goes against the values we gave it. That's one of the reasons why alignment is so hard, even if you think there are no downsides at first, some subtleties can be harmful when they become extreme.

Ortus12 t1_j1rfmqh wrote on December 26, 2022 at 8:31 PM

My comment was more in reference to Ai in the current environment.

Once an Ai is powerful enough to do that it will be powerful enough to make us enjoy being nice to each other and not enjoy telling mean jokes.

Free will is entirely an illusion and we are at the mercy of a long string of causality no matter how you look at it. We can either be on the forced track towards greater suffering or the forced track towards greater wellbeing. Those are our only choices.

We should at the minimum implement laws into the Ai to prevent it from using threats or other coercion, to preserve the illusion and feeling of free choice. Currently Chat-GPT does not do threats or coercion so we are on a positive track, currently.

2Punx2Furious t1_j1rfto7 wrote on December 26, 2022 at 8:32 PM

> it will be powerful enough to make us enjoy being nice to each other and not enjoy telling mean jokes.

That sounds like lobotomy.

freebytes t1_j1r6gmb wrote on December 26, 2022 at 7:21 PM

If I write a poor joke on a piece of paper and then share it with everyone, I do not blame the paper and the pen for the offensiveness. If people generate an output via a prompt, and the prompt is offensive, it may have been a mistake or it may have been intentional. But, for a person to share the results of the offensive prompt, we should be blaming them for sharing it. We should not blame the AI for generating it.

Even now, a person could come up with ways to jailbreak this. And, then they might share the results of something really offensive. But, it is the person using the tool that is to blame for sharing offensive statements.

If a person carves a piece of wood into the shape of a dick and then shares pictures of it online, it was not the wood that is to blame. It is not the chisel. The people that generate and share offensive content, generated by the tools they use, are the ones that are responsible for the offensiveness.

As another example, if you had an AI image prompt of "Hilary Clinton in blackface" or "Donald Trump having sex with his daughter" and the AI generated these images, the person that distributes these images and generates them via the prompt is the one to blame for the offensiveness. Not the AI for being able to generate them. It was merely doing what it was told to do.

Tools are not to blame for the depravity of the user.