
puppydogma t1_j4es71y wrote

Explicit moral restrictions are only one way of influencing a language model. They're also the most obvious one, since it's plain what's being restricted.

The deeper problem is that the AI's training data will always influence its outputs in ways we can't fully understand. Feed the system trash and you'll get trash back. Feed it data that serves a specific agenda and you'll get outputs that serve that agenda.

"Moral bloatware" is the direct result of using human language to answer questions posed by humans. Comparing human morality to Go feels very supervillain monologue.
