linearmodality

linearmodality t1_j9segxb wrote

I don't worry much at all about the AI safety/alignment concerns described by Eliezer Yudkowsky. I don't find his arguments particularly rigorous: they typically rest on premises that are either nonsensical or wrong, and they don't engage meaningfully with current practice in the field. This is not to say that I don't worry about AI safety. Stuart Russell has done good work mapping out the AI alignment problem, and if you're looking for arguments that are more rigorous, lead to sounder conclusions, and are actually respected by people in the field, I'd recommend his work instead. The bulk of opinions I've seen from people in the field on the positions of Yudkowsky and his school range from finding the work to be of dubious quality (but tolerable) to judging it actively harmful.

20

linearmodality t1_j4nvw4h wrote

This article seems incomplete. It doesn't point us to any instances of anyone using the "Little Gods" argument except for Sam Harris, and even there it gives no indication of where specifically Harris uses it. It's unclear which versions of free will the argument is supposed to work against. It doesn't say how the argument relates to other arguments against free will, or what the counter-arguments might be. It's not even clear where the term "Little Gods Argument" comes from: is it just an invention of the author of this piece, or a categorization of arguments advanced by some prior work? The overall categorization seems interesting, but nothing in this article suggests we're looking at anything more than a single poorly constructed argument vaguely alluded to once by Sam Harris.

4

linearmodality t1_iybbrz3 wrote

> There's an extremely obvious restricted range problem here

Then you're not talking about the actual correlation over the distribution of intelligent agents that actually exist, but about something else. In which case: what are you talking about?
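As an aside, to make the statistical point at issue concrete, here's a toy sketch (purely synthetic numbers of my own invention, not data about actual agents) of what "restricted range" does to a measured correlation: a correlation computed over a narrow slice of a distribution can look very different from the correlation over the whole distribution, which is exactly why it matters which population of agents you have in mind.

```python
# Toy illustration with made-up numbers: restricting the range of one variable
# attenuates the measured correlation relative to the full population.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population in which x and y are positively related.
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)

full_r = np.corrcoef(x, y)[0, 1]

# Now look only at the upper tail of x (a restricted range).
mask = x > 1.0
restricted_r = np.corrcoef(x[mask], y[mask])[0, 1]

print(f"correlation over the full population:  {full_r:.2f}")
print(f"correlation over the restricted range: {restricted_r:.2f}")
```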

>This is literally the Orthogonality Thesis stated in plain English.

Well, no. The orthogonality thesis asserts, roughly, that an AI agent's intelligence and goals are somehow orthogonal. Here, we're talking about an AI agent's intelligence and the difficulty of producing a specification for a given task that avoids undesirable side effects. "Goals" and "the difficulty of producing a specification" are hardly the same thing.

>I don't think that this solution will work.

This sort of approach is already working. On the one side we have tools like automated prompt engineering, which develop specifications of what an AI system should do for things like zero-shot learning. On the other side we have robust control results that guarantee undesirable outcomes are avoided even when a learned agent is used as part of the controller; a simplified sketch of that idea is below. There's no reason to think improvements in this space won't continue.
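To give a deliberately simplified flavor of that control-side idea (a minimal sketch under toy assumptions of my own, not any specific published result): a "shield" wraps a learned policy and overrides any proposed action that would take the system outside a verified safe set, so the safety guarantee comes from the shield rather than from the learned model.

```python
# Minimal sketch under toy assumptions: a safety "shield" around a learned
# policy. The guarantee is enforced by the shield's check, not by trusting
# the learned model.

def is_safe(state: float) -> bool:
    # Toy verified safe set: the state must stay within known bounds.
    return -10.0 <= state <= 10.0

def step(state: float, action: float) -> float:
    # Toy known dynamics: the action nudges the state.
    return state + action

def shielded_action(state: float, proposed: float, fallback: float = 0.0) -> float:
    """Accept the learned policy's proposed action only if the next state is
    safe; otherwise fall back to an action known to preserve safety."""
    if is_safe(step(state, proposed)):
        return proposed
    return fallback

def learned_policy(state: float) -> float:
    # Stand-in for any learned agent (hypothetical); may propose unsafe actions.
    return 3.0 * state

state = 4.0
action = shielded_action(state, learned_policy(state))
assert is_safe(step(state, action))  # holds no matter what the policy proposed
```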

Even if those improvements don't continue, the problem of producing task specifications does not get worse as AI systems become more intelligent (because, as argued above, the difficulty of producing a specification is independent of the model), which is fundamentally inconsistent with the LessWrongist viewpoint.

2

linearmodality t1_iyb8fij wrote

Well, intelligence is correlated with willingness to do what people want. This is straightforward to observe in natural intelligences: the most intelligent beings (adult humans) are also the most willing to do what people want. It is also presently true of existing AI agents, insofar as being "willing" even makes sense for such agents: the ones with better problem-solving abilities are more "willing" (because they are more able) to do what people want. This is so clearly the case that I suspect you mean something other than "correlated" here.

>It's fucking hard to specify what we want AI systems to do in ways that avoid undesirable side effects. Everyone agrees on this with respect to current AI. The only remaining question is whether we should expect it to become easier or harder to control machine intelligences as they become more sophisticated.

Well, that's the wrong question. Yes, it's hard to specify what we want a system to do in a way that avoids side effects. However, that hardness is a property of the specification, not of the learned model itself: the specification pins down what we want before any particular model enters the picture. It doesn't get harder or easier as the model becomes more accurate, because it doesn't depend on the model at all.

>Do you personally, really and honestly, believe that it's so obvious that control will get easier as intelligence gets greater

Certainly it will get easier to produce specifications of what we want an AI system to do in a way that avoids undesirable side effects, because we can get a sufficiently intelligent AI to write the specifications for us and to furnish a proof of safety (a proof that the specification guarantees the undesirable side effects are avoided); a sketch of that pattern is below. "Control" is a more general word, though, and you'll have to nail down exactly what you mean by it before we can evaluate whether to expect it to get easier or harder over time.
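To make that pattern a bit more concrete (everything here is a toy of my own construction, not a real system or anything anyone has published): the leverage comes from the fact that the component we have to trust is a small checker, not the powerful generator. The generator proposes a specification together with a machine-checkable safety claim, and we accept it only if the checker passes it.

```python
# Minimal sketch under toy assumptions: an untrusted generator proposes a
# specification plus a checkable safety claim; a small trusted checker decides
# whether to accept it. All names and numbers are hypothetical, and in a toy
# this small the "proof" is trivial; the point is only the division of labor.
import random

HARD_LIMIT = 100.0  # the undesirable side effect: exceeding this limit

def untrusted_generator() -> tuple[float, float]:
    """Stand-in for a powerful AI: proposes a specification (here, a maximum
    dosage) plus a claimed worst-case bound that serves as the safety claim."""
    spec = random.uniform(0.0, 150.0)
    claimed_bound = spec
    return spec, claimed_bound

def trusted_checker(spec: float, claimed_bound: float) -> bool:
    """Small, auditable check: accept only if the claimed bound is consistent
    with the spec and rules out the side effect."""
    return spec <= claimed_bound <= HARD_LIMIT

def get_verified_spec(attempts: int = 1000) -> float:
    for _ in range(attempts):
        spec, bound = untrusted_generator()
        if trusted_checker(spec, bound):
            return spec  # accepted only because the trusted checker passed it
    raise RuntimeError("no verified specification found")

print("accepted specification:", get_verified_spec())
```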

>that you'd label people who worry otherwise as cultists?

Oh, worrying otherwise isn't what makes LessWrongers cultists. There are lots of perfectly reasonable non-cultists who worry otherwise, like Stuart Russell.

1

linearmodality t1_iyb29f2 wrote

And not everyone who participates in LessWrong is Eliezer Yudkowsky or Stuart Armstrong (and even they lack Bostrom's general intellectual coherence). But even if everyone on LessWrong were Nick Bostrom himself, the core problem remains: the "orthogonality thesis" is fundamentally flawed. (It hides these flaws by being purposefully vague about how "goals" and "intelligence" are mapped to vectors and what the inner product space is supposed to be. If you try to nail these things down, the statement becomes false, vacuous, or trivial.)
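To spell out that complaint (this is my gloss, not a quotation of Bostrom or anyone on LessWrong): "orthogonal" is only defined relative to an inner product space, and the thesis never supplies one.

```latex
% My gloss on the complaint above, not a statement taken from Bostrom or LessWrong.
``Orthogonal'' is only defined relative to an inner product space:
\[
  g \perp i \iff \langle g, i \rangle = 0
\]
for vectors $g, i$ in some vector space $V$ equipped with an inner product
$\langle \cdot,\cdot \rangle$. To evaluate the thesis one would have to say
(1)~what vector space ``goals'' and ``intelligence'' are supposed to live in, and
(2)~which inner product on that space is intended. Absent both, the statement has
no truth value; the informal reading, that more or less any level of intelligence
can be combined with more or less any final goal, is a claim about a product set
$G \times I$, not about orthogonality in any technical sense.
```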

5

linearmodality t1_iyb0ojk wrote

An idea that is actually sound generally does not need to bolster its credibility by dubbing itself a "thesis" or by using unrelated technobabble (the notion of orthogonality here is nonsense: there's no objectively defined inner product space we're working in).

Of course, the orthogonality thesis also wasn't invented by LessWrong, and LessWrong does a pretty poor job of representing Bostrom's work. So there are multiple issues here.

5