Molnan t1_j9l3q5s wrote

There's no need to "align an AGI". That's in fact the whole point, and you missed it.

2

Present_Finance8707 t1_j9l5tzs wrote

Two problems: it doesn’t work, and the current models are already way down the Agent line, with no going back. Yawn. https://gwern.net/tool-ai

3

Molnan t1_j9lag3a wrote

From skimming through your blog post it's quite clear you really need to read and try to understand Drexler's FHI report. For instance, your claims about tool AIs vs. agent AIs are irrelevant, because the idea is not to avoid agent AIs, only to avoid the "friendly AI" paradigm. You'd also know that Drexler's paradigm is a natural extension of current best practices in AI design, not just for some abstract big-picture AI security but also for expediency in the development of AI capabilities and for more mundane safety concerns. So it's the exact opposite of what you claim: the "friendly AI" paradigm is the alien, unwelcome newcomer that wants to turn the AI research community on its head for dubious reasons, while Drexler tells them to keep doing what they are doing.

2

Present_Finance8707 t1_j9myqs2 wrote

If you don’t even know who Gwern is I can’t really take you seriously about alignment. You can’t possibly have a deep understanding of the various arguments in play.

2

CellWithoutCulture t1_j9norym wrote

And if you don't know who Drexler is...

I know who all these people are, yet I don't know anything lol

4

Molnan t1_j9n7sql wrote

You don't have to take *me* seriously, but you should certainly read an FHI technical report before you take the liberty to yawn at it.

I don't keep up with every blogger who writes about AI alignment (which you stubbornly keep assuming to be the crux of all AI security) but I've been reading Eliezer and Nick Bostrom for long enough to know that their approach can't work, and now Eliezer seems to agree with that conclusion.

3

Present_Finance8707 t1_j9q8x1g wrote

Eliezer's actual conclusion is that no current approach can work and that there are none on the horizon that can.

1

Molnan t1_j9qd93i wrote

Yes, which implies he doesn't believe his approach would work, like I said.

1

Present_Finance8707 t1_j9qfavu wrote

His arguments don’t hold up. For one thing, we already have powerful generalist agents: Gato is one, and it’s clear that advanced LLMs can do all sorts of tasks they weren’t trained for. Prediction of the next token seems as benign and narrow as it gets, but if you don’t think a LLM can become dangerous you aren’t thinking hard enough. CAIS also assumes people won’t build generalist agents to start with, but that cat is well out of the bag. Narrow agents can also become dangerous on their own because of instrumental convergence, but even if you restrict yourself to building only weak narrow agents/services, the profit incentive for building general agents will be too strong, since they will likely outperform narrow ones.

1

Molnan t1_ja5zobe wrote

You say:

>CAIS also assumes people won’t build generalist agents to start with.

No, it doesn't. See, for instance, section 7: "Training agents in human-like environments can provide useful, bounded services":

>Training agents on ill-defined human tasks may seem to be in conflict with developing distinct services provided by agents with bounded goals. Perceptions of conflict, however, seem rooted in anthropomorphic intuitions regarding connections between human-like skills and human-like goal structures, and more fundamentally, between learning and competence. These considerations are important to untangle because human-like training is arguably necessary to the achievement of important goals in AI research and applications, including adaptive physical competencies and perhaps general intelligence itself. Although performing safely-bounded tasks by applying skills learned through loosely-supervised exploration appears tractable, human-like world-oriented learning nonetheless brings unique risks.

You say:

>if you don’t think a LLM can become dangerous you aren’t thinking hard enough.

Any AI can be dangerous depending on factors like its training data, architecture and usage context. That said, LLMs as currently understood have a well-defined way to produce and compare next-token candidates, and no intrinsic tendency to improve on this routine by gathering computing resources or pursuing any similar instrumental goals; simply adding more computing power and training data doesn't change that.
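
To be concrete about what I mean by "a well-defined way to produce and compare next-token candidates", here is a minimal sketch of the sampling loop, using GPT-2 and the Hugging Face transformers library purely as an illustrative stand-in (the model, prompt and sampling choices are my own, not anything taken from the report or from any particular lab's system):

```python
# Minimal sketch of autoregressive next-token sampling. GPT-2 is just a stand-in
# for "an LLM as currently understood"; nothing here is specific to any one model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The report argues that", return_tensors="pt").input_ids

for _ in range(20):                                      # generate 20 tokens, one at a time
    with torch.no_grad():
        logits = model(input_ids).logits[:, -1, :]       # scores for every candidate next token
    probs = torch.softmax(logits, dim=-1)                # compare candidates by a fixed rule
    next_id = torch.multinomial(probs, num_samples=1)    # pick one (greedy argmax also works)
    input_ids = torch.cat([input_ids, next_id], dim=-1)  # append it and repeat

print(tokenizer.decode(input_ids[0]))
```

The loop scores candidates, picks one, appends it and repeats; there is nothing in it that reaches for more compute or modifies its own objective, which is the narrow point I'm making.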

Gato and similar systems are interesting, but at the end of the day the architecture behind useful real-world AIs like Tesla's Autopilot is more suggestive of CAIS than of Gato, and flexibility, adaptability and repurposing are achieved through good old abstraction and decoupling of subsystems.
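
To make "abstraction and decoupling of subsystems" concrete, here's a toy sketch of narrow services composed behind fixed interfaces. It's purely illustrative and assumes nothing about how Autopilot or any real driving stack is implemented; every class and function name below is hypothetical:

```python
# Toy sketch: narrow services behind fixed interfaces. All names are hypothetical;
# this illustrates decoupling, not any real system's architecture.
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Detection:
    label: str
    distance_m: float


class PerceptionService(Protocol):
    def detect(self, camera_frame: bytes) -> List[Detection]: ...


class PlanningService(Protocol):
    def plan(self, detections: List[Detection]) -> str: ...


class CameraPerception:
    def detect(self, camera_frame: bytes) -> List[Detection]:
        # a narrow model trained only for detection would run here
        return [Detection(label="pedestrian", distance_m=12.0)]


class RulePlanner:
    def plan(self, detections: List[Detection]) -> str:
        # bounded, auditable logic; swapping the perception model doesn't touch it
        return "brake" if any(d.distance_m < 20 for d in detections) else "cruise"


def drive_step(perception: PerceptionService, planner: PlanningService, frame: bytes) -> str:
    return planner.plan(perception.detect(frame))


print(drive_step(CameraPerception(), RulePlanner(), b"<frame>"))
```

Each service can be retrained, replaced or audited on its own, which is the kind of repurposing I'm pointing at.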

The advantages of generalist agents are derived from transfer learning. But this is no panacea: for instance, in the Gato paper they admit it didn't offer much advantage when it comes to playing Atari games, and it has obvious costs and drawbacks. For one, the training process will tend to be longer, and when something goes wrong you may need to start over from scratch.

And I must say, if I'm trusting an AI to drive my car, I'd actually prefer it if this AI's training data did NOT include videogames like GTA or movies like, say, Death Proof or Christine. In general, for many potential applications it's reassuring to know that the AI simply doesn't know how to do certain things, and that's a competitive advantage in terms of popularity and adoption, regardless of performance.

​

You say:

>Narrow agents can also become dangerous on their own because of instrumental convergence

Yes, under some circumstances, and conversely, generalist agents can be safe as long as this pesky instrumental convergence and other dangerous traits are avoided.

There's a lot more to CAIS than "narrow good, generalist bad". In fact, many of Drexler's most compelling arguments have nothing to do with specialist vs. generalist AI. For instance, see section 6: "A system of AI services is not equivalent to a utility maximizing agent", or section 25: "Optimized advice need not be optimized to induce its acceptance".

0