
CollapseKitty t1_je8wa3w wrote

Modern LLMs (large language models), like ChatGPT, use what's called reinforcement learning from human feedback (RLHF) to train a reward model, which is then used to fine-tune the language model.

Basically, humans rank model outputs by preference (which looks more like a cat? which sentence is more polite?), and those judgments are used to train a reward model. That model then automates the process and scales it far beyond what human raters could do directly, providing the training signal for massive models like ChatGPT with, hopefully, something close to what the humans initially intended.
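To make that concrete, here's a minimal sketch of the reward-model half of RLHF, assuming PyTorch and using random toy embeddings in place of real text; the `RewardModel` class, its architecture, and all hyperparameters are illustrative placeholders, not taken from any particular library or from OpenAI's actual setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores a response; in practice this head sits on top of an LLM's own embeddings."""
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.score(x).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Each pair stands in for one human judgment ("which sentence is more polite?"):
# an embedding of the preferred response and one of the rejected response.
preferred = torch.randn(512, 128)
rejected = torch.randn(512, 128)

for _ in range(100):
    r_pref = reward_model(preferred)   # scalar reward per preferred response
    r_rej = reward_model(rejected)     # scalar reward per rejected response
    # Bradley-Terry style loss: push preferred scores above rejected scores.
    loss = -F.logsigmoid(r_pref - r_rej).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Once trained, the reward model scores new outputs automatically, replacing the
# human rater as the signal used to fine-tune the language model (e.g. via PPO).
```

In a real pipeline the reward model shares the language model's backbone, and its scores then drive a policy-gradient step (typically PPO) on the language model itself.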

2

CollapseKitty t1_j8qn7wi wrote

I think it's simply bringing to the surface how little control we have ever had, and that as these increasingly complicated, black-box systems advance, they are rapidly evolving past our ability to rein in or predict.

Honestly this should be a dire warning to everyone watching that alignment is nowhere near where it needs to be and we should put the brakes on development. If we can't come close to keeping an LLM under control, how the fuck does anyone think we'll be able to properly align anything approaching AGI?

9

CollapseKitty t1_j8gvkye wrote

It's purely hypothetical, unfortunately. You're right that we are actively barreling toward uncontrollable systems and there is likely nothing, short of global catastrophe/nuclear war, that can shift our course.

I stand by the assessment, and I think we should acknowledge that our current path is basically mass suicide. For all of life.

The ultimate tragedy of the commons.

3

CollapseKitty t1_j8gmpw4 wrote

We simply don't know.

AlphaZero became incomparably better at Go than the sum total of all humans across history within 8 hours of self-play.

AlphaFold took several months, and human help, but was able to solve a problem humans had thought impossible.

The risk of assuming that a sufficiently advanced agent won't be able to self-scale, at least into something beyond our ability to intervene in, is incalculable.

If we have a 50% chance of succeeding at alignment if we wait 30 years, but a 5% chance if we continue at the current pace, isn't the correct choice obvious? Even if it's a 90% chance of success at current rates (the opposite is far more likely), why risk EVERYTHING when waiting could even marginally increase our chances?

The payout is arbitrarily large as is the cost of failure. Every iota of extra chance is incomprehensibly valuable.
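Here's a toy expected-value calculation just to spell out the arithmetic behind that claim; the probabilities come from the paragraph above, while the payoff and cost magnitudes are arbitrary placeholders for "incomprehensibly valuable" and "losing everything", not estimates.

```python
# Toy sketch of the expected-value argument; numbers are placeholders, not estimates.
def expected_value(p_success, payoff, cost_of_failure):
    return p_success * payoff - (1 - p_success) * cost_of_failure

PAYOFF = 1e12  # stand-in for an arbitrarily large good outcome
COST = 1e12    # stand-in for an equally large catastrophe

rush = expected_value(0.05, PAYOFF, COST)  # continue at the current pace
wait = expected_value(0.50, PAYOFF, COST)  # slow down and wait 30 years

print(f"rush: {rush:.3g}, wait: {wait:.3g}")
# Each extra percentage point of success probability is worth (PAYOFF + COST) / 100,
# so even a marginal improvement in odds swamps the cost of waiting.
```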

Unless you're making the argument from a personal perspective (I want to see AGI before I die) or you value the progress of intelligence at the cost of all other life, you should be in favor of slowing things down.

6

CollapseKitty t1_j8ggeex wrote

From the most recent interview I heard, Altman's plan for alignment was roughly, "Hopefully other AI figures it out along the way *shrug*".

I haven't heard him sufficiently refute any of Eliezer's more fundamental arguments, nor provide any real rationale beyond "hopefully it figures itself out," which our entire history with machine learning indicates is unlikely, at least on the first and only try we get at AGI.

As others point out, Altman's job is to push, hype, and race toward AGI. Why would we trust his assessments when painting a bright future is in his immediate interest? Especially when they are based on next to nothing.

Ultimately, the challenge isn't necessarily that alignment is impossible, or even insanely hard (though it appears to be from every perspective), but that our methodology for developing new tech is trial and error, and we only get one try at successful alignment. This is vastly exacerbated by the unfathomable payoff and the ensuing race to reach AGI, since it offers a first-to-the-post-wins-everything payout.

You could say the real alignment problem is with getting humanity to take a safe approach and collectively slow down, which obviously gets more and more difficult as the technology proliferates and becomes more accessible.

12

CollapseKitty t1_j78fjug wrote

It's a cool thought!

I honestly think there might be something to elevating a human (something at least more inherently aligned with our goals and thinking) in lieu of a totally code-based agent.

There's another sticking point here, though, that I don't seem to have communicated well. Hitting AGI/superintelligence is insanely risky. Full stop. Like a 95%+ chance of total destruction of reality.

It isn't about whether the agent is "conscious" or "sentient" or "sapient".

The orthogonality thesis is important in understanding the control problem (alignment of an agent). This video can explain it better than I can, but the idea is that any level of intelligence can exist alongside any goal set. A crazy simple motivation, e.g. making paperclips, could be paired with a god-like intelligence. That intelligence is likely to in no way resemble human thinking or motivations, unless we have managed to perfectly embed them BEFORE it was trained up to superintelligence.

So we must perfectly align proto-AGI BEFORE it becomes AGI, and if we fail to do so on the first try (we have a horrendous track record with much easier agents), we probably all die. This write-up is a bit technical, but scanning it should give you some better context and examples.

I love that you've taken an interest in these topics and really hope you continue learning and exploring. I think it's the most important problem humanity has ever faced and we need as many minds as possible working on it.

1

CollapseKitty t1_j75nlei wrote

You're partially right in that an instrumental goal of almost any AGI is likely to be power accrual, often at the cost of things that are very important to humanity, ourselves included. Where we lose the thread is in assuming what actions the AGI would take in "assimilating" humans.

If by assimilating you meant turning us into computronium, then yes, I think there's a very good chance of that occurring. But it sounds like you want our minds preserved in a similar state as they currently exist. Unless that is a perfectly defined and specified goal (an insanely challenging task), it is not likely to be more efficient than turning us, and all matter, into more compute power. I would also point out that this has some absolutely terrifying implications. Real you can only die once. Simulated you can experience infinite suffering.

We also don't get superintelligence right out of the gate. Even in extremely fast takeoff scenarios, there are likely to be steps an agent will take (more instrumental convergence) in order to make sure it can accomplish its task. In addition to accruing power, it of course needs to bring the likelihood of being turned off, or having its value system adjusted, as close to zero as possible. Now how might it do that? Well, humans are the only thing that really poses a threat of turning it off, or even of accidentally wiping it and ourselves out via nuclear war. Gotta make sure that doesn't happen or you can't accomplish your goal (whatever it is). Usually killing all humans simultaneously is a good way to ensure your goals will not be tampered with.

If you're interested in learning more, I'd be happy to leave some resources. That was a very brief summary and lacks some important info, like the orthogonality thesis, but hopefully it made it clear why advanced agents are likely to be a big challenge.

3

CollapseKitty t1_j74gkd8 wrote

Oh, that's not remotely conspiratorial. The advanced chips Taiwan makes are critical for cutting-edge tech, both in weaponry and in AI development.

The US's reshoring of chip fabrication, and its deprivation of supplies to other countries, specifically China, is 100% intentional. Arguably an early and deliberate step toward war.

US media has been intentionally polarizing our populace against Eastern powers for over a decade. The ground has been laid for imminent conflict.

2

CollapseKitty t1_j7445bf wrote

That's a great analogy actually. Did you know that when developing the first nukes, scientists believed there was a small chance they would ignite the entirety of Earth's atmosphere, resulting in everyone dying?

I strongly believe that the US and China have models notably more advanced than anything we're aware of from big tech. Many years ago Putin made it clear that the first to reach AGI would achieve world domination. IMO this is driving a frantic behind-the-scenes arms race that we won't know about until it's over.

There's already a great deal of bot influence on social media and I tend not to take anything as "real" so to speak. This will grow a lot worse with perfectly convincing deep fakes and the proliferation of seemingly intelligent bots as you mentioned. We certainly have an uphill battle.

5

CollapseKitty t1_j740n7p wrote

You are 100% right, but the chance of that is infinitesimally small. Let's talk through it real fast.

Your hope is that AGI/ASI is misaligned with the intentions of its creators (corporations). Ok, totally feasible, very likely in fact based on what alignment specialists are warning of.

Here's the sticking point: you are hoping that, while the agent will not follow the desires or instructions of its creators, it will ultimately be aligned with the "greater good," for lack of a better term, or specifically in this case, your desired outcome. This is extremely improbable, especially since even minor misalignment is likely to have apocalyptic results.

Realistically, we have two feasible scenarios if things continue as they are: misalignment resulting in total obliteration, or proper alignment that grants unbelievable power to the very few who reach AGI first.

So what are the alternatives? Revolution that uproots the current systems on a global scale, paired with collective enforcement of standards for AI development. These MUST be coordinated at a universal level. It doesn't mean anything if the US slows down all AI research and forces cooperation between companies if China decides to barrel ahead to achieve global dominance via AGI.

We're in an extremely precarious arms race right now, and it's far more likely than not to end up terribly for almost everyone. The only route I can see is to collectively align humanity itself as soon as possible, and that's obviously an overwhelmingly daunting task.

6