
FilthyCommieAccount t1_j16ptr0 wrote

Agreed, the main danger is slight misalignment. A scenario I read about recently on this sub involved a lawbot tasked with writing new legislation for review by humans. It writes a few seemingly normal 900-page legal documents, but buried in one or two paragraphs is a subtle loophole that gives future lawbots slightly more power. It doesn't do this because it wants to take over the world or anything; power-seeking is just a good meta-strategy for accomplishing a very wide range of tasks. If its optimization objective is reducing recidivism or something like that, the best long-term way to do that is to gain more power, so it has more ability to reduce recidivism in the future. This is especially problematic because almost all models will have a bias toward gaining power, since it's such an effective meta-solution.
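To make that "effective meta-solution" point concrete, here's a rough toy simulation (my own illustration, with made-up numbers for capacity growth and task difficulty): across randomly drawn objectives, a policy that first invests in generic capacity (a stand-in for power or resources) beats one that just works on the task directly, no matter which objective it ends up being scored on.

```python
import random

def expected_score(policy, trials=10_000, horizon=10):
    """Average score of a policy over many randomly sampled tasks."""
    total = 0.0
    for _ in range(trials):
        difficulty = random.uniform(0.5, 2.0)  # a randomly drawn objective
        capacity = 1.0                         # generic "power"/resources
        score = 0.0
        for t in range(horizon):
            if policy == "power_seeking" and t < 3:
                capacity *= 1.5  # spend early steps acquiring capacity
            else:
                score += capacity / difficulty  # make direct task progress
        total += score
    return total / trials

print("direct:       ", round(expected_score("direct"), 2))
print("power_seeking:", round(expected_score("power_seeking"), 2))
# The power-seeking policy wins on average regardless of which task is
# drawn: capacity pays off across almost any objective, which is why
# nearly all optimizers end up biased toward acquiring it.
```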
