WarImportant9685 t1_j0ngwf5 wrote

I understand your point. Although we are not on the same page, I believe we are on the same chapter.

I think my main disagreement is that recognizing undesirable 'thoughts' in an AI is not an easy problem. As I said in my previous comments, one of the holy grails of AI interpretability research is detecting a lying AI, which means we are talking about the same thing! But you are more optimistic than I am, which is fine.

I also understand that we might be able to design the AI with less black-boxy structures to aid interpretability. But again, I'm not too optimistic about this. I just have no idea how it could be achieved. At a glance, the two seem to sit at different levels of abstraction: if we are only designing the building blocks, how can we dictate how they are going to be used?

It's like asking how you are supposed to design Lego blocks so that they cannot be used to build dragons.

Then again, maybe I'm just too much of a doomer: the alignment problem is unsolved, but AGI hasn't been solved either. So I agree with you, we'll have to see how it goes.