We do not know; that is the relevant problem.
Looking at the output of a black box is insufficient. You can only know whether it is friendly by putting the black box in power, or by deeply understanding it.
Humans are born into a world where other humans already hold power over them, and that long track record is how we know that most humans care about each other, even without knowing why.
AI has no such history of demonstrating friendliness in the only circumstance where it could be proven: actually holding power. So we can only know in advance by way of thorough understanding.
A strong theory about AI internals should come first. Refuting Yudkowsky's theory about how it might go wrong is irrelevant.
Well, if someone originally started worrying based on strident predictions of sophisticated internal reasoning with goals independent of external behavior, then realizing that's currently unsubstantiated should cause them to down-update on AI risk. That's why it's relevant. That said, I do think we should have good theories of AI internals.