Preferences and biases, the information argument

23rd Mar 2021


I've recently thought of a possibly simpler way of expressing the argument from the Occam's razor paper. Namely:

Human biases and human preferences, taken together, contain more information than human behaviour does, and more than the full human policy does.

Thus, in order to deduce human biases and preferences, we need more information than the human policy carries.

This extra information is contained in the "normative assumptions": the assumptions we need to add, so that an AI can learn human preferences from human behaviour.
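The argument above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two-state world, the binary rewards, and the names `STATES`, `ACTIONS`, `rational`, `anti_rational` and `negate` are hypothetical and are not taken from the paper. The sketch treats a "bias" as a planner that maps a reward function to a policy, and checks how many (planner, reward) pairs produce one and the same observed policy.

```python
from itertools import product

# Hypothetical toy world: two states, two actions.
STATES = ["s0", "s1"]
ACTIONS = ["left", "right"]


def rational(reward):
    """Planner that picks the highest-reward action in each state."""
    return {s: max(ACTIONS, key=lambda a: reward[(s, a)]) for s in STATES}


def anti_rational(reward):
    """Planner that picks the lowest-reward action in each state."""
    return {s: min(ACTIONS, key=lambda a: reward[(s, a)]) for s in STATES}


def negate(reward):
    """Return the reward function with every value sign-flipped."""
    return {key: -value for key, value in reward.items()}


# One candidate "true" preference (an assumption of the sketch).
reward = {
    ("s0", "left"): 1, ("s0", "right"): 0,
    ("s1", "left"): 0, ("s1", "right"): 1,
}

# The policy is all that behaviour can reveal.
observed_policy = rational(reward)

# A different (planner, reward) pair yields exactly the same policy.
same_policy = anti_rational(negate(reward))

# Enumerate every binary reward function and both planners, keeping
# the pairs that reproduce the observed policy.
keys = [(s, a) for s in STATES for a in ACTIONS]
planners = {"rational": rational, "anti_rational": anti_rational}
compatible = [
    (name, dict(zip(keys, values)))
    for name, planner in planners.items()
    for values in product([0, 1], repeat=len(keys))
    if planner(dict(zip(keys, values))) == observed_policy
]

print(observed_policy == same_policy)
print(len(compatible), "decompositions fit the same policy")
```

Since several decompositions fit the same policy, the policy alone cannot say which planner and which reward are the real ones; choosing among them takes information from outside the behaviour.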

We'd ideally want to do this with as few extra assumptions as possible. If the AI is well-grounded and understands what human concepts mean, we might be able to get away with a simple reference: "look through this collection of psychology research and take it as roughly true" could be enough to point the AI to all the assumptions it would need.


2 comments

Charlie Steiner

But is that true? Human behavior has a lot of information. We normally say that this extra information is irrelevant to the human's beliefs and preferences (i.e. the agential model of humans is a simplification), but it's still there.

Stuart_Armstrong

Look at the paper linked for more details (https://arxiv.org/abs/1712.05812).

Basically, "humans are always fully rational and always take the action they want to" is a full explanation of all human behaviour, and it is strictly simpler than any explanation that includes human biases and bounded rationality.
