I'll build on the model from my previous post, and consider close and distant situations.

Recall that each world in $W$ is defined by $n$, the number of people smiling in it. I'll extend the model by adding another Boolean variable $b$, which determines, say, whether the world contains chocolate ($b = 1$) or art ($b = 0$). So worlds can be described by the pair $(n, b)$.

The default situation - the one that obtains if the AI does nothing - is $w_0 = (n_0, 0)$: $n_0$ smiling people in a world of art.
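To make this concrete, here is a minimal sketch of the world space in Python; the range of $n$, the encoding of $b$, and the particular default value are illustrative assumptions rather than anything fixed by the model:

```python
from itertools import product

# Assumed encoding of the Boolean b (illustrative choice).
CHOCOLATE, ART = 1, 0

# Assumed range for n, the number of people smiling (illustrative).
N_RANGE = range(-10, 11)

# Worlds are pairs (n, b).
worlds = [(n, b) for n, b in product(N_RANGE, (ART, CHOCOLATE))]

# Assumed default world w_0: n_0 = 0 smiling people, in a world of art.
default_world = (0, ART)
```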

Then let's introduce two new partial preferences/pre-orders:

  • $P_1$: $(n, 0) \prec (n+1, 0)$, for $n_0 - 5 \le n < n_0 + 5$.
  • $P_2$: $(n, 0) \succ (m, b)$, for any $(n, 0)$ with $|n - n_0| \le 5$ and any $(m, b)$ with $b = 1$ or $|m - n_0| > 5$.

So $P_1$ says that, within a range of the default world (art, and the number of smiling people within $5$ of $n_0$), the more smiling people, the better. While $P_2$ says that worlds in this range are better than worlds outside it.
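Continuing the sketch above (and assuming a radius of $5$ around the default, with $n_0 = 0$), the two pre-orders can be written out as explicit sets of (worse, better) pairs:

```python
RADIUS = 5  # assumed half-width of the "close" range around the default

# The close worlds: art, with n within RADIUS of the default n_0 = 0.
close = [(n, ART) for n in range(-RADIUS, RADIUS + 1)]

# P1: among the close worlds, each extra smiling person is an improvement.
P1 = {((n, ART), (n + 1, ART)) for n in range(-RADIUS, RADIUS)}

# P2: every close world is preferred to every world outside the close range.
P2 = {(w, c) for c in close for w in worlds if w not in close}
```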

These result in the following utility functions:

  • $U_1(n, b) = n - n_0$ if $b = 0$ and $|n - n_0| \le 5$, and $0$ otherwise.
  • $U_2(n, b) = 0$ if $b = 0$ and $|n - n_0| \le 5$, and $-1$ otherwise.
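In code, one consistent reading of these (using the same assumed numbers as above; the exact constants only matter up to translation and scaling) is:

```python
def is_close(world):
    """A world is 'close' if it is an art world with n within RADIUS of the default."""
    n, b = world
    return b == ART and abs(n) <= RADIUS

def U1(world):
    """Linear in n across the close worlds, flat (0) everywhere else."""
    n, b = world
    return float(n) if is_close(world) else 0.0

def U2(world):
    """Flat on the close worlds, a constant penalty on every other world."""
    return 0.0 if is_close(world) else -1.0
```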

After normalisation, these become:

  • a rescaled $U_1$, still increasing linearly across the close worlds and constant outside them.
  • a rescaled $U_2$, still assigning a flat penalty to every world outside the close range.

Again, I've felt free to translate these to improve the clarity of the normalised versions.
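The normalisation step itself isn't pinned down here, so the sketch below simply rescales each utility to a spread (max minus min) of $1$ over the worlds and translates it so that the default world sits at $0$; treat this as a stand-in for whatever normalisation is actually being used:

```python
def normalise(U, worlds):
    """Illustrative normalisation: spread of 1 over all worlds, default world at 0."""
    values = [U(w) for w in worlds]
    spread = max(values) - min(values)
    scale = 1.0 / spread if spread else 1.0
    offset = U(default_world) * scale
    return lambda w: U(w) * scale - offset

U1_norm = normalise(U1, worlds)
U2_norm = normalise(U2, worlds)
```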

If we plot the normalised $U_1$, we get the following:

Here I've slightly offset the purple worlds from the blue ones, for clarity of exposition, though they would of course be on top of each other.
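A sketch of how such a plot could be produced with matplotlib (the colours and the size of the offset are arbitrary choices):

```python
import matplotlib.pyplot as plt

def plot_utility(U, title):
    # Offset the b = 1 worlds slightly so they don't sit on top of the b = 0 worlds.
    for b, colour, dx in ((ART, "tab:blue", 0.0), (CHOCOLATE, "tab:purple", 0.15)):
        ns = [n for n, bb in worlds if bb == b]
        plt.scatter([n + dx for n in ns], [U((n, b)) for n in ns],
                    color=colour, label=f"b = {b}")
    plt.xlabel("n (number of people smiling)")
    plt.ylabel("utility")
    plt.title(title)
    plt.legend()
    plt.show()

plot_utility(U1_norm, "normalised U1")
```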

Note that $U_1$ does not in itself penalise distant situations, as this post recommends doing. Five close worlds are ranked above the distant worlds; but, conversely, five close worlds are ranked below them.

To avoid distant situations, we need to add in $U_2$, which explicitly punishes distant worlds, and hence plot the normalised $U_1 + U_2$:

This is much more like it! All the close worlds are now ranked above the more distant ones.

But this is a close-run thing: the difference between the worst close world and the distant worlds is small. So, in general, when penalising distant worlds, we have to do this with some care, and maybe use a different standard of normalisation (since the downgrading of distant worlds is not really a preference of ours, but rather a meta-tool of the synthesis method).
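One way to exercise that care, continuing the sketch, is to give the distance penalty an explicit weight, so the margin between the worst close world and the distant worlds can be set directly rather than left to the normalisation:

```python
def combined(world, weight=1.0):
    """Normalised U1 plus a weighted copy of the normalised distance penalty U2."""
    return U1_norm(world) + weight * U2_norm(world)

worst_close = min(combined(w) for w in worlds if is_close(w))
best_distant = max(combined(w) for w in worlds if not is_close(w))
print(f"worst close world: {worst_close:.2f}, best distant world: {best_distant:.2f}")

plot_utility(combined, "normalised U1 + U2")
```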
