User Comment Replies — AI Alignment Forum

I have a question about the conjecture at the end of Direction 17.5. Let $U_{1}$ be a utility function with values in $[0, 1]$ and let $f : [0, 1] \to [0, 1]$ be a strictly increasing function. Then $U_{1}$ and $U_{2} = f \circ U_{1}$ have the same maximizers. Note that $f$ can be non-linear, e.g. $f(x) = x^{2}$. Therefore, I wonder whether the condition $u(y) = \alpha v(y) + \beta$ should be weakened.

Moreover, I wonder whether it is possible to modify $U_{1}$ by a small amount at a place far away from the optimal policy such that $\pi$ is still optimal fo...

The Learning-Theoretic Agenda: Status 2023

Frank_R 2y 20

3 Vanessa Kosoy 2y

No, because it changes the expected value of the utility function under various distributions.

Good catch, the conjecture as stated is obviously false: we can, e.g., take $U_{2}$ to be the same as $U_{1}$ everywhere except after some action which $\pi^{*}$ doesn't actually take, in which case we make it identically 0. Some possible ways to fix it:

* Require the utility function to be of the form $U : O^{\omega} \to [0, 1]$ (i.e. not depend on actions).
* Use (strictly) instrumental reward functions.
* Weaken the conclusion so that we're only comparing $U_{1}$ and $U_{2}$ on-policy (but this might be insufficient for superimitation).
* Require $\pi^{*}$ to be optimal off-policy (but it's unclear how this can generalize to finite $g$).
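Both points in the reply can be checked on a toy example. The sketch below is illustrative only and is not taken from the original post: the lotteries, the three one-step actions `a`, `b`, `c`, and the helper names `expected_utility`, `best`, and `is_positive_affine` are all assumptions made up for this example. The first part shows that $f(x) = x^{2}$ preserves the best deterministic outcome but flips the preferred lottery. The second part shows a $U_{2}$ that differs from $U_{1}$ only after an action the optimal policy never takes, so the same action stays optimal even though $U_{2}$ is not a positive affine transformation of $U_{1}$.

```python
def expected_utility(lottery, utility):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * utility(x) for x, p in lottery.items())


def best(lotteries, utility):
    """Name of the lottery with the highest expected utility."""
    return max(lotteries, key=lambda name: expected_utility(lotteries[name], utility))


# Part 1: a strictly increasing, non-linear f changes preferences over lotteries.
# Outcomes are identified with their U1-value in [0, 1] (illustrative assumption).
u1 = lambda x: x          # U1
u2 = lambda x: x ** 2     # U2 = f o U1 with f(x) = x^2

lotteries = {
    "safe": {0.6: 1.0},             # outcome 0.6 with certainty
    "risky": {0.0: 0.5, 1.0: 0.5},  # fair coin between outcomes 0 and 1
}

outcomes = [0.0, 0.6, 1.0]
same_deterministic_optimum = max(outcomes, key=u1) == max(outcomes, key=u2)
choice_under_u1 = best(lotteries, u1)
choice_under_u2 = best(lotteries, u2)


# Part 2: change U1 only after an action the optimal policy does not take.
# Hypothetical one-step problem with actions a, b, c; the optimal policy picks a.
U1_actions = {"a": 1.0, "b": 0.5, "c": 0.2}
U2_actions = {"a": 1.0, "b": 0.5, "c": 0.0}  # identical except after c


def is_positive_affine(v1, v2, tol=1e-9):
    """Check whether v2 = alpha * v1 + beta for some alpha > 0.

    Fits alpha and beta on the first two keys (assumes v1 differs on them),
    then verifies the fit on every key.
    """
    k0, k1 = sorted(v1)[:2]
    alpha = (v2[k1] - v2[k0]) / (v1[k1] - v1[k0])
    beta = v2[k0] - alpha * v1[k0]
    return alpha > 0 and all(abs(alpha * v1[k] + beta - v2[k]) <= tol for k in v1)


optimal_under_U1 = max(U1_actions, key=U1_actions.get)
optimal_under_U2 = max(U2_actions, key=U2_actions.get)
affine_related = is_positive_affine(U1_actions, U2_actions)
```

In this toy setting the expected utilities are 0.6 versus 0.5 under `u1` and roughly 0.36 versus 0.5 under `u2`, which is why the preferred lottery changes. The one-step action model is a drastic simplification of the history-dependent utilities discussed in the thread.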


All of Frank_R's Comments + Replies