Frank_R

The Learning-Theoretic Agenda: Status 2023

I have a question about the conjecture at the end of Direction 17.5. Let $U_{1}$ be a utility function with values in $[0, 1]$ and let $f : [0, 1] \to [0, 1]$ be a strictly increasing function. Then $U_{1}$ and $U_{2} = f \circ U_{1}$ have the same maxima. $f$ can be non-linear, e.g. $f (x) = x^{2}$. Therefore, I wonder whether the condition $u (y) = \alpha v (y) + \beta$ should be weakened.
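To make the claim about maxima concrete, here is a minimal sketch on a finite set of outcomes. The utility values are made up, and `argmax_set` and `same_maximizers` are hypothetical helper names, not anything from the post. It only compares pointwise maximizers; it says nothing about expected utility over lotteries, which may be what the conjecture actually needs.

```python
def argmax_set(values):
    """Return the set of indices at which `values` attains its maximum."""
    best = max(values)
    return {i for i, v in enumerate(values) if v == best}


def same_maximizers(u1, f):
    """Check whether U_1 and U_2 = f o U_1 are maximized at the same outcomes."""
    u2 = [f(x) for x in u1]
    return argmax_set(u1) == argmax_set(u2)


# Hypothetical utilities U_1 over five outcomes, all in [0, 1].
u1 = [0.1, 0.7, 0.3, 0.7, 0.5]

# f(x) = x^2 is strictly increasing on [0, 1] but not affine.
squared_ok = same_maximizers(u1, lambda x: x ** 2)

# A decreasing map, for contrast: it moves the maximum to the old minimum.
flipped_ok = same_maximizers(u1, lambda x: 1.0 - x)

print(squared_ok, flipped_ok)
```

In this toy case the squared utility keeps the same maximizing outcomes, while the decreasing map does not.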

Moreover, I wonder whether it is possible to modify $U_{1}$ by a small amount at a place far away from the optimal policy such that $\pi$ is still optimal for the modified utility function. This would weaken the statement about the uniqueness of the utility function even more. Think of an AI playing Go: if a weird position on the board has utility -1.01 instead of -1, this should not change the winning strategy. I would have to go through all of the definitions to see whether I can actually produce a more mathematical example. Nevertheless, you may have a quick opinion on whether this could happen.
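A toy version of the Go intuition, as a sketch only: the three outcomes and their utilities are invented for illustration, `argmax_index` and `perturb` are hypothetical helpers, and a single-step choice among outcomes stands in for a full policy. Whether the same thing survives the actual definitions in the post is exactly what I have not checked.

```python
def argmax_index(values):
    """Return the index of the first maximal entry of `values`."""
    return max(range(len(values)), key=lambda i: values[i])


def perturb(values, index, delta):
    """Return a copy of `values` with `delta` added at position `index`."""
    out = list(values)
    out[index] += delta
    return out


# Hypothetical utilities: a winning line, a draw, and a weird lost position.
utilities = [1.0, 0.0, -1.0]

# Change the weird position from -1 to roughly -1.01, far from the optimum.
perturbed = perturb(utilities, 2, -0.01)

best_before = argmax_index(utilities)
best_after = argmax_index(perturbed)

print(best_before, best_after)
```

Here the best outcome is the same before and after the change, because the perturbation is much smaller than the gap between the optimum and the modified outcome.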
