I have a question about the conjecture at the end of Direction 17.5. Let U₁ be a utility function with values in [0,1], and let f:[0,1]→[0,1] be a strictly increasing function. Then U₁ and U₂ = f∘U₁ have the same maxima, even though f can be non-linear, e.g. f(x) = x². Therefore, I wonder whether the condition u(y) = αv(y) + β should be weakened.
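A minimal sketch of the point above, assuming a finite set of outcomes and an agent that simply picks one outcome with certainty (no lotteries over outcomes). The outcome names and utility values are hypothetical. It checks that U₁ and U₂ = f∘U₁ with f(x) = x² select the same best outcome, while U₂ is not of the form αU₁ + β.

```python
import math

# Hypothetical outcomes with utilities U1 in [0, 1] (illustrative values only)
U1 = {"a": 0.2, "b": 0.9, "c": 0.5}


def f(x: float) -> float:
    """A strictly increasing but non-linear map from [0, 1] to [0, 1]."""
    return x ** 2


# U2 = f ∘ U1, applied outcome by outcome
U2 = {y: f(u) for y, u in U1.items()}


def best_outcome(utility: dict[str, float]) -> str:
    """Outcome with the highest utility (assumes a deterministic choice)."""
    return max(utility, key=utility.get)


# Both utility functions single out the same best outcome.
print(best_outcome(U1), best_outcome(U2))  # b b

# U2 is not an affine transform alpha*U1 + beta: the slope between
# consecutive points changes, which cannot happen for an affine map.
ys = sorted(U1, key=U1.get)  # outcomes ordered by U1: a, c, b
slopes = [
    (U2[ys[i + 1]] - U2[ys[i]]) / (U1[ys[i + 1]] - U1[ys[i]])
    for i in range(len(ys) - 1)
]
is_affine = all(math.isclose(s, slopes[0]) for s in slopes)
print(slopes, is_affine)
```

The slopes come out near 0.7 and 1.4, so no single α fits all pairs, yet the maximizer is unchanged.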
Moreover, I wonder whether it is possible to modify U₁ by a small amount at a place far away from the optimal policy such that π is still optimal for the modified utility function. This would weaken the statement about the uniqueness of the utility function even further. Think of an AI playing Go: if some weird position on the board has utility -1.01 instead of -1, this should not change the winning strategy. I will have to go through all of the definitions to see whether I can produce a more rigorous mathematical example. Nevertheless, you may have a quick opinion on whether this could happen.
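The Go intuition can be illustrated with a toy two-player game tree solved by minimax. This is a hypothetical stand-in, not Go and not the definitions from the post: terminal values +1 and -1 play the role of win and loss, and the move names are made up. Changing the utility of the "weird position" from -1 to -1.01 leaves the optimal first move and the game value unchanged.

```python
def value(tree, maximizing: bool) -> float:
    """Minimax value: a leaf is a float utility, an inner node is a dict of moves."""
    if isinstance(tree, (int, float)):
        return float(tree)
    child_values = [value(sub, not maximizing) for sub in tree.values()]
    return max(child_values) if maximizing else min(child_values)


def best_move(tree: dict) -> str:
    """Root player maximizes; the opponent's replies are minimized over."""
    return max(tree, key=lambda m: value(tree[m], maximizing=False))


def make_tree(weird_utility: float) -> dict:
    """Hypothetical toy game; only the 'weird position' leaf depends on the argument."""
    return {
        "solid": {"reply1": 1.0, "reply2": 1.0},   # wins whatever the opponent does
        "risky": {"reply1": 1.0, "reply2": -1.0},  # opponent can force a loss
        "odd": {"reply1": weird_utility, "reply2": 0.0},  # leads to the weird position
    }


original = make_tree(-1.0)
perturbed = make_tree(-1.01)
print(best_move(original), best_move(perturbed))  # solid solid
print(value(original, True), value(perturbed, True))  # 1.0 1.0
```

In this sketch the perturbation can only matter if it raises an off-path branch above the value of the chosen move; here the gap is 2, so a change of 0.01 at a leaf the optimal play never reaches has no effect.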