Why are you minmaxing over expected values of policies, instead of over outcomes? Isn't the worst case for the "tails only" policy "I'm in COPY and the coin is heads", not "'I'm in COPY"?
Basically I don't understand why "past me, who is screaming at me from the sidelines that it matters whether I pick tails or not" once I see that the coin comes up heads is actually correct and the "me" who's indifferent is wrong; one man's modus ponens is another man's modus tollens.
Here's another example that makes my intuition go "ouch" - suppose that choosing heads in REVERSE HEADS when the coin is heads gives 0.1 utility. Then the "match the coin" policy has an expected value in REVERSE HEADS of 0.3 instead of 0.25 and the minmax rule you picked still tells you to "always pick tails", but conditioning on heads, "pick heads if you see heads" gives you 0.1 utility or 1 utility, while "always pick tails" gives you 1 utility or 0 utility, so isn't "pick heads" a better strategy?
Why are you minmaxing over expected values of policies, instead of over outcomes? Isn't the worst case for the "tails only" policy "I'm in COPY and the coin is heads", not "'I'm in COPY"?
Basically I don't understand why "past me, who is screaming at me from the sidelines that it matters whether I pick tails or not" once I see that the coin comes up heads is actually correct and the "me" who's indifferent is wrong; one man's modus ponens is another man's modus tollens.
Here's another example that makes my intuition go "ouch" - suppose that choosing heads in REVERSE HEADS when the coin is heads gives 0.1 utility. Then the "match the coin" policy has an expected value in REVERSE HEADS of 0.3 instead of 0.25 and the minmax rule you picked still tells you to "always pick tails", but conditioning on heads, "pick heads if you see heads" gives you 0.1 utility or 1 utility, while "always pick tails" gives you 1 utility or 0 utility, so isn't "pick heads" a better strategy?