ah, I see! It's an incentive problem! So I guess your funding needs to be conditional on you producing legible outputs.
This rubs me the wrong way. Of course, you can make anyone do X if you make their funding conditional on X. But whether you should do that depends on how sure you are that X is more valuable than the alternative.
There are already thousands of people out there whose funding is conditional on them producing legible outputs. Why is that not enough? What will change if we increase that number by a dozen?
I wonder if it would make sense to make this half-open, in the sense that you would publish links to the study materials on LW, and maybe also some of the results, so that people who didn't participate have a better idea.
Makes sense, with the proviso that this is sometimes true only statistically. Like, the AI may choose to write an output which has a 70% chance to hurt you and a 30% chance to (equally) help you, if that is its best option.
If you assume that the AI is smarter than you, and has a good model of you, you should not read the output. But if you accidentally read it, and luckily you react in the right (for you) way, that is a possible result, too. You just cannot and should not rely on being so lucky.
I guess it depends on how much the parts that make you "a little different" are involved in your decision making.
If you can put it in numbers, for example -- I believe that if I choose to cooperate, my twin will choose to cooperate with probability p; and if I choose to defect, my twin will defect with probability q; also I care about the well-being of my twin with a coefficient e, and my twin cares about my well-being with a coefficient f -- then you could take the payout matrix and these numbers, and calculate the correct strategy.
Option one: what if you cooperate? Your expected payout is the C-C payout with probability p and the C-D payout with probability 1-p. Your twin's expected payout is the C-C payout with probability p and the D-C payout with probability 1-p. Multiply your twin's expected payout by your empathy e and add that to your own expected payout, etc. That is the value of option one. Now do the same for option two (defecting, using q instead of p), and compare the numbers.
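To make the calculation concrete, here is a rough sketch in Python. The payout numbers are an assumption (the usual textbook prisoner's dilemma values, not anything from this discussion), and the names `PAYOUTS`, `expected_value`, and `best_move` are made up for illustration. It also leaves out f, on the assumption that whatever your twin's empathy does to their choice is already folded into your estimates of p and q.

```python
# Assumed illustrative payouts: (my move, twin's move) -> (my payout, twin's payout).
PAYOUTS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def expected_value(my_move, p_same, e, payouts=PAYOUTS):
    """Expected value of my_move, counting the twin's payout with weight e.

    p_same is the probability that the twin makes the same move as me:
    p if I cooperate, q if I defect.
    """
    other_move = "D" if my_move == "C" else "C"
    total = 0.0
    for twin_move, prob in ((my_move, p_same), (other_move, 1.0 - p_same)):
        mine, theirs = payouts[(my_move, twin_move)]
        total += prob * (mine + e * theirs)
    return total


def best_move(p, q, e, payouts=PAYOUTS):
    """Compare both options; ties go to cooperation (an arbitrary choice)."""
    ev_cooperate = expected_value("C", p, e, payouts)
    ev_defect = expected_value("D", q, e, payouts)
    move = "C" if ev_cooperate >= ev_defect else "D"
    return move, ev_cooperate, ev_defect


if __name__ == "__main__":
    # A perfect mirror twin, no empathy needed.
    print(best_move(p=1.0, q=1.0, e=0.0))
    # A twin whose choice is a coin flip, but I care about them as much as myself.
    print(best_move(p=0.5, q=0.5, e=1.0))
```

With these assumed payouts, a perfectly correlated twin makes cooperation win even with e = 0, while an uncorrelated twin (p = q = 0.5) makes defection win unless e is large enough.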
It gets way more complicated when you cannot make a straightforward estimate of the probabilities, because the algorithms are too complicated. It could even be impossible to find a fully general solution (because of the halting problem).