All of Viliam's Comments + Replies

Does the identical-twin one-shot prisoner's dilemma only work if you are functionally identical, or can you be a little different? And is there anything meaningful that can be said about this?

I guess it depends on how much the parts that make you "a little different" are involved in your decision making.

If you can put it in numbers, for example -- I believe that if I choose to cooperate, my twin will choose to cooperate with probability p; and if I choose to defect, my twin will defect with probability q; also I care about the well-being of my twin with a coe...
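The comparison sketched above can be made concrete. Here is a minimal sketch, assuming standard one-shot prisoner's dilemma payoffs (T > R > P > S, with the values below chosen arbitrarily) and a caring coefficient c for the twin's payoff; the function name and parameters are illustrative, not from the original comment.

```python
def expected_utility(p, q, c, T=5, R=3, P=1, S=0):
    """Return (EU_cooperate, EU_defect) for an agent that weighs its
    twin's payoff with coefficient c.

    p = P(twin cooperates | I cooperate)
    q = P(twin defects    | I defect)
    Payoffs to the row player: both cooperate -> R, both defect -> P,
    I defect while twin cooperates -> T, I cooperate while twin defects -> S.
    """
    # If I cooperate: twin cooperates with prob. p (we both get R);
    # otherwise the twin defects (I get S, the twin gets T).
    eu_coop = p * (R + c * R) + (1 - p) * (S + c * T)
    # If I defect: twin defects with prob. q (we both get P);
    # otherwise the twin cooperates (I get T, the twin gets S).
    eu_defect = q * (P + c * P) + (1 - q) * (T + c * S)
    return eu_coop, eu_defect

# Perfectly correlated twins (p = q = 1): cooperating dominates
# even with zero altruism, since R > P.
coop, defect = expected_utility(p=1.0, q=1.0, c=0.0)
print(coop, defect)  # 3.0 1.0
```

With weaker correlation (smaller p and q) the defect branch gains weight, which is where the "how much are the differing parts involved in your decision making" question bites.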

Donald Hobson
  And here the main difficulty pops up again. There is no causal connection between your choice and their choice; any correlation is a logical one. So imagine I make a copy of you, but the copying machine isn't perfect: a random 0.001% of neurons are deleted. Also, you know you aren't the copy. How would you calculate those probabilities p and q, even in principle?

ah, I see!  It's an incentive problem!  So I guess your funding needs to be conditional on you producing legible outputs.

This rubs me the wrong way. Of course you can make anyone do X if you make their funding conditional on X. But whether you should do that depends on how sure you are that X is more valuable than whatever the alternative is.

There are already thousands of people out there whose funding is conditional on them producing legible outputs. Why is that not enough? What will change if we increase that number by a dozen?

David Scott Krueger
Q: "Why is that not enough?" A: Because they are not being funded to produce the right kinds of outputs.

I wonder if it would make sense to make this half-open, in the sense that you would publish on LW links to the study materials, and maybe also some of the results. So that people who didn't participate have a better idea.

Linda Linsefors
There is no study material, since this is not a course. If you are accepted to one of the project teams, you will work on that project. You can read about the previous research outputs here: Research Outputs – AI Safety Camp. The most famous research to come out of AISC is the coin-run experiment: We Were Right! Real Inner Misalignment - YouTube, and [2105.14111] Goal Misgeneralization in Deep Reinforcement Learning (arxiv.org). But the projects are different each year, so the best way to get an idea of what it's like is just to read the project descriptions.

Makes sense, with the proviso that this is sometimes true only statistically. Like, the AI may choose to write an output which has a 70% chance to hurt you and a 30% chance to (equally) help you, if that is its best option.

If you assume that the AI is smarter than you, and has a good model of you, you should not read the output. But if you accidentally read it and, luckily, react in a way that works out well for you, that is a possible outcome too. You just cannot, and should not, rely on being so lucky.
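The "true only statistically" point above is just an expected-value calculation. A minimal sketch, assuming (hypothetically) that harm and help are of equal magnitude, as in the 70%/30% example:

```python
# Hypothetical stakes: the AI's output hurts you with probability 0.7
# and helps you, by an equal amount, with probability 0.3.
p_hurt, p_help = 0.7, 0.3
stake = 1.0  # equal magnitude of harm and help

# Expected value of reading the output.
expected_value = p_help * stake - p_hurt * stake
print(round(expected_value, 2))  # -0.4
```

In expectation you lose by reading, even though 30% of readers come out ahead, which is exactly why you can't rely on being one of the lucky ones.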