How can you, in general, conclude anything by examining an agent's source code without running into the Halting Problem?
Nothing stops the Halting Problem from being solved in particular instances. I can prove that some particular agent halts, and so can another agent examining its source. See FairBot in "Robust Cooperation in the Prisoner's Dilemma".
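To make the "particular instances" point concrete, here's a toy sketch of my own in Python (not the paper's actual proof-search formalism): a checker that reads a program's source and reports "halts" only when termination is easy to establish, and says "unknown" otherwise. It never claims to decide halting in general; it just handles the easy cases soundly.

```python
import ast

# Node types that could introduce unbounded behaviour. This is a deliberately
# conservative blacklist: anything on it makes the checker give up.
UNSAFE_NODES = (
    ast.While, ast.For, ast.AsyncFor,      # loops
    ast.Call, ast.Await,                   # calls can recurse or block
    ast.Import, ast.ImportFrom,            # imports run arbitrary module code
    ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp,  # implicit loops
)

def halts(source: str) -> str:
    """Return 'halts' when termination is easy to prove, else 'unknown'.

    Straight-line code with no loops, calls, comprehensions, or imports
    always terminates, so in that (very particular) case we can say so.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return "unknown"
    if any(isinstance(node, UNSAFE_NODES) for node in ast.walk(tree)):
        return "unknown"
    return "halts"

print(halts("x = 1\ny = x + 2"))           # halts
print(halts("while True:\n    pass"))      # unknown (checker declines to judge)
```

The point is only that a partial answer ("halts" / "I can't tell") is perfectly computable; the undecidability result just says no checker can answer "halts" or "doesn't halt" for every program.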
It seems to me that if you expect the results of your experiment to generalize to, and be useful in, other situations, then it has to be possible to replicate it. Or to put it another way: if the principle you discovered is useful for more than running the same program with a different seed, shouldn't it be possible to test it by some means other than running the same program with a different seed?
Instead of preregistering all experiments, maybe researchers could run experiments, observe the results, formulate a theory, and then preregister an experiment that would test the theory. But in that case I would expect researchers to end up "preregistering" experiments that are very similar to the ones that generated the theory, such that the results are very likely to come out in support of it.
Why would you expect this? Assuming you are not suggesting "what if the researchers lie and say they did the experiment again when they didn't",
...
I get the impression that "has the agent's source code" is some Yudkowskyism which people use without thinking.
Every time someone says that, I wonder: "Are you claiming that the agent that reads the source code is able to solve the Halting Problem?"
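For what it's worth, the usual answer is no: an agent that reads source code doesn't need a halting oracle, because it can do something weaker, such as simulating the opponent for a bounded amount of time and falling back to a default action if the simulation doesn't finish. Here's a rough sketch of my own (not FairBot's actual bounded proof search); I'm assuming a made-up convention where the opponent's source defines a move() function, and using a wall-clock timeout as a stand-in for a step bound.

```python
import multiprocessing as mp
from queue import Empty

def _run_opponent(source, result_queue):
    """Execute the opponent's source; by assumption it defines move() returning 'C' or 'D'."""
    namespace = {}
    exec(source, namespace)
    result_queue.put(namespace["move"]())

def my_move(opponent_source, time_limit=1.0):
    """Cooperate iff the opponent's code visibly cooperates within the time limit.

    No halting oracle needed: if the simulation doesn't finish in time,
    we simply give up and defect.
    """
    result_queue = mp.Queue()
    proc = mp.Process(target=_run_opponent, args=(opponent_source, result_queue))
    proc.start()
    proc.join(time_limit)
    if proc.is_alive():                 # opponent may not halt; stop waiting
        proc.terminate()
        proc.join()
        return "D"
    try:
        result = result_queue.get(timeout=0.5)
    except Empty:                       # opponent halted without producing a move
        result = "D"
    return "C" if result == "C" else "D"

if __name__ == "__main__":
    print(my_move("def move():\n    return 'C'"))                  # C
    print(my_move("def move():\n    while True:\n        pass"))   # D (timed out)
```

This loses some of the nice "mutual cooperation" properties the paper gets from proof search, but it illustrates that "has the agent's source code" doesn't smuggle in an uncomputable oracle.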