If you want to disallow appeals to authority
I do, but more importantly, I want to drop the assumption that the judge understands all the concepts here. Suppose the judge asks #1 "What is energy?" or "What is conservation?", and it can't be explained to them. What then?
Also, argument 1 isn't actually correct: mass and energy interconvert (E=mc^2), and so on.
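(To be precise about that: once mass-energy interconversion is in play, the conserved quantity has to include rest energy, something like

$$E_{\text{total}} = \sum_i \left( K_i + U_i + m_i c^2 \right) = \text{const}, \qquad c^2 \approx 9 \times 10^{16}\ \mathrm{J/kg},$$

so "energy is conserved" is only right under that bookkeeping, which is yet another abstraction the judge would need.)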
That seems right, but why is it a problem? The honest strategy is fine under cross-examination: it will give consistent answers across contexts.
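Concretely, the kind of consistency check I mean might look like this (a rough sketch; the `debater` callable and the contexts are hypothetical, not any real API):

```python
# Rough sketch of a cross-examination consistency check.
# `debater` is a hypothetical callable: (question, context) -> answer.

def cross_examine(debater, question, contexts):
    """Ask the same question under several framings.

    An honest strategy answers from one underlying world-model, so its
    answers should agree; a dishonest one eventually contradicts itself.
    (A real check would compare answers semantically, not by equality.)
    """
    answers = {ctx: debater(question, ctx) for ctx in contexts}
    consistent = len(set(answers.values())) <= 1
    return consistent, answers

# Hypothetical usage:
#   ok, _ = cross_examine(debater_1, "Does the box create energy?",
#                         ["asked directly", "asked mid-rebuttal",
#                          "asked with different wording"])
#   if not ok: the judge penalizes debater 1.
```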
"The honest strategy"? If you have that, you can just ask it and not bother with the debate. I...
You can recursively decompose the claim "perpetual motion machines are known to be impossible" until you get down to a claim like "such and such experiment should have such and such outcome", which the boss can then perform to determine a winner.
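Schematically, the recursion I have in mind looks something like this (a minimal sketch, assuming such a decomposition exists; `decompose`, `is_directly_checkable`, and `run_experiment` are hypothetical stand-ins, not the paper's actual procedure):

```python
# Recursively split a claim until every leaf is an experiment the judge
# can perform themselves, then accept the claim iff all leaves check out.

def verify(claim, decompose, is_directly_checkable, run_experiment):
    if is_directly_checkable(claim):
        # e.g. "plugged into a load with no input, the box's output
        # decays within a week" - the boss can just run this.
        return run_experiment(claim)
    # The debaters propose the split; each side can challenge subclaims.
    return all(verify(sub, decompose, is_directly_checkable, run_experiment)
               for sub in decompose(claim))
```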
Ah, I don't think you can. Reaching that kind of abstract conclusion from a practical number of experiments requires abstractions like potential energy, entropy, Noether's theorem, etc., which in this example the judge doesn't understand. (Without such abstractions, you'd need to consider every possible type of machine...
To clarify the 2nd point, here's an example. Suppose someone presents you with a large box that supposedly produces electricity endlessly. Your boss thinks it works, and you're debating the inventor in front of your boss.
"Perpetual motion machines are known to be impossible" you say, but your boss isn't familiar with that conceptual class or the reasoning involved.
The inventor says, "Here, let's plug in a thing, we can see that the box does in fact produce a little electricity." Your boss finds this very convincing.
The process proposed in the paper is some...
I took a look at the debate papers. I think that's a good angle to take, but they're missing some factors that sometimes make debates between humans fail.
Humans and neural networks both have some implicit representation of probability distributions over output types. The intuition behind "I can't explain why, but that seems unlikely" can be more accurate than "here's an argument for why that will happen". You're basically delegating the problem of "making AI thinking explainable" to the AI itself, but if you could do that, you could just...make neural networks explainable in the first place.
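As a toy illustration of that gap (entirely synthetic, just an intuition pump): a model that can only report "that seems unlikely" can beat a model restricted to a rule simple enough to state as an argument.

```python
# The true label is an interaction effect. A linear rule you can state
# in one sentence ("larger x0 means class 1") is stuck at chance, while
# an opaque neighbors-based "vibe" does well without being articulable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-like: which quadrant pair

X_tr, X_te, y_tr, y_te = X[:1000], X[1000:], y[:1000], y[1000:]

linear = LogisticRegression().fit(X_tr, y_tr)   # "here's my argument"
knn = KNeighborsClassifier(15).fit(X_tr, y_tr)  # "can't explain why, but..."

print("articulable rule:", linear.score(X_te, y_te))   # ~0.5, chance level
print("inarticulate vibe:", knn.score(X_te, y_te))     # ~0.95
```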
Strongly agree on the first challenge; on the theory workstream we're thinking about how to deal with this problem. Some past work (not from us) is here and here.
Though to be clear, I don't think the empirical evidence clearly rules out "just making neural networks explainable". Imo, if you wanted to do that, you would do things in the style of debate and prover-verifier games. These ideas just haven't been tried very much yet. I don't think "asking an AI what another AI is doing and doing RLHF on the response" is nearly as good; that is much more likely t...
You don't? But this is a major problem in arguments between people. The variation within humans is already more than enough for this! There's a gap like that every 35 IQ points or so. I don't understand why you're confident this isn't an issue.
I guess we've found our main disagreement, at least?