User Comment Replies — AI Alignment Forum

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

Thank you! I think that what we see right now is that as the horizon grows, the more "tricks" we need to make end-to-end learning works, to the extent that it might not really be end to end. So while supervised learning is very successful, and seems to be quite robust to choice of architecture, loss functions, etc., in RL we need to be much more careful, and often things won't work "out of the box" in a purely end to end fashion.

I think the question would be how performance scales with horizon, if the returns are rapidly diminishing, and the cost to ... (read more)

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak2y30

Hi Vanesssa,

Perhaps given my short-term preference, it's not surprising that I find it hard to track very deep comment threads, but let me just give a couple of short responses.

I don't think the argument on hacking relied on the ability to formally verify systems. Formally verified systems could potentially skew the balance of power to the defender side, but even if they don't exist, I don't think balance is completely skewed to the attacker. You could imagine that, like today, there is a "cat and mouse" game, where both attackers and defenders try to find... (read more)

4Vanessa Kosoy2y

My point was not about the defender/attacker balance. My point was that even short-term goals can be difficult to specify, which undermines the notion that we can easily empower ourselves by short-term AI. Sort of. The correct way to make it more rigorous, IMO, is using tools from algorithmic information theory, like I suggested here.

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak2y80

Hi Vanessa,

Let me try to respond (note the claim numbers below are not the same as in the essay, but rather as in Vanessa's comment):

Claim 1: Our claim is that one can separate out components - there is the predictable component which is non stationary, but is best approximated with a relatively simple baseline, and the chaotic component, which over the long run is just noise.In general, highly complex rules are more sensitive to noise (in fact, there are theorems along these lines in the field of Analysis of Boolean Functions), and so in the long run, the... (read more)

4Vanessa Kosoy2y

Thanks for the responses Boaz! I will look into analysis of boolean functions, thank you. However, unless you want to make your claim more rigorous, it seems suspect to me. In reality, there are processes happening simultaneously on many different timescales, from the microscopic to the cosmological. And, these processes are coupled, so that the current equilibrium of each process can be regarded as a control signal for the higher timescale processes. This means we can do long-term planning by starting from the long timescales and back-chaining to short timescales, like I began to formalize here. So, while eventually the entire universe reaches an equilibrium state (a.k.a. heat-death), there is plenty of room for long-term planning before that. Yeeees, it does seem like hacking is an especially bad example. But even in this example, my position is quite defensible. Yes, theoretically you can formally specify the desired behavior of the code and verify that it always happens. But, there are two problems with that: First, for many realistic software system, the formal specification would require colossal effort. Second, the formal verification is only as good as the formal model. For example, if the attacker found a hardware exploit, while your model assumes idealized behavior for the hardware, the verification doesn't help. And, it domains outside software the situation is much worse: how do you "verify" that your biological security measures are fool-proof, for example? When you're selecting for success on a short-term goal you might inadvertently produce a long-term agent (which, on the training distribution, is viewing the short-term goal as instrumental for its own goals), just like how evolution was selecting for genetic fitness but ended up producing agents with many preferences unrelated to that. More speculatively, there might be systematic reasons for such agents to arise, for example if good performance in the real-world requires physicalist epistemolo

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak2y20

It is indeed the case that sometimes we see phase transitions / discontinuous improvements, and this is an area which I am very interested in. Note however that (while not in our paper) typically in graphs such as BIG-Bench, the X axis is something like log number of parameters. So it does seem you pay quite a price to achieve improvement.

The claim there is not so much about the shape of the laws but rather about potential (though as you say, not certain at all) limitations as to what improvements you can achieve through pure software alone, wi... (read more)

2Lawrence Chan2y

Yeah, I agree that a lot of the “phase transitions” look more discontinuous than they actually are due to the log on the x axis — the OG grokking paper definitely commits this sin, for example. (I think there’s also another disagreement here about how close humans are to this natural limit.)

AI ALIGNMENT FORUM
AF

All of boazbarak's Comments + Replies