I ended up writing a long rant about agency in my review of Joe Carlsmith’s report on x-risk from power-seeking AI. I’ve polished the rant a bit and posted it as part of this sequence. The central analogy between APS-AI and self-propelled machines (“Auto-mobiles”) is a fun one, and I suspect the analogy runs much deeper than I’ve explored so far.

For context, the question being discussed is whether we should “expect incentives to push relevant actors to build agentic planning and strategically aware systems [APS systems] in particular, once doing so is possible and financially feasible.”

Joe says 80% yes, 20% no:

“The 20% on false, here, comes centrally from the possibility that the combination of agentic planning and strategic awareness isn’t actually that useful or necessary for many tasks -- including tasks that intuitively seem like they would require it (I’m wary, here, of relying too heavily on my “of course task X requires Y” intuitions). For example, perhaps such tasks will mostly be performed using collections of modular/highly specialized systems that don’t together constitute an APS system; and/or using neural networks that aren’t, in the predictively relevant sense sketched in 2.1.2-3, agentic planning and strategically aware. (To be clear: I expect non-APS systems to play a key role in the economy regardless; in the scenarios where (2) is false, though, they’re basically the only game in town.)”

I agree that “of course X requires Y” intuitions have been wrong in the past and also that evidence from how nature solved the problem in humans and nonhuman animals will not necessarily generalize to artificial intelligence. However:

  1. Beware isolated demands for rigor. Imagine someone in 1950 saying “Some people thought battleships would beat carriers. Others thought that the entire war would be won from the air. Predicting the future is hard; we shouldn’t be confident. Therefore, we shouldn’t assign more than 90% credence to the claim that powerful, portable computers (assuming we figure out how to build them) will be militarily useful, e.g. in weapon guidance systems or submarine sensor suites. Maybe it’ll turn out that it’s cheaper and more effective to just use humans, or bio-engineered dogs, or whatever. Or maybe there’ll be anti-computer weapons that render them useless. Who knows. The future is hard to predict.” This is what Joe-with-skeptic-hat-on sounds like to me. Battleships vs. carriers was a relatively hard prediction problem; whether computers would be militarily useful was an easy one. I claim it is obvious that APS systems will be powerful and useful for some important niches, just like how it was obvious in 1950 that computers would have at least a few important military applications.
  2. To drive this point home, let me take the reasons Joe gave for skepticism and mimic them, line by line, with a historical analogy to self-propelled machines, i.e. “automobiles.” The “Skeptic about APS systems” passages below are quoted verbatim from the report; the “Skeptic about self-propelled machines” passages are my imitations.
Skeptic about APS systems: Many tasks -- for example, translating languages, classifying proteins, predicting human responses, and so forth -- don’t seem to require agentic planning and strategic awareness, at least at current levels of performance. Perhaps all or most of the tasks involved in automating advanced capabilities will be this way.

Skeptic about self-propelled machines: Many tasks — for example, raising and lowering people within a building, or transporting slag from the mine to the deposit, or transporting water from the source to the home — don’t seem to require automobiles. Instead, an engine can be fixed in one location to power a system of pulleys or conveyor belts, or pump liquid through pipes. Perhaps all or most of the tasks involved in automating our transportation system will be this way.

Skeptic about APS systems: In many contexts (for example, factory workers), there are benefits to specialization; and highly specialized systems may have less need for agentic planning and strategic awareness (though there’s still a question of the planning and strategic awareness that specialized systems in combination might exhibit).

Skeptic about self-propelled machines: In many contexts, there are benefits to specialization. An engine which is fixed to one place (a) does not waste energy moving its own bulk around, (b) can be specialized in power output, duration, etc. to the task at hand, and (c) need not be designed with weight as a constraint, and thus can have more reliability and power at less expense.

Skeptic about APS systems: Current AI systems are, I think, some combination of non-agentic-planning and strategically unaware. Some of this is clearly a function of what we are currently able to build, but it may also be a clue as to what type of systems will be most economically important in future.

Skeptic about self-propelled machines: Current engines are not automobiles. Some of this is clearly a function of what we are currently able to build (our steam engines are too heavy and weak to move themselves), but it may also be a clue as to what type of systems will be most economically important in the future.

Skeptic about APS systems: To the extent that agentic planning and strategic awareness create risks of the type I discuss below, this might incentivize focus on other types of systems.

Skeptic about self-propelled machines: To the extent that self-propelled machines may create risks of “crashes,” this might incentivize focus on other types of systems (and I would add that a fixed-in-place engine seems inherently safer than a careening monstrosity of iron and coal!). To the extent that self-propelled machines may enable some countries to invade other countries more easily, e.g. by letting them mobilize their armies and deploy to the border within days by riding “trains,” and perhaps even to cross trench lines with bulletproof “tanks,” this threat to world peace and the delicate balance of power that maintains it might incentivize focus on other types of transportation systems. [Historical note: the existence of trains was one of the contributing causes of World War One. See e.g. Railways and the mobilisation for war in 1914 | The National Archives.]

Skeptic about APS systems: Plan-based agency and strategic awareness may constitute or correlate with properties that ground moral concern for the AI system itself (though not all actors will treat concerns about the moral status of AI systems with equal weight; and considerations of this type could be ignored on a widespread scale).

Skeptic about self-propelled machines: OK, I admit that it is much more plausible that people will care for the welfare of APS-AI than for the welfare of cars/trains. However, I don’t think this matters very much, so I won’t linger on this point.

3. There are plenty of cases where human “of course task X requires Y” intuitions turned out to be basically correct. (e.g. self-driving cars need to be able to pathfind and recognize images, image-recognizers have circuits that seem to be doing line detection, tree search works great for board game AIs, automating warehouses turned out to involve robots that move around rather than a system of conveyor belts, automating cruise missiles turned out to not involve having humans in the loop steering them… I could go on like this forever. I’m deliberately picking “hard cases” where a smart skeptic could plausibly have persuaded the author to doubt their intuitions that X requires Y, as opposed to cases where such a skeptic would have been laughed out of the room.)

4. There’s a selection effect that biases us towards thinking our intuition about these things is worse than it is:

  • Cases where our intuition is incorrect are cases where it turns out there is an easier way, a shortcut. For example, chess AI just doing loads of really fast tree search instead of the more flexible, open-ended strategic reasoning some people maybe thought chess would require. (A minimal sketch of what “just fast tree search” means appears after this list.)
  • If the history of AIs-surpassing-humans-at-tasks looks like this [figure: tasks arranged along a timeline by when AI surpassed humans at them]:
  • Then we should expect the left tail to contain a disproportionate percentage of the cases where there is a shortcut. Cases where there is no shortcut will be clumped over on the right.
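
To make the “shortcut” concrete, here is a minimal, purely illustrative sketch of what “loads of really fast tree search” amounts to. It plays a toy take-away game (take 1–3 counters; whoever takes the last counter wins) rather than chess, and every name in it is made up for illustration; the point is only that exhaustive search over future moves can substitute for anything resembling open-ended strategic reasoning.

```python
# Illustrative sketch only: brute-force game-tree search (negamax) on a toy game.
# Nothing here resembles a real chess engine beyond the bare idea that searching
# the game tree substitutes for "strategic reasoning".
from functools import lru_cache

@lru_cache(maxsize=None)
def best_move(counters):
    """Return (score, move) for the player to move: +1 = forced win, -1 = forced loss."""
    if counters == 0:
        return (-1, None)                 # the previous player took the last counter and won
    best_score, best_take = -2, None      # worse than any real outcome, so any legal move beats it
    for take in (1, 2, 3):
        if take <= counters:
            opponent_score, _ = best_move(counters - take)
            my_score = -opponent_score    # negamax: the opponent's win is my loss
            if my_score > best_score:
                best_score, best_take = my_score, take
    return best_score, best_take

print(best_move(21))   # -> (1, 1): take 1, leaving the opponent a losing 20-counter position
print(best_move(20))   # -> (-1, 1): with 20 counters the player to move is losing
```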

5. More important than all of the above: As Gwern pointed out, it sure does seem like some of the tasks some of us will want to automate are agency tasks, tasks such that anything which performs them is by definition an agent. Tasks like “gather data, use it to learn a general-purpose model of the world, use that to make a plan for how to achieve X, carry out the plan.”
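
To make the shape of such an “agency task” concrete, here is a minimal sketch of that loop in Python: gather data, learn a model of the world, plan toward a goal against that model, then carry the plan out. The toy environment, class names, and numbers are all made up for illustration; nothing here is meant as a claim about how real APS systems would be built.

```python
# Illustrative sketch only: the bare "gather data -> learn a world model ->
# plan -> act" loop. Every name and number here is a made-up placeholder.
from itertools import product

class ToyWorld:
    """A 1-D world: the agent sits at an integer position and can step +1 or -1."""
    def __init__(self, start=0):
        self.pos = start

    def observe(self):
        return self.pos

    def step(self, action):                   # action is +1 or -1
        self.pos += action
        return self.pos

class LearnedModel:
    """Learns, from observed transitions, what each action does to the state."""
    def __init__(self):
        self.effects = {}                     # action -> observed change in position

    def update(self, before, action, after):
        self.effects[action] = after - before

    def predict(self, state, action):
        return state + self.effects.get(action, 0)

def plan(model, state, goal, actions=(-1, +1), horizon=4):
    """Exhaustively search short action sequences; return the one predicted to land closest to the goal."""
    best_seq, best_dist = (), abs(state - goal)
    for length in range(1, horizon + 1):
        for seq in product(actions, repeat=length):
            predicted = state
            for a in seq:
                predicted = model.predict(predicted, a)
            if abs(predicted - goal) < best_dist:
                best_seq, best_dist = seq, abs(predicted - goal)
    return best_seq

def run_agent(world, goal):
    model = LearnedModel()
    for a in (-1, +1):                        # 1. gather data: try each action and watch what happens
        before = world.observe()
        model.update(before, a, world.step(a))
    while world.observe() != goal:            # 2./3. plan against the learned model, then act
        for a in plan(model, world.observe(), goal):
            before = world.observe()
            model.update(before, a, world.step(a))
    return world.observe()

print(run_agent(ToyWorld(start=0), goal=5))   # -> 5
```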

6. Finally, and perhaps most importantly: We don’t have to go just on intuition and historical analogy. We have models of agency, planning, strategic awareness, etc. that tell us how they work and why they are so useful for so many things. [This sequence is my attempt to articulate my model.]

Many thanks to Joe Carlsmith for his excellent report and for conversing with me at length about it.

5 comments

Interesting post. Overall, though, it feels like you aren't taking hindsight bias seriously enough. For example:

Some people thought battleships would beat carriers. Others thought that the entire war would be won from the air. Predicting the future is hard; we shouldn’t be confident. Therefore, we shouldn’t assign more than 90% credence to the claim that powerful, portable computers (assuming we figure out how to build them) will be militarily useful, e.g. in weapon guidance systems or submarine sensor suites.

In this particular case, an alternative is to have almost all of the computation done in a central location, with the results sent wherever they are needed. This is how computers worked for decades, and it's probably how computers will work in the future. So it seems overconfident for someone in 1950 to assign more than 90% credence to portability being crucial.

Also, the whole setup of picking something which we already know to be widespread (cars) and then applying Joe's arguments to it seems like it shouldn't tell us much. If Joe were saying 1% yes, 99% no for incentives to build APS systems, then the existence of counterexamples like cars which have similar "no" arguments would be compelling. But he's saying 80% yes, 20% no, and so the fact that there are some cases where his "no" arguments fail is unsurprising - according to him, "no" arguments of this strength should fail approximately 80% of the time.

There’s a selection effect that biases us towards thinking our intuition about these things is worse than it is

This is an interesting argument, but it applies most strongly to cases where we mainly became interested in the problem after it started looking tractable (e.g. AI for art, AI for Go). In other cases, such as chess or Turing tests, people thought of these as being of central importance well before we had any feasible way to approach them. So if they tend to have much quicker shortcuts than expected, that's good evidence that our intuitions are bad.

Thanks! Ah, I shouldn't have put the word "portable" in there then. I meant to be talking about computers in general, not computers-on-missiles-as-opposed-to-ground-installations.

Also, the whole setup of picking something which we already know to be widespread (cars) and then applying Joe's arguments to it seems like it shouldn't tell us much. If Joe were saying 1% yes, 99% no for incentives to build APS systems, then the existence of counterexamples like cars which have similar "no" arguments would be compelling. But he's saying 80% yes, 20% no, and so the fact that there are some cases where his "no" arguments fail is unsurprising - according to him, "no" arguments of this strength should fail approximately 80% of the time.

I think I agree with this, and should edit to clarify that that's not the argument I'm making... what I'm saying is that sometimes, "Of course X requires Y" and "of course Y will be useful/incentivised" predictions can be made in advance, with more than 90% confidence. I think "computers will be militarily useful" and "self-propelled vehicles will be useful" are two examples of this. The intuition I'm trying to pump is not "Look at this case where X was useful, therefore APS-AI will be useful" but rather "look at this case where it would have been reasonable for someone to be more than 90% confident that X was useful despite the presence of Joe's arguments; therefore, we should be open to the possibility that we should be more than 90% confident that APS-AI will be useful despite Joe's arguments." Of course I still have work to do, to actually provide positive arguments that our credence should be higher than 90%... the point of my analogy was defensive, to defend against Joe's argument that because of such-and-such considerations we should have 20% credence on Not Useful/Incentivised.

(Will think more about what you said and reply later)

I guess I just don't feel like you've established that it would have been reasonable to have credence above 90% in either of those cases. Like, it sure seems obvious to me that computers and automobiles are super useful. But I have a huge amount of evidence now about both of those things that I can't really un-condition on. So, given that I know how powerful hindsight bias can be, it feels like I'd need to really dig into the details of possible alternatives before I got much above 90% based on facts that were known back then.

(Although this depends on how we're operationalising the claims. If the claim is just that there's something useful which can be done with computers - sure, but that's much less interesting. There's also something useful that can be done with quantum computers, and yet it seems pretty plausible that they remain niche and relatively uninteresting.)

Fair. If you don't share my intuition that people in 1950 should have had more than 90% credence that computers would be militarily useful, or that people at the dawn of steam engines should have predicted that automobiles would be useful (conditional on them being buildable) then that part of my argument has no force on you.

Maybe instead of picking examples from the past, I should pick an example of a future technology that everyone agrees is 90%+ likely to be super useful if developed, even though Joe's skeptical arguments can still be made.

Musings on ways in which the analogy is deep, or at least seems deep to me right now: (It's entirely possible my pattern-recognizer is overclocking and misfiring here... this is fun to think about though)

Automobiles move the cargo to the target. To do this, they move themselves. Agents improve the world according to some evaluation function. To do this, they improve their own power/capability/position.

You often don't know what's inside an automobile until the thing reaches its destination and gets unloaded. Similarly, you often don't know what's inside an agent until it achieves a large amount of power/capability/position and starts optimizing for final goals instead of instrumental goals.

Automobiles are useful because they are general-purpose; you can invest a ton of money and R&D in a factory that makes automobiles, and comparatively little money in roads or rails, and then you have a transportation solution that works for almost everything, even though for most particular things you would have been better off building a high-speed conveyor belt or something. Agents are useful because they are general-purpose; you can invest a ton of money and R&D to make an agent AGI or APS-AI, and then you can make copies of it to do almost any task, even though for most particular tasks you would have been better off building a specialized tool AI.

EDIT: Automobiles tend to be somewhat modular, with a part that holds cargo distinct from the parts that move the whole thing. Agents--at least, the ones we design--also tend to be modular in the same way, with a "utility function" or "value network" distinct from the "beliefs" and "decision theory" or "MCTS code." It's less clear whether this is true of evolved automobiles or evolved agents, though. Maybe it's mostly just true of intelligently designed agents/automobiles.
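
A tiny sketch of that modularity claim, with made-up names: the evaluation component (the "cargo") is a separate, swappable piece from the machinery that predicts and searches (the "vehicle"). This is only meant to illustrate the separation, not any particular agent architecture.

```python
# Illustrative only: the modularity claim, with made-up component names.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ModularAgent:
    world_model: Callable[[Any, Any], Any]   # "beliefs": (state, action) -> predicted next state
    evaluate: Callable[[Any], float]         # "utility function" / "value network"
    search: Callable[..., Any]               # "decision theory" / search code

    def act(self, state, actions):
        return self.search(state, actions, self.world_model, self.evaluate)

def greedy_search(state, actions, model, evaluate):
    """One-step lookahead: pick the action whose predicted outcome scores highest."""
    return max(actions, key=lambda a: evaluate(model(state, a)))

# Two agents sharing the same "vehicle" (model + search) but carrying different "cargo" (evaluation):
move_right = ModularAgent(lambda s, a: s + a, evaluate=lambda s: s,  search=greedy_search)
move_left  = ModularAgent(lambda s, a: s + a, evaluate=lambda s: -s, search=greedy_search)
print(move_right.act(0, [-1, +1]), move_left.act(0, [-1, +1]))   # -> 1 -1
```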