Maybe useful: an analogy this post brought to mind for me, replacing “AI” with “Animals”.
Imagine a hypothetical alien civilization observing early Earth and commenting on whether it poses a risk:
Doesn’t optimization in nature produce non-agentic animals? It mostly does, but those aren’t the ones we’re concerned with. The risk is all concentrated in the agentic animals.
Basically no animal that has ever existed is agentic. I’ve studied animals for my entire career and I haven’t found an agentic animal yet. That doesn’t preclude them showing up in the future. We have reasons to believe that not only are agents possible, but they are likely.
Even if agentic animals showed up, they would be vastly outnumbered by all the other animals. We believe that agency will give the agentic animals such a drastic advantage that they will take over the world in what will seem like a very short amount of time.
(Etc etc)
(Possible that this is in one of the things you cite, and either I missed it or I am failing to remember it)
[ETA: I'm deprioritizing completing this sequence because it seems that other people are writing good similar stuff. In particular, see e.g. https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality-is-the-tiger-and-agents-are-its-teeth and https://www.lesswrong.com/posts/pdJQYxCy29d7qYZxG/agency-and-coherence ]
This sequence explains my take on agency. I’m responding to claims that the standard arguments for AI risk have a gap, a missing answer to the question “why should we expect there to be agenty AIs optimizing for stuff? Especially the sort of unbounded optimization that instrumentally converges to pursuit of money and power.”
This sequence is a pontoon bridge thrown across that gap.
I’m also responding to claims that there are coherent, plausible possible futures in which agentic AGI (perhaps better described as APS-AI) isn’t useful/powerful/incentivized, thanks to various tools that can do the relevant tasks better and more cheaply. I think those futures are incoherent, or at least very implausible. Agency is powerful. For example, one conclusion I am arguing for is:
When it becomes possible to make human-level AI agents, said agents will be able to outcompete various human-tool hybrids prevalent at the time in every important competition (e.g. for money, power, knowledge, SOTA performance, control of the future lightcone...)
Another is:
We should expect Agency as Byproduct, i.e. expect some plausible training processes to produce agenty AIs even when their designers weren't explicitly aiming for that outcome.
I’ve had these ideas for about a year but never got around to turning them into rigorous research. Given my current priorities it looks like I might never do that, so instead I’m going to bang it out over a couple of weekends so it doesn’t distract from my main work. :/ I won't be offended if you don't bother to read it.
Outline of this sequence:
1. P₂B: Plan to P₂B Better
2. Agents as P₂B chain reactions
3. Interlude: Agents as automobiles
4. Gradations of agency
5. Why agents are powerful, & other conclusions
Incomplete list of related literature and comments:
Frequent arguments about alignment (a comment in which Richard Ngo summarizes a common pattern of conversation about the risk from agenty AI vs. other sorts of AI risk)
Joe Carlsmith, drawing on writings from others, assigned 20% credence to the claim that AI agents won't be powerful enough relative to non-agents to be incentivised. I recommend reading the whole report, or at least the relevant sections on APS-AI and incentives to build it.
Eric Drexler's CAIS report (as summarized by Rohin Shah) basically argues that it should be much more than 20%. Richard Ngo's thoughts here.
Why You Shouldn't Be a Tool: The Power of Agency by Gwern. (OK, it seems to have a different title now, maybe it always did and I hallucinated this memory...) This essay, more than anything else, inspired my current views.
The Ground of Optimization by Alex Flint argues: "there is a specific class of intelligent systems — which we call optimizing systems — that are worthy of special attention and study due to their potential to reshape the world. The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems."
Yudkowsky and Ngo conversation (especially as summarized by Nate Soares) seems to be arguing for something similar to Alex -- I imagine Yudkowsky would say that by focusing on agency I'm missing the forest for the trees: there is a broader class of systems (optimizers? consequentialists? makers-of-plans-that-lase?) of which agents are a special case, and it's this broader class that has the interesting and powerful and scary properties. I think this is probably right but my brain is not yet galaxy enough to grok it; I'm going to defy EY's advice and keep thinking about the trees for now. I look forward to eventually stepping back and trying to see the forest.
Thanks to various people, mostly at and around CLR, for conversations that shaped my views on this subject. Thanks especially to Ramana Kumar, whose contributions were the greatest.