by Nora Ammann & Claude Opus 4.5
Setting the stage
There aren't many detailed stories about how things could go well with AI.[1] So I'm about to tell you one.
This is an attempt to articulate a path, through the AI transition, to collective flourishing.
What makes endgame sketches like this useful is that they are constrained: they need to cohere with your best guess of how the world works, and to engage earnestly with good-faith articulations of risks and failure modes.
Therefore, I start by laying out the two failure modes I believe we face – failure through unilateral domination, and failure through competitive erosion – and then briefly discuss my assumptions about the world we're in – gradual take-off and multipolarity (at least for the next few years, and longer if we get things right).
Once that’s out of the way, we are ready to launch into a sketch for how humanity might navigate its way through the AI transition – past demons and perils – towards durable, collective flourishing.
This is obviously not, nor could it be, a full plan. In fact, it is an early draft that I would ordinarily have spent more time on before publishing. But I’m hoping it may spark productive thinking in others, and given the current trajectory, time is of the essence.
Failure modes
Unilateral domination
One actor, or a tightly coupled coalition, pulls far enough ahead that they can impose their will on everyone else. This could be a misaligned AI system that is capable of and motivated to outmanoeuvre humans and humanity at large, or a human group that could seize control over the future, projecting its power through AI capabilities (though eventually you will run into questions about who really is in charge).
A lot has been written about whether and why we should expect powerful AIs to become misaligned (e.g. 1, 2, 3), and whether and how this could lead humanity to ‘lose control’ of the AI(s) (e.g. 1, 2, 3). There’s also been some writing on whether and how AI might cause power to become increasingly centralised (e.g. 1, 2, 3). I will largely assume familiarity with these arguments and not dive into those further here.
What unifies this cluster is unilateral imposition. Someone gains the power to shape the future according to their goals, regardless of what everyone else wants. The capacity for collective steering is seized by a narrow actor.
Competitive erosion
No single dominator emerges, but competitive pressures grind away everything that isn't locally adaptive. In this scenario, multipolarity leads to unconstrained competition and Molochian dynamics become Malthusian dynamics — races to the bottom, corners cut, values sacrificed for competitive advantage (e.g. 1, 2). Gradual Disempowerment, for example, describes a scenario where humans incrementally hand over more control to AI systems, not through any single decision, but because each step is locally rational and no sufficiently weighty coalition can coordinate to stop it.
At the heart of this concern might be a question of whether goodness is just inherently anti-competitive. What if the coalitions that preserve human values, that invest in safety, beauty and wellbeing, and that maintain meaningful human agency, are systematically outcompeted by coalitions that don't? If so, even without any single bad actor, we could drift into a future where human flourishing has been optimized away.
The capacity for collective steering isn't seized in this scenario. It's dissolved.
***
These two clusters of failure modes look pretty much like polar opposites. In domination, humanity's collective capacity to steer toward flourishing is captured by a narrow actor. In erosion, it's dissipated by competitive dynamics that no one controls.
One might think that this makes it relatively easy to tell, at any point in time, whether you're more likely to fail in one way or the other. In practice, however, the balance between them appears surprisingly delicate. I, for one, am seriously concerned about both of these failure modes, rather than predominantly about one of them.
Put differently: navigating the age of AI appears to be a pretty narrow path.
Maybe these failure modes aren't that antithetical after all. Both presume dynamics in which one party’s win must cost another party a loss: either there are some losers (dominated by a winner) or no winners (Malthusian catastrophe). What if there is a middle way between them, one that averts the zero-sum, win-lose dynamic altogether?
Assumptions
I expect that, for at least a handful more years, frontier AI capabilities will keep getting better, but that they will do so via steady scale-ups, architectural tweaks, and training tricks rather than a single clean discontinuity. Capability jumps can be sharp on specific benchmarks, but the overall trajectory has the texture of progress that is rapid but continuous in aggregate, and jagged in its specifics. AI adoption and economic impact have the same “gradual and jagged” texture.
There are important inflection points along the route of AI progress, at which the rate of progress changes. We’ve already seen one inflection point sometime in 2024, the result of new posttraining methods (‘reasoning models’). Coding agents accelerating software and machine learning research and engineering are causing another inflection point, which will likely become measurable towards the end of this year (2026). And there are more inflection points ahead, such as autonomous R&D on compute hardware, new architectural insights leading to continual learning, and, eventually, autonomous manufacturing fleets.
I also expect that, for at least a handful more years, there will be no single actor (human, organisation, or AI) that is unilaterally dominant over all others. At the moment, there are at least three frontier AI companies that essentially rotate the award for best model capabilities among each other, with a couple more companies following closely behind. And companies that publish their weights openly have shown themselves able to keep catching up, despite the inflection point, so frontier capabilities never stay fully proprietary for long.
Inflection points may become increasingly consequential. They have the potential to increasingly tip dynamics toward winner-take-more. But for now, these developments happen against a background of multiple serious players, overlapping infrastructures, and tangled incentives.
To be clear, a ‘fast takeoff’ cannot be fully ruled out. If it happens soon, the story I'm telling here may not help much; we'd need something else. But that's not my central expectation for the near term, so I'm focusing on paths in which takeoff overlaps with deliberate human influence over the trajectory.
Enabling Stable Win-Win Coalitions
In a nutshell
How do we walk the narrow path between different failure modes?
We need to enable stable, win-win human-AI coalitions.
This means coalitions that are able to efficiently reach Pareto-improving agreements. To achieve this, such coalitions need to be made up of actors that (a) have a good model of the world and of cause-and-effect, (b) understand their own interests, (c) are able to bargain efficiently with other actors, who have their own (likely different) interests, and (d) have access to privacy-preserving, strategy-proof assurance tech that creates justified trust that agreements will be upheld. In other words, coalitions that have unlocked Coasean bargaining at scale.
It is worth noting that Coase’s theorem alone says nothing about distributional fairness; typically, “the rich get richer” from trade, with each party’s gains proportional to their initial endowment. However, if all parties are robustly on such an exponential growth trajectory, and if technological maturity unlocks many orders of magnitude of growth in total aggregate value creation, this may be acceptable. Consider this: if Alice starts with 2 resources and Bob starts with 1 resource and all resources double every year, then one could view this as “Alice always gets twice as much as Bob”, or one could view it as “Alice and Bob get the same over time, Bob just gets it one year later”.
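To make that arithmetic concrete, here is a minimal sketch (using only the toy numbers from the example above) showing that under uniform doubling the ratio between Alice and Bob stays fixed, while Bob reaches every absolute level Alice reaches, just one year later:

```python
# Toy illustration of the Alice/Bob example above: uniform exponential growth
# preserves relative shares, but the party with the smaller endowment reaches
# any given absolute level with a fixed delay (here: one year).

alice, bob = 2, 1  # initial endowments from the example

for year in range(6):
    print(f"year {year}: Alice = {alice}, Bob = {bob} (ratio {alice / bob:.1f})")
    alice, bob = alice * 2, bob * 2

# Alice(t) = 2 * 2**t and Bob(t) = 2**t, so Bob(t + 1) == Alice(t):
# Bob hits every absolute level Alice hits, exactly one year later.
```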
Such a win-win coalition can coordinate on investing in public goods that make it increasingly robust and club goods that make it increasingly attractive — from resilience to flourishing. These investments compound. Over time, this creates increasingly stabilising strategic incentives:
Investing in resilience unlocks a world that’s increasingly defense-favoured, meaning that attacking the coalition is less and less likely to be a compelling choice, even for rogue actors
Investing in prosperity means joining the coalition and strengthening it becomes more and more attractive
Investing in privacy-preserving trust and assurance infrastructure means undermining, exploiting or freeriding on the coalition becomes less and less feasible
These coalitions will include both humans and AI. It seems that if today’s AI paradigm remains the primary source of AI capabilities, our existing prosaic alignment methods are able to produce powerful AI systems that are essentially aligned, and thus worthy (and necessary) allies. Tools for scalable oversight and agent orchestration are critical to aggregate the capabilities of a large coalition into enough effective uplift to be resilient against rogue agents, and to enable robust coordination among humans and AIs.
This uplift needs to be channeled into differentially accelerating technical and institutional solutions that improve our collective sense-making and resilience — and this needs to happen quickly enough to stay ahead of catastrophic risks. If we can sufficiently empower humans to understand the world and their own interests, if the gains from cooperation are large enough, and if defense-favouring dynamics make unilateral seizure costly, then even actors much more powerful than any individual human may find joining and strengthening the coalition more attractive than attacking it. Done well, such coalitions could potentially withstand even highly capable rogue actors, unless those actors are truly endowed with a “decisive strategic advantage” (which, in an epistemically-resilient and cyber-secure world, with conservation of matter and energy, would be very difficult to obtain).
That was the compressed version. Let’s now unpack that more slowly, step by step. First: what does it actually take to build stable, win-win coalitions?
Unlocking Pareto: driving down transaction costs
Pareto-improving agreements — deals that make at least one person better off and no one worse off — often exist in principle but don't happen in practice. Why? Because the friction is too high. Finding the relevant parties, figuring out what's true, understanding what everyone wants, negotiating terms, making commitments credible, verifying follow-through — each of these steps has costs. Economists call these frictions ‘transaction costs’. When the costs exceed the gains, the deal doesn't happen, even if it would have benefited everyone.
Frictionless transacting is not the world we live in, but the insight is generative: if you can reduce the friction, you expand the set of achievable win-win agreements. What technical or institutional innovations could bring us closer to this world? How can AI itself transform the playing field? Seb Krier of Google DeepMind wrote about this exact vision in "Coasean bargaining at scale".
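As a toy illustration of that insight (the deal values and friction levels below are made up purely for illustration): a mutually beneficial deal only happens when its joint surplus exceeds the friction of making it, so driving friction down directly expands both the set of deals that clear and the surplus actually captured.

```python
# Toy model: each candidate deal has a joint surplus (how much better off the
# parties are, in total, if it happens). A deal only gets made if its surplus
# exceeds the transaction cost of finding, negotiating, and enforcing it.

candidate_deals = [0.5, 1.2, 2.0, 3.5, 7.0, 12.0]  # hypothetical joint surpluses

def realised_value(deals, transaction_cost):
    """Total surplus actually captured, given a per-deal friction."""
    return sum(s - transaction_cost for s in deals if s > transaction_cost)

for cost in [5.0, 2.5, 1.0, 0.1]:
    made = [s for s in candidate_deals if s > cost]
    print(f"friction={cost:>4}: {len(made)} of {len(candidate_deals)} deals happen, "
          f"surplus captured = {realised_value(candidate_deals, cost):.1f}")
```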
To pave the way to this world, we first need to identify where frictions block coordination. Following Coase, transaction costs often get categorised into approximately the following clusters:
Information costs. One cannot express preferences or assess terms of agreement except with respect to a model of the world. What are the relevant variables? What are the causal relationships? What would actually happen under different arrangements? What are your beliefs about the status quo? These are the costs of acquiring the information needed to even formulate what you want and evaluate what's on offer. Poor models lead to agreements that don't serve the actor’s actual interests, or to no agreement at all, because parties can't establish common ground about what's true.
Deliberation costs. In order to strike agreements that protect and further an actor's interest, they need to have a good understanding of those interests in the first place. What do you actually value? What trade-offs are you willing to make? What terms would you accept? Deliberation (as opposed to bargaining) is a single-principal problem (in the ‘principal-agent’ sense): clarifying the principal’s own preferences given their understanding of the world. In the context of AI, this also includes an AI representative (the ‘agent’) gaining a good understanding of their principal’s preferences. The principal here could literally be an individual human, or it could be a constituency that needs to arrive at a coherent position — like a team, a company's shareholders, or a nation's citizens. The output of deliberation is a clear enough map of your preferences within the space of possible options that you can meaningfully enter negotiation.
Bargaining costs. Once parties know what they want, they need to find mutually acceptable terms. This is a multi-principal problem: negotiating with others who have different interests. Can we identify the set of arrangements that would make everyone better off? Can we agree on how to divide the gains from trade? Can we specify terms precisely enough to act on? Bargaining is hard in practice even between two parties with full information; it becomes much harder with many parties, incomplete information, and strategic incentives to misrepresent.
Monitoring and enforcement costs. An agreement is only valuable if it's actually honored. This means verifying that implementation meets the agreed terms, detecting violations, and imposing appropriate consequences — so that it becomes rational for actors to enter trades they would otherwise only enter if there were no risk of counterparties defecting on the agreement. Without credible monitoring and enforcement, many deals never get made: parties won't agree to terms they don't believe will be kept.
What we need to build
AI is poised to dramatically reduce transaction costs. This is especially true if we can ensure strong alignment between principal and AI, allowing the AI to serve as a trusted representative or "personal advocate." Agentic AI advocates could dedicate vastly more cognitive effort than any human negotiator to understanding their principal's interests, modelling the world, and identifying and negotiating agreements in parallel.
But AI advocates alone are insufficient to fully realize this vision. Driving down transaction costs also crucially requires infrastructure at multiple layers.
Information infrastructure that makes it cheaper to figure out what's true and share it appropriately.
For example: scalable world-modeling infrastructure like shared ontologies and ‘living’ knowledge graphs, privacy-preserving computation and mechanisms for aggregating distributed information like prediction markets, reputation systems and sensor networks.
Deliberation infrastructure that helps individuals and groups understand what they actually want.
For example: preference elicitation and structured reflection aids, infrastructure for collective sense-making, deliberation, and imagination.
Bargaining infrastructure that makes it easier to reach, specify and execute complex multi-party agreements between heterogeneous actors, including mechanisms that are robust to strategic manipulation.
For example: AIs capable of using or generating strategy-proof protocols or programmable cryptography, and AI delegates with verifiable constraints and nuanced principal-specified affordances (a toy illustration of strategy-proofness follows after this list).
Trust/Assurance infrastructure that grounds digital claims in physical reality, thereby driving down monitoring and enforcement costs.
For example: secure hardware, tamper-evident sensors, verifiable computation, guaranteeable actuators.
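To make ‘strategy-proof’ concrete, here is a minimal sketch of the classic second-price (Vickrey) auction — one of the simplest mechanisms in which truthful reporting is a dominant strategy. It is not one of the specific protocols the layers above would rely on, just an illustration of the property that bargaining infrastructure needs at much larger scale:

```python
# Minimal sketch of a second-price (Vickrey) auction: the highest bidder wins,
# but pays the second-highest bid. Reporting your true valuation is a dominant
# strategy, which is the "strategy-proof" property referred to above.

def vickrey_auction(bids):
    """bids: dict of bidder name -> reported bid. Returns (winner, price paid)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # winner pays the runner-up's bid, not their own
    return winner, price

# Hypothetical bidders and valuations, purely for illustration.
print(vickrey_auction({"alice": 10, "bob": 7, "carol": 4}))  # ('alice', 7)

# Misreporting cannot help: shading a winning bid below the runner-up forfeits
# a profitable win, and inflating a losing bid only wins at a price above value.
print(vickrey_auction({"alice": 6, "bob": 7, "carol": 4}))   # ('bob', 6)
```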
Cutting across all of these layers are scalable oversight solutions: infrastructure that allows humans to gain justified confidence in AI outputs — be that in science, engineering, or decision-making — even as AI systems handle more of the work. Even if AIs are essentially aligned, blind trust is not robust. AI systems can still make mistakes, misunderstand tasks (not least because the instructions may be genuinely ambiguous), be subject to sabotage, and so on. Solving this unlocks AI-AI coordination (agents can prove things to each other) and AI-human coordination (humans can maintain oversight even as AI capabilities grow). Without it, we either don't use AI (and fall behind) or use it without adequate assurance (and introduce new risks).
These layers together form a shared trust protocol: a stack where each layer enables the others, and the whole becomes a foundation for coordination at scale. Trust infrastructure grounds information infrastructure; accurate world models support deliberation; clear preferences enable efficient bargaining; enforceable agreements close the loop. And finally, surplus from cooperation funds further investment in the stack.
What empowered coalitions can do
As transaction costs fall, agreements that were previously too costly to reach become achievable. The frontier of viable cooperation expands. At root, this is about uninternalised externalities: without coordination, risks get underpriced and public goods get undersupplied. The goal isn't to eliminate risk (that would require forgoing too much value) but to enable efficient allocation — ensuring those who impose costs bear them, and investing collectively in goods that benefit everyone.
So, concretely, what do coalitions do with this expanding capacity?
Here, I’m mostly thinking about two key areas of investments: resilience and flourishing. Both matter, and they reinforce each other.
AI resilience is about ensuring that civilisational infrastructure can withstand AI-related disruptions, whether from misuse, accidents, or systemic effects. AI dramatically amplifies attack surfaces that already exist. Many vulnerabilities predate AI, but capable AI systems make them cheaper to exploit and harder to defend. AI also introduces new risk categories: systems that optimise for what we asked rather than what we wanted (less malevolent genie, more addiction dynamics), and cascading failures in AI-dependent infrastructure (the more of the economy runs on AI agents, the more consequential prompt injection attacks become).
A coalition that can coordinate can invest collectively in hardening these systems, in defensive technologies, oversight tooling, and socio-technical solutions that make civilisational infrastructure more robust. This includes biosecurity infrastructure (e.g. DNA synthesis screening, metagenomic early detection, distributed response capacity), hardening cyber and cyber-physical systems (e.g. verifiably secure code, tamper-secure robotics, verified control systems), and epistemic infrastructure (e.g. provenance tracking, scalable review, trusted sensors, tools for collective deliberation at scale), for example. These are ‘public goods’ that markets currently undersupply and that require coordination to build.
But resilience isn't the only goal; it's what protects the capacity to pursue everything else. Health, beauty, understanding, exploration, connection, creativity… — the things that make life worth living and the future worth reaching. More fundamentally: a world in which coalitions oriented toward human flourishing can only ever play defense, with all surplus going to fending off threats and nothing remaining for the things that make the coalition worth joining, is a world where goodness has been lost to competitive erosion.
There's a deeper point here. The "failure through erosion" framing assumes that values are fragile and anti-competitive. But that may not be true. Coordination itself can be a winning strategy in a competitive world — and if so, the technologies[2] and institutions that enable and stabilise win-win coalitions get selected for. This includes not just infrastructure but also values and norms: small-l liberalism, pluralism, virtues like honesty, respect, and integrity.
Many of the values we worry about losing aren't vestigial; they're load-bearing.
Are stable win-win coalitions viable?
Is the picture I'm painting — stable, win-win coalitions — actually viable? Two sub-questions:
Is the world sufficiently defense-favoured?
Why would powerful AIs join rather than defect?
On the first question: whether coalitions can be viable depends on whether investing in resilience can free up surplus for flourishing, or whether every last resource must go to defense just to survive. In short, it depends on whether the world is sufficiently defense-favoured. This is ultimately an empirical question: easy to speculate about, hard to know with confidence.
My best guess right now is that the world is in fact relatively defense-favoured: in a vast, abundant universe, the opportunity cost of fighting rather than cooperating or expanding elsewhere may simply be too high. That said, the practical upshot may be the same regardless. If the world is offense-favoured, there may not be much anyone can do. Given uncertainty, we should act as if defense-favoured dynamics are achievable, to preserve the possibility of success by building the socio-technical stack that could unlock it.
On the second question: even granting defense-favoured dynamics, why would AIs more powerful than any individual human cooperate rather than dominate?
Alignment is part of the answer. Without a prosocial disposition, it's hard to imagine forming strong, stable coalitions with AI systems. But alignment alone isn't sufficient. Several additional dynamics point toward cooperation being attractive even for very capable systems:
The gains from cooperation are large. If the coalition can facilitate Pareto-improving agreements at scale, joining offers real benefits: access to resources, trade, and collective capabilities no single actor could replicate.
The coalition is strong enough that attacking is costly. If win-win coalitions have invested in resilience, then attacking them, even if you are confident you will ultimately succeed, has costs: resources expended, uncertainty about outcomes, the possibility of losing. The stronger the coalition's defenses, the less attractive unilateral aggression becomes (a crude expected-value comparison follows after this list).
Abundance makes the opportunity cost of forceful conquest high. Given the sheer size of the accessible universe, and the staggering amount of matter and energy it contains, the opportunity cost of fighting over Earth's resources may be far higher than the cost of just... going elsewhere. If you could spend your effort exploring and developing uncontested resources, why waste it on conflict? This is the "Paretotopian" intuition: a world where cooperation dominates because the pie is so large and growing that fighting over slices is simply inefficient.
Self-improving AIs face their own alignment problem. A capable AI trying to improve itself by creating more powerful successors faces a version of the same problem we face: ensuring that the successor, which may be much more capable, actually pursues the same goals. Given the difficulty and the stakes, systems might prefer to improve through tooling and coordination, rather than source code modifications: routes that preserve their values more reliably.
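One way to see how these dynamics combine is as a crude expected-value comparison between attacking the coalition and joining it. All parameter values below are made-up placeholders, not estimates; the point is the structure of the comparison, not the numbers:

```python
# Crude expected-value comparison for a capable actor choosing between attacking
# a coalition and joining it. All numbers are placeholders purely for
# illustration; the structure, not the values, is the point.

def value_of_attack(p_success, prize, cost_of_conflict, value_if_defeated):
    """Expected value of attacking, given a win probability and conflict costs."""
    return p_success * (prize - cost_of_conflict) + (1 - p_success) * value_if_defeated

def value_of_joining(share_of_cooperative_surplus, uncontested_frontier):
    """Value of joining: a share of the growing surplus plus uncontested expansion."""
    return share_of_cooperative_surplus + uncontested_frontier

# Resilience investments push p_success down and cost_of_conflict up; large gains
# from cooperation and an abundant frontier push the value of joining up.
attack = value_of_attack(p_success=0.3, prize=100, cost_of_conflict=40, value_if_defeated=0)
join = value_of_joining(share_of_cooperative_surplus=60, uncontested_frontier=50)

print(f"attack: {attack:.1f}, join: {join:.1f}")  # attack: 18.0, join: 110.0
```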
Beren Millidge has discussed related considerations. Some AIs may value humans for historical or sentimental reasons; or they might find that demonstrating care for weaker agents serves as a useful signal of cooperativeness to other AIs. At cosmic scales, keeping humans around is extraordinarily cheap for a serious post-biological civilization. And respecting existing property rights and social institutions may simply be a convenient Schelling point for AIs navigating a complex multi-agent economy.
The key defeating condition for such coalitions is an actor with a unilateral decisive strategic advantage. Short of that, it may be possible to build a socio-technical stack defense-favoured enough to make cooperation — or at least non-aggression — a stable equilibrium.
Closing
This is an attempt at a coherent story of success that takes the failure modes seriously. I don’t know if it’s right, but it identifies something concrete to build: coalitions that can make sense of the world, reach Pareto-improving agreements, and defend themselves.[3] This isn't a story about a single decisive move that locks in a good outcome. It's iterative: a sequence of investments that compound, each enabling the next.
The story also suggests that early investments in epistemic and coordination infrastructure are key, in that they unlock Pareto-improving agreements, which in turn unlock investments in public goods like resilience.
But building the trust infrastructure and the resilience tech takes time, and we may not have much time before progress in AI leads to catastrophic harms — whether through accident, misuse, or something else. This is why effectively leveraging AI uplift matters so much. There is certainly no guarantee that our resilience and coordination tech stays sufficiently far ahead, but leveraging AI through scalable oversight methods seems like the best available route. The next several years seem especially critical. We're in a race, not against a single adversary, but against the clock.
The prize, if we can get there, is collective flourishing: a world where humanity, in coalition with AI systems, retains the capacity to shape its own future. A world where we can understand our situation clearly enough, coordinate effectively enough, and defend ourselves well enough to keep steering toward something better.
Acknowledgements
This piece was written with substantial help from Claude Opus 4.5, who served as thinking partner, editor, and co-drafter throughout. Thanks in particular to davidad for extended discussions and detailed feedback on the draft, and to Jacob Lagerros and Seb Krier for comments on earlier versions. I also want to give a nod to those whose discussions or writing have shaped my thinking here: Alex Obadia, Ashish Uppala, Beren Millidge, Eddie Kembery, Eric Drexler, Jan Kulveit, and Nicola Greco.
[1] As valuable as it is to deeply understand the risks we face with advanced AI, having aspirational-but-coherent stories of success is valuable, too. Such stories give one states to back-chain from, and even for forward-chaining, they provide some frames of reference against which to evaluate whether a given intervention is plausibly moving us in the right direction, even if these are not the only frames you may wish to hold. At a minimum, if we’re failing to articulate any coherent story for hope, that should raise some flags.
[2] For example, technological solutions can often ‘solve’ apparent coordination failures by ‘growing the pie’ – moving out the Pareto frontier, e.g. through technological innovation.
[3] For a very similar strategic picture, see Eric Drexler’s Framework for a Hypercapable World.