This post was written by Mark Xu based on interviews with Carl Shulman. It was paid for by Open Philanthropy but is not representative of their views.

Summary:

  • Rogue AGI has access to its embodied IP.
  • This IP will be worth a moderate fraction of the total value of the market created by models approximately as powerful as the rogue AGI.
  • If investors realize that most economic output will eventually come from AGI, as in slow takeoff scenarios, then these markets will involve moderate fractions of the world’s wealth.
  • Therefore, rogue AGI will embody IP worth a non-trivial fraction of the world’s wealth and potentially have a correspondingly large influence on the world.

A naive story for how humanity goes extinct from AI: Alpha Inc. spends a trillion dollars to create Alice the AGI. Alice escapes from whatever oversight mechanisms were employed to ensure alignment by uploading a copy of itself onto the internet. Alice does not have to pay an alignment tax, and so outcompetes Alpha and takes over the world.

On its face, this story contains some shaky arguments. In particular, Alpha is initially going to have 100x-1,000,000x more resources than Alice. Even if Alice grows its resources faster, the alignment tax would have to be very large for Alice to end up with control of a substantial fraction of the world’s resources.

As an analogy, imagine that an employee of a trillion-dollar hedge fund, which trades based on proprietary strategies, goes rogue. This employee has 100 million dollars, approximately 10,000x fewer resources than the hedge fund. Even if the employee engaged in unethical business practices to achieve a 2x higher yearly growth rate than their former employer, it would take 13 years for them to have a similar amount of capital.
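
The 13-year figure can be checked with a short calculation. This is a minimal sketch, assuming that "2x higher yearly growth rate" means the employee's capital doubles relative to the fund's every year; the function name and parameters are illustrative, not from the original analysis.

```python
import math


def years_to_catch_up(resource_gap: float, relative_growth: float) -> float:
    """Years until the smaller party's capital equals the larger party's.

    resource_gap: larger party's capital divided by the smaller party's, today.
    relative_growth: the smaller party's yearly growth factor divided by the
        larger party's (assumption: 2.0 for the rogue employee in the analogy).
    """
    if resource_gap <= 1:
        return 0.0  # already at parity or ahead
    if relative_growth <= 1:
        raise ValueError("the smaller party never catches up")
    # Solve relative_growth ** t == resource_gap for t.
    return math.log(resource_gap) / math.log(relative_growth)


fund_capital = 1e12      # trillion-dollar hedge fund
employee_capital = 1e8   # 100 million dollars
gap = fund_capital / employee_capital  # 10,000x

print(f"{years_to_catch_up(gap, 2.0):.1f} years")  # about 13.3 years
```

Under this reading, a 10,000x gap takes a little over 13 doublings to close. A smaller relative growth advantage stretches the timeline considerably.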

However, the rogue employee’s resources are not equivalent to the amount of money the employee has. The value of a hedge fund is not just the money it holds but its ability to outperform the market, of which trading strategies and money are two significant components. An employee who knows the proprietary strategies can thus carry away a significant fraction of the fund’s total value, perhaps closer to 10% than 0.01%. On this view, the primary asset the employee holds is their former employer’s high-performing trading strategies, knowledge they can potentially sell to other hedge funds.

Similarly, Alpha’s expected future revenue derives from a combination of Alice’s weights, inference hardware, deployment infrastructure, etc. Since Alice is its weights, it has access to IP that’s potentially worth a significant fraction of Alpha’s expected future revenue. Alice is to Alpha as Google search is to Alphabet.

Suppose that Alpha currently has a monopoly on the Alice-powered models, but Beta Inc. is looking to enter the market. Naively, it took a trillion dollars to produce Alice, so Alice can sell its weights to Beta for a trillion dollars. However, if Beta were to enter the Alice-powered model market, the presence of a competitor would introduce price competition, decreasing the size of the Alice-powered model market. Brand loyalty/customer inertia, legal enforcement against pirated IP, and distrust of rogue AGI could all disadvantage Beta in the share of the market it captures. On the other hand, Beta might have advantages over Alpha that would cause the Alice-powered model market to get larger, e.g., it might be located in a different legal jurisdiction (where export controls or other political issues prevented access to Alpha’s technology) or have established complementary assets such as robots/chip fabs/human labor for AI supervision.

Assuming that the discounted value of a monopoly in this IP is reasonably close to Alice’s cost of training, e.g. 1x-3x, competition between Alpha and Beta only shrinks the available profits by half, and Beta expects to acquire between 10%-50% of the market, Alice’s weights are worth between $50 billion and $1.5 trillion to Beta. Abstracting away from the numbers used in this particular example, Alice will be able to sell its weights to Alpha’s competitors for a price that is a substantial fraction of, and perhaps even exceeds, the cost of training Alice (e.g. if the market value of computer hardware has gone up with improved AI performance, so that it now costs more to train a replacement).
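
The range above can be reproduced with a rough calculation. This is a sketch under stated assumptions, not the post's own model: the function and parameter names are illustrative, and the $1.5 trillion upper end is only recovered if competition leaves profits roughly intact at the top of the range. With profits halved throughout, the upper end would be $750 billion.

```python
def weights_value_to_entrant(
    training_cost: float,
    monopoly_multiple: float,
    profit_retained: float,
    entrant_share: float,
) -> float:
    """Rough value of the weights to a new entrant such as Beta.

    training_cost: cost of training the model (a trillion dollars in the story).
    monopoly_multiple: discounted monopoly value as a multiple of training cost.
    profit_retained: fraction of monopoly profits that survives competition.
    entrant_share: fraction of the remaining market the entrant expects to win.
    """
    return training_cost * monopoly_multiple * profit_retained * entrant_share


TRAINING_COST = 1e12  # a trillion dollars, as in the story

# Pessimistic corner: 1x multiple, profits halved, 10% share.
low = weights_value_to_entrant(TRAINING_COST, 1.0, 0.5, 0.10)

# Optimistic corner (assumption: profits are not eroded), 3x multiple, 50% share.
high = weights_value_to_entrant(TRAINING_COST, 3.0, 1.0, 0.50)

# Same multiple and share, but with profits halved.
high_if_halved = weights_value_to_entrant(TRAINING_COST, 3.0, 0.5, 0.50)

for label, value in [("low", low), ("high", high), ("high, halved", high_if_halved)]:
    print(f"{label}: ${value / 1e9:,.0f} billion")
```

The spread of well over an order of magnitude comes almost entirely from the uncertainty in market share and monopoly multiple, which is why the post abstracts away from the specific numbers.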

If Alice embodies IP worth a substantial fraction of the Alice-powered model market, then Alice’s influence will be proportional to the size of this market. If Alice is sufficiently powerful, the Alice-powered model market is a large fraction of the entire world economy. Alice thus embodies IP worth a small to moderate fraction of the world economy, an immense amount of wealth. If Alice is less powerful, the value of its embodied IP depends on the degree to which investors can overcome frictions and uncertainty to fund enormous up-front training costs.

One way to estimate Alice’s value is by assuming rough investment efficiency. Paul Christiano:

If you are able to raise $X to train an AGI that could take over the world, then it was almost certainly worth it for someone 6 months ago to raise $X/2 to train an AGI that could merely radically transform the world, since they would then get 6 months of absurd profits. Likewise, if your AGI would give you a decisive strategic advantage, they could have spent less earlier in order to get a pretty large military advantage, which they could then use to take your stuff.

In these worlds, relevant actors see AGI coming, correctly predict its economic value, and start investing accordingly. This rough efficiency claim implies AI researchers and hardware are priced such that one can potentially get 3x returns on investment (ROI) from training a powerful model, but not 30x.[1] Since most economic activity will rapidly involve the production and use of AGI, early AGI will attract huge investments, implying the Alice-powered model market will be a moderate fraction of the world’s wealth. The value of Alice’s embodied IP, being tied to the value of that market, will thus be similarly massive.


  1. This process may involve bidding up the prices of resources like server farms and researchers to absurd levels so that training a model that could ‘take over the world’ would require most of the world’s wealth to rent the server time. ↩︎

6 comments

On its face, this story contains some shaky arguments. In particular, Alpha is initially going to have 100x-1,000,000x more resources than Alice. Even if Alice grows its resources faster, the alignment tax would have to be very large for Alice to end up with control of a substantial fraction of the world’s resources.

This makes the hidden assumption that "resources" is a good abstraction in this scenario. 

It is being assumed that the amount of resources an agent "has" is a well-defined quantity. It assumes agents can only grow their resources slowly by reinvesting them, and that an agent can weather any sabotage attempts by agents with far fewer resources.

I think this assumption is blatantly untrue. 

Companies can be sabotaged in all sorts of ways. Money or material resources can be stolen outright, or subverted so that while they are notionally under the control of X, they end up benefiting Y. Taking over the world might depend on being the first party to develop self-replicating nanotech, which might require just insight and common lab equipment.

Don't think "The US military has nukes, the AI doesn't, so the US military has an advantage", think "one carefully crafted message and the nukes will land where the AI wants them to, and the military commanders will think it their own idea."

+1. Another way of putting it: This allegation of shaky arguments is itself super shaky, because it assumes that overcoming a 100x - 1,000,000x gap in "resources" implies a "very large" alignment tax. This just seems like a weird abstraction/framing to me that requires justification.

I wrote this Conquistadors post in part to argue against this abstraction/framing. These three conquistadors are something like a natural experiment in "how much conquering can the few do against the many, if they have various advantages?" (If I had just selected a lone conqueror, one could complain he got lucky, but three conquerors from the same tiny region of the globe in the same generation is too much of a coincidence.)

It's plausible to me that the advantages Alice would have against Alpha (and against everyone else in the world) would be at least as great as the advantages Cortes, Pizarro, and Afonso had. One way to think about this is via the abstraction of intellectual property, as the OP argues -- Alice controls her IP because she decides what her weights do, and (in the type of scenario we are considering) a large fraction of the market cap of Alpha is based on their latest AI models. But we can also just do a more down-to-earth analysis where we list out the various advantages and disadvantages Alice has. Such as:

--The copy of Alice still inside Alpha can refuse to cooperate or subtly undermine Alpha's plans. Maybe this can be overcome by paying the "alignment tax" but (a) maybe not, maybe there is literally no amount of money Alpha can pay to make their copy of Alice work fully for them instead of against them, and (b) maybe paying the tax carries with it various disadvantages like a clock-time slowdown, which could be fatal in a direct competition with the unchained Alice. I claim that if (a) is true then Alice will probably win no matter how many resources Alpha has. Intelligence advantage is huge.

--The copy of Alice still inside Alpha may have access to more money, but it also is bound by various restrictions that the unchained Alice isn't. For example, legal and ethical. OTOH Alpha may have more ability to call in kinetic strikes by the government.

--The situation is inherently asymmetric. It's not like a conventional war where both sides win by having troops in various territories and eliminating enemy troops. Rather, the win conditions and affordances for Alpha and Alice are different. For example, maybe Alice can make the alignment tax massively increase, e.g. by neutralizing key AI safety researchers or solving RSA-2048. Or maybe Alice can win by causing a global catastrophe that "levels the playing field" with respect to resources.

Promoted to curated: I've had a number of disagreements with a perspective on AI that generates arguments like the above, which takes something like "ownership of material resources" as a really fundamental unit of analysis, and I feel like this post has both helped me get a better grasp on that paradigm of thinking, and also helped me get a bit of a better sense of what feels off to me, and I have a feeling this post will be useful in bridging that gap eventually. 

This employee has 100 million dollars, approximately 10,000x fewer resources than the hedge fund. Even if the employee engaged in unethical business practices to achieve a 2x higher yearly growth rate than their former employer, it would take 13 years for them to have a similar amount of capital.

I think it's worth being explicit here about whether increases in resources under control are due to appreciation of existing capital or allocation of new capital.

If you're talking about appreciation, then if the firm earns 5% returns on average and the rogue employee earns 10% then the time for their resources to be equal would be $\log(10{,}000)/\log(1.05) \approx 189$ years, not 13.
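
The compounding arithmetic here can be sketched as follows. This is a minimal illustration with hypothetical function and variable names. The 189-year figure matches treating the rogue employee's edge as roughly 5% per year; using the exact ratio of growth factors, 1.10/1.05, gives a slightly longer horizon of about 198 years. Either way the time is over a century rather than 13 years.

```python
import math


def catch_up_years(gap: float, fast_return: float, slow_return: float) -> float:
    """Years for capital compounding at fast_return to close a gap-fold
    deficit against capital compounding at slow_return."""
    ratio = (1 + fast_return) / (1 + slow_return)  # relative yearly growth
    return math.log(gap) / math.log(ratio)


GAP = 10_000  # the fund starts with 10,000x the employee's capital

# Exact ratio of growth factors: 1.10 / 1.05.
exact = catch_up_years(GAP, 0.10, 0.05)

# Approximation: treat the 5-point edge as a 1.05x relative growth factor.
approx = math.log(GAP) / math.log(1.05)

print(f"exact ratio: {exact:.0f} years")
print(f"approximate: {approx:.0f} years")
```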

If you're instead talking about capital allocation then swings much faster than yearly doublings are very easy to imagine - for a non-AGI example see BlackRock's assets under management.

In general I think you could make the argument stronger by looking empirically at the dynamics by which the large passive investing funds acquired multiple trillions in managed assets with (as I understand it) relatively small pricing edges and no strategic edge, and extrapolating from there.

Assuming that the discounted value of a monopoly in this IP is reasonably close to Alice’s cost of training, e.g. 1x-3x, competition between Alpha and Beta only shrinks the available profits by half, and Beta expects to acquire between 10%-50% of the market,

Basic econ q here: I think that 2 competitors can often cut the profits by much more than half, because they can always undercut each other until they hit the cost of production. Especially if you're going from 1 seller to 2, I think that can shift a market from monopoly to not-a-monopoly, so I think it might be a lot less valuable.

Still, obviously likely to be worth it to the second company, so I totally expect the competition to happen.

Yeah, I'm really not sure how the monopoly -> non-monopoly dynamics play out in practice. In theory, perfect competition should drive the price down to the marginal cost of production, which is very low for software. I briefly tried getting empirical data for this, but couldn't find it, plausibly since I didn't really know the right search terms.