Very interesting post. I was very prepared to praise it with "this draws some useful categories for me," but it began to get less clear as I tried more examples. And I'm still trying to come up with a distinction between brinksmanship and extortion. I've thought about the payoff matrices (they look the same), and whether "unilateral attack vs. not" is a distinguishing factor (I don't think so). I still can't find a clear distinction.
Examples:
(1) You say that releasing nude photos is in the blackmail category. But who's the audience?
(2) For n=1, m large: Is an example of brinksmanship here a monopolistic buyer who will only choose suppliers offering cut-rate prices? It seems to have been quite effective for Walmart decades ago, and effective for Hollywood today ( https://www.engadget.com/2018-02-24-black-panther-vfx-models.html ).
> (1) You say that releasing nude photos is in the blackmail category. But who's the audience?
The other people of whom you have nude photos, who are now incentivised to pay up rather than kick up a fuss.
> (2) For n=1, m large: Is an example of brinksmanship here a monopolistic buyer who will only choose suppliers offering cut-rate prices?
Interesting example that I hadn't really considered. I'd say it fits more under extortion than brinksmanship, though. A small supplier has to sell, or they won't stay in business. If there's a single buyer, "I won't buy from you" is the same as "I will ruin you". Abstracting away the property rights (Walmart is definitely legally allowed to do this), this seems very much like extortion.
> The other people of whom you have nude photos, who are now incentivised to pay up rather than kick up a fuss.
Releasing one photo from a set that was previously believed to be secure, where other photos in the same set are compromising, can suffice even in the single-member-audience case.
I'm not used to thinking about such issues, but the distinction between extortion and brinksmanship makes sense, at least in the context of this post.
One thing I would have liked is a clear link to AI alignment. That being said, I assume that most ideas on this topic relate to one or two problems of alignment, so maybe it's annoying to repeat them each time.
The connection to AI alignment is combining the different utilities of different entities without extortion ruining the combination, and dealing with threats and acausal trade.
EDIT: The terms "extortion" and "brinksmanship" as used in this post don't fully map onto the real-world uses of those terms, but they are the closest labels for the concepts I'm trying to point to.
Extortion, simplified, is:

> Give me what I want, or I will inflict costs on you, leaving you worse off than the status quo.
Brinksmanship is more like:

> Give me what I want, or I will walk away, and we both lose out on a deal that would have benefited us.
Written that way, the two sound very similar. Indeed, I've argued that there is little difference between extortion and trade offers, apart from the "default point".
So why am I claiming that these two are different, and that extortion is much more powerful? Because of a key difference in the default point: the outside audience.
How the audience reacts
Extortion audience
Suppose I am extorting someone; maybe I'm a blackmailer with naughty photos, a mafioso offering "protection", or the Roman Empire demanding tribute. The problem for me is to make my threat credible: to show that I will go through with the threat, even when doing so is risky or expensive for me.
Suppose I have twenty targets that I need to convince that I'm serious. Then if one of them resists, this is exactly what I need. I will publish their photos/burn down their shop/invade their territory. My threat is credible, because I've just shown that I will carry it out; this keeps the other nineteen targets in line. Indeed, I might actively want one target to resist; that way, I pay the expense of one threat carried out, but get full compliance from the other nineteen, rather than grudging partial compliance from all twenty.
This makes resisting extortion very tricky. Suppose you were the target of my extortion, and suppose that you had made it a principle to never give in to threats. And suppose that you had credibly demonstrated that principle. If we two are the only people around, then there's no advantage to me carrying out my threats[1].
But if there are other people in the audience, I still would want to hurt you if you don't give in. It's not about you; I want to credibly demonstrate I will carry out my threats. Indeed, carrying out my threats against you might be the best move on my part; I've shown I will carry them out, even when it arguably makes no sense to do so.
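The audience logic above can be sketched numerically. This is my own toy model with made-up numbers, not something from the post: an extorter faces m targets, carrying out a threat against one resister costs c, and each target kept in line pays tribute g.

```python
# Toy model (illustrative, not from the post): an extorter with an audience
# of m targets. Carrying out a threat against one resister costs c, but makes
# the threat credible to the remaining m - 1 targets, each of whom then pays
# tribute g. Backing down exposes the threat as empty, so nobody pays.

def extorter_payoff(m, g, c, carry_out_threat):
    """Extorter's payoff when one of m targets resists."""
    if carry_out_threat:
        return (m - 1) * g - c   # pay the cost once, keep the audience in line
    return 0                     # credibility gone; all targets resist

m, g, c = 20, 10, 50  # hypothetical numbers
print(extorter_payoff(m, g, c, carry_out_threat=True))   # 140
print(extorter_payoff(m, g, c, carry_out_threat=False))  # 0

# With m = 1 there is no audience: carrying out the threat yields -c < 0,
# so attacking a credibly committed "never give in" target doesn't pay.
print(extorter_payoff(1, g, c, carry_out_threat=True))   # -50
```

With a large audience, following through is profitable even though the threat itself is costly; with no audience, it never is.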
Brinksmanship audience
Now let's compare that with brinksmanship. Suppose I'm setting out the conditions for a business partnership, negotiating how to split a restaurant bill with friends, or maybe I'm a country negotiating a trade deal with a union I've just left.
Now, maybe you won't offer me the deal I want, and I can then prove my credibility by blowing up the deal. Or maybe you'll grudgingly give in, and I'll walk away triumphant.
But the problem is as follows: it is not useful for me to have a credible reputation for following through on brinksmanship threats. The other potential partners don't want to deal with someone with a reputation like that; they'll offer fewer deals, and worse ones.
That makes resisting brinksmanship much easier than resisting extortion. If there are twenty potential groups I might strike a deal with, then blowing up a deal with one of them is not going to help me with the other nineteen. Committing to rejecting brinksmanship - say, by rejecting any deal that isn't "fair[2]" - is more credible, because I don't benefit from blowing things up.
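A similar toy calculation (again with numbers I've made up for illustration) shows why a reputation for blowing up deals doesn't pay: even if a "tough" reputation extracts slightly better terms when a deal does happen, wary partners offer fewer deals.

```python
# Sketch with hypothetical numbers: m potential partners. Accepting fair
# deals yields fair_value from each. Blowing up one deal to look tough
# extracts tough_value when a deal goes through, but the other m - 1
# partners are now wary and only deal with probability deal_prob_after_blowup.

def brinks_payoffs(m, fair_value, tough_value, deal_prob_after_blowup):
    fair_path = m * fair_value
    blowup_path = (m - 1) * deal_prob_after_blowup * tough_value
    return fair_path, blowup_path

fair_path, blowup_path = brinks_payoffs(m=20, fair_value=10,
                                        tough_value=12,
                                        deal_prob_after_blowup=0.5)
print(fair_path, blowup_path)  # 200 114.0 -- the tough reputation loses
```

Unlike the extortion case, the demonstration of credibility here shrinks the pool of future deals instead of disciplining it.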
Which audience, though?
Not all real-world examples of extortion and brinksmanship fit neatly into the above framework. Terrorists are generally trying to extort governments, but a generic "we don't negotiate with terrorists" seems to have served governments pretty well. The cold war saw lots of brinksmanship between the superpowers, where there wasn't really an audience of entities of comparable power.
To sort that out, let's consider the size of the audience, i.e. the other entities that might practice extortion or brinksmanship, or have it practiced on them. Thus define:

- n: the number of entities Ai that might practice extortion or brinksmanship.
- m: the number of target entities Bj that it might be practiced upon.
So, how do things stand for varying n and m?
n=1, m=1
This is the USA and the USSR during the Cold War. They have no rivals of comparable power; they don't really need to demonstrate the credibility of their mutual threats to an outside audience. All that matters, fundamentally, is how credible their threats are to each other.
In this situation, extortion and brinksmanship are essentially the same thing. The two superpowers are locked in a contest with no clear default state, and which neither can exit or ignore. The situation is complicated, and very much dependent on individual decisions and personalities; there is no "best behaviour".
n=1, m large
This is the situation I described above: one main extorter/brinksmanshipper, a large audience of potential victims/trade partners.
As we saw, extortion is effective and hard to resist, while brinksmanship is ineffective.
n large, m=1
Here there is a single "victim", and many entities that might seek to take advantage of them. This is like a government that "does not negotiate with terrorists"; there are many terrorists, potential terrorists, potential hijackers, potential hostage-takers, and so on, but one target.
Here the incentives are reversed for extortion: the target is incentivised to resist extortion, even at great cost, lest giving in encourage others to try their hand at it. Since there's only one target, there's no audience to which the extorters need to demonstrate their credibility, so they won't be incentivised to go ahead with their threats.
Brinksmanship is even less effective; the target will hold out for good deals from some of the Ai, to pressure the rest to also offer good deals.
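The single target's incentive to resist can be illustrated with the same kind of toy numbers (mine, purely for illustration): paying any extorter invites the other n - 1 to try, while a credible refusal policy makes the threats unprofitable, so only a stubborn fraction go ahead anyway.

```python
# Hypothetical numbers: one target, n potential extorters. Paying ransom r
# to each extorter invites the rest to try; a credible "never negotiate"
# policy makes threats unprofitable, so only an assumed fraction q of
# extorters carry out an attack (of cost a to the target) anyway.

def target_cost(n, ransom, attack_cost, never_negotiate, q=0.1):
    if never_negotiate:
        return q * n * attack_cost   # a few attacks happen, but no tribute flows
    return n * ransom                # every extorter tries, every one gets paid

n, r, a = 20, 10, 30
print(target_cost(n, r, a, never_negotiate=False))  # 200
print(target_cost(n, r, a, never_negotiate=True))   # 60.0
```

Even with individual attacks being costlier than individual ransoms, the refusal policy wins once n is large.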
n and m both large
Here the incentives are harder to parse for the extorters and their targets. The extorter Ai wants to demonstrate to all the Bj that they are serious about following through on their threats, and the target Bj wants to demonstrate to all the Ai that they are serious about resisting threats.
Depending on the individual dynamics, the Ai may attack, or fail to attack, for reasons that have nothing to do with their specific target. This gets hard to predict, and can depend on contingent factors (just as in the n=1, m=1 case), but there are multiple equilibria that can be relatively stable (unlike the n=1, m=1 case). See the hawk versus dove game and its various complicated variants.
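For reference, the hawk-dove game mentioned here has a standard payoff structure (V = value of the contested resource, C = cost of a fight, with V < C); the stable mixed equilibrium has a hawk fraction of p* = V/C, at which hawks and doves do equally well:

```python
# Standard hawk-dove payoffs: hawks escalate, doves back down.
# Expected payoff against a population with hawk fraction p, for V < C.

def hawk_payoff(p, V, C):
    # vs hawk: fight, expected (V - C) / 2 ; vs dove: take the whole prize V
    return p * (V - C) / 2 + (1 - p) * V

def dove_payoff(p, V, C):
    # vs hawk: concede, get 0 ; vs dove: split the prize
    return (1 - p) * V / 2

V, C = 4, 10          # value of the resource, cost of a fight (assumed numbers)
p_star = V / C        # mixed equilibrium at p* = 0.4
print(hawk_payoff(p_star, V, C), dove_payoff(p_star, V, C))  # equal (~1.2)
```

Below p* hawks outperform doves and their share grows; above p* the reverse holds, which is why the mixed population is the stable point.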
Brinksmanship continues to be ineffective.
Multiple fairness criteria
If there are multiple fairness criteria, the equation for brinksmanship shifts: entities can apply brinksmanship to some deals, just as long as the rest of their audience doesn't feel they were excessive. Conversely, targets are incentivised not to have excessively strong "anti-brinksmanship" standards. It seems likely that some broad consensus on "fairness" will emerge, as a sort of averaging of different entities' judgements.
Neglecting acausal or counterfactual situations. ↩︎
Notice the strong similarity between anti-brinksmanship and brinksmanship: in both cases, the parties are threatening to stop the deal unless their conditions are met. ↩︎