Anything you can do with n AIs, you can do with two (with directly opposed objectives)

jessicata

Summary: For any normal-form game, it's possible to cast the problem of finding a correlated equilibrium in this game as a 2-player zero-sum game. This seems useful because zero-sum games are easy to analyze and more resistant to collusion.

Consider the following class of games (equivalent to the class of normal-form games):

There are $n$ players. Player $i$'s action set is $A_{i}$. After each player $i$ chooses an action from this set (also written $A_{i}$, overloading the notation), the state $X$ results from these actions (perhaps stochastically), and each player $i$ receives utility $U_{i} (X)$.

Often, we are interested in finding the Nash equilibrium of a game like this. One strategy for this is to instantiate the $n$ players as agents. However, this could cause collusion; see here and here for previous writing on collusion. Zero-sum games seem more resistant to collusion (although maybe not 100% resistant). Additionally, 2-player zero-sum games are typically easier to reason about than $n$ -player games.

So we might be interested in finding a Nash equilibrium using a zero-sum game. I don't actually know how to find a mixed Nash equilibrium, so instead I'll present a strategy for finding a correlated equilibrium (correlated equilibria are a superset of Nash equilibria and are computationally easier to find). Here's how it works:

The actor chooses actions $A_{1}, \ldots, A_{n}$.
The critic chooses a player index $I$, observes the action $A_{I}$, and suggests an alternative action $A_{I}^{'}$.
Flip a fair coin. If it comes up heads, observe the state $X$ that results from actions $A_{1}, \ldots, A_{n}$, and give the actor utility $U_{I} (X)$.
If it comes up tails, observe the state $X$ that results from actions $A_{1}, \ldots, A_{I - 1}, A_{I}^{'}, A_{I + 1}, \ldots, A_{n}$, and give the actor utility $- U_{I} (X)$.
Either way, the critic's utility is the negation of the actor's utility.
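As a minimal sketch of the protocol above, the following simulates rounds of the actor/critic game. The example game (two-player Chicken with these payoffs), the deterministic state, and all function names are assumptions made for illustration; none of them come from the original post.

```python
import random

# Hypothetical example game (not from the post): two-player Chicken.
# Action 0 = "dare", action 1 = "swerve"; the state X is just the action profile.
PAYOFF = {
    (0, 0): (0, 0),
    (0, 1): (7, 2),
    (1, 0): (2, 7),
    (1, 1): (6, 6),
}

def play_round(actor_policy, critic_policy, rng):
    """Play one round of the zero-sum game; return (actor_utility, critic_utility)."""
    actions = list(actor_policy(rng))         # actor picks A_1, ..., A_n
    i, modification = critic_policy(rng)      # critic picks index I and a map A_I -> A_I'
    alt = modification(actions[i])            # the suggestion depends only on A_I
    if rng.random() < 0.5:                    # heads: evaluate the actor's own profile
        state = tuple(actions)
        actor_utility = PAYOFF[state][i]
    else:                                     # tails: swap in the critic's suggestion
        state = tuple(actions[:i] + [alt] + actions[i + 1:])
        actor_utility = -PAYOFF[state][i]
    return actor_utility, -actor_utility      # the critic gets the negation

def always_swerve(rng):
    """Illustrative actor policy: both players swerve."""
    return (1, 1)

def tell_player_0_to_dare(rng):
    """Illustrative critic policy: pick player 0 and always suggest 'dare'."""
    return 0, (lambda a: 0)

rng = random.Random(0)
results = [play_round(always_swerve, tell_player_0_to_dare, rng) for _ in range(1000)]
```

In this example the actor's expected utility is $(6 - 7) / 2 = -0.5$: player 0 would gain 1 by switching from swerve to dare, and the critic collects half of that gain.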

It will be useful to use the concept of an $ϵ$-correlated equilibrium. A correlated equilibrium is a joint distribution over action profiles under which no player can gain any expected utility by a strategy modification; an $ϵ$-correlated equilibrium is one under which no player can gain more than $ϵ$ expected utility by a strategy modification.
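One way to write this definition formally, as a sketch: let $\mu$ denote the joint distribution over action profiles, and let $u_{i} (a_{1}, \ldots, a_{n})$ denote the expected value of $U_{i} (X)$ when the profile $a_{1}, \ldots, a_{n}$ is played (both symbols are notation introduced here, not in the original post). Then $\mu$ is an $ϵ$-correlated equilibrium if, for every player $i$ and every strategy modification $ϕ : A_{i} \to A_{i}$,

$$E_{(A_{1}, \ldots, A_{n}) \sim \mu} \left[ u_{i} (A_{1}, \ldots, A_{i - 1}, ϕ (A_{i}), A_{i + 1}, \ldots, A_{n}) - u_{i} (A_{1}, \ldots, A_{n}) \right] \leq ϵ$$

Setting $ϵ = 0$ recovers an ordinary correlated equilibrium.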

Note that the critic's policies correspond to mixtures of strategy modifications; the critic can be seen as jointly picking a player $I$ and a strategy modification $ϕ : A_{I} \to A_{I}$ for the player. Furthermore, the critic's expected utility is half the expected utility gained by the corresponding player for the average strategy modification in this mixture:

$\text{expected critic utility} = \frac{1}{2} \left( E [U_{I} (X) \mid \text{tails}] - E [U_{I} (X) \mid \text{heads}] \right)$

because the critic's expected utility is half the difference between player $I$'s expected utility given strategy modification (tails) and player $I$'s expected utility given no strategy modification (heads). Some facts result:

Suppose the actor chooses $A_{1}, \ldots, A_{n}$ from some joint distribution that is an $ϵ$-correlated equilibrium of the original game. Then the actor's expected utility is at least $- ϵ / 2$ regardless of the critic's policy.
Suppose the actor chooses $A_{1}, \ldots, A_{n}$ from some joint distribution that is not an $ϵ$-correlated equilibrium of the original game. Then the critic's best response results in a utility of no more than $- ϵ / 2$ for the actor.
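These two facts can be checked numerically on a small example. The sketch below computes the largest expected gain any single player can get from a strategy modification, and from it the actor's utility against the critic's best response. It assumes finite action sets and a deterministic state equal to the action profile; the Chicken payoffs and the function names are illustrative assumptions, not part of the original post.

```python
# Hypothetical example game: Chicken, with 0 = "dare" and 1 = "swerve".
PAYOFF = {
    (0, 0): (0, 0),
    (0, 1): (7, 2),
    (1, 0): (2, 7),
    (1, 1): (6, 6),
}
N_ACTIONS = (2, 2)

def max_deviation_gain(dist, payoff, n_actions):
    """Largest expected gain for any player i over all modifications phi: A_i -> A_i.

    dist maps action profiles (tuples) to probabilities.
    """
    best = 0.0  # the identity modification always gains exactly 0
    for i in range(len(n_actions)):
        gain_i = 0.0
        # phi can be optimized separately for each action a that player i is told to play.
        for a in range(n_actions[i]):
            best_for_a = 0.0
            for b in range(n_actions[i]):
                g = 0.0
                for profile, p in dist.items():
                    if profile[i] != a:
                        continue
                    deviated = profile[:i] + (b,) + profile[i + 1:]
                    g += p * (payoff[deviated][i] - payoff[profile][i])
                best_for_a = max(best_for_a, g)
            gain_i += best_for_a
        best = max(best, gain_i)
    return best

def actor_worst_case_utility(dist, payoff, n_actions):
    """Actor's expected utility when the critic best-responds: minus half the max gain."""
    return -0.5 * max_deviation_gain(dist, payoff, n_actions)

# A correlated equilibrium of Chicken: uniform over the three profiles other than (dare, dare).
correlated_eq = {(1, 1): 1 / 3, (0, 1): 1 / 3, (1, 0): 1 / 3}
# Not an equilibrium: both players always swerve.
both_swerve = {(1, 1): 1.0}

ce_value = actor_worst_case_utility(correlated_eq, PAYOFF, N_ACTIONS)
bad_value = actor_worst_case_utility(both_swerve, PAYOFF, N_ACTIONS)
```

Under these assumptions, `ce_value` is 0 (no modification helps any player) and `bad_value` is -0.5 (player 0 gains 1 by daring, and the critic earns half of that).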

Correlated equilibria always exist, so at a Nash equilibrium in the zero-sum game, the actor always outputs a correlated equilibrium and gets expected utility 0.
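The step to "expected utility 0" can be spelled out as follows. Write $\mu$ for the actor's joint distribution and $g (\mu)$ for the largest expected gain any single player can obtain by a strategy modification when profiles are sampled from $\mu$ (notation introduced here for illustration). The identity modification gains nothing, so $g (\mu) \geq 0$, and $g (\mu) = 0$ exactly when $\mu$ is a correlated equilibrium. Since the critic's best response earns $g (\mu) / 2$,

$$\max_{\mu} \min_{\text{critic}} E [\text{actor utility}] = \max_{\mu} \left( - \frac{g (\mu)}{2} \right) = 0$$

with the maximum attained exactly at the correlated equilibria.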

Perhaps in real life, it is inconvenient to observe the state $X$ resulting from actions $A_{1}, \ldots, A_{I - 1}, A_{I}^{'}, A_{I + 1}, \ldots, A_{n}$, because we can only observe the state by outputting actions, and maybe we always want to output actions from a correlated equilibrium. In this case we could use counterfactual oversight to usually output $A_{1}, \ldots, A_{n}$, but run the procedure above occasionally to gather training data. It's not clear when it's acceptable to occasionally output strategy-modified action profiles (instead of action profiles from a correlated equilibrium).
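One possible reading of "run the procedure occasionally" is sketched below: with a small probability the round is audited and the coin flip decides whether the critic's suggestion is swapped in; otherwise the actor's profile is output unchanged. The function name, the `audit_prob` parameter, and its default value are hypothetical choices for illustration, not something the post specifies.

```python
import random

def select_profile(actions, critic_index, critic_alt, rng, audit_prob=0.05):
    """Return (profile_to_output, label), where label is 'none', 'heads', or 'tails'.

    'none' marks an ordinary round with no coin flip; 'heads' and 'tails'
    mark audited rounds that yield training data for the actor and critic.
    """
    actions = list(actions)
    if rng.random() >= audit_prob:
        return actions, "none"                 # usual case: output the actor's profile
    if rng.random() < 0.5:
        return actions, "heads"                # audited, unmodified profile
    modified = actions[:critic_index] + [critic_alt] + actions[critic_index + 1:]
    return modified, "tails"                   # audited, strategy-modified profile
```

With this scheme a strategy-modified profile is output with probability `audit_prob / 2` per round, which makes the trade-off in the last sentence explicit: a lower `audit_prob` means fewer off-equilibrium outputs but less training data.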

AI ALIGNMENT FORUM