
1. Introduction

This post outlines two models from the social epistemology of science that explain the emergence of a particular openness norm within the sciences, and then looks at how such models can be used to understand research groups trying to develop AGI. In the rest of the introduction, I provide some motivation for this post. Sections 2 & 3 briefly outline the two models I'm looking at. Section 4 interprets these models in the context of AGI development. Section 5 concludes.

The social epistemology of science is an interdisciplinary subfield at the intersection of philosophy and economics, which utilises formal models to understand the incentive structure of science. Here, I focus on two models from this area which try to explain the emergence of one particular openness norm: the so-called ‘communist norm’ in scientific research. The communist norm is a norm to share all ‘substantive findings’ with the scientific community. The existence of this norm seems to be taken for granted in this literature, although the best piece of evidence I can find for its existence comes from Louis et al. (2001), who find, in a sample of nearly 2,000 geneticists, that 91% agree that one should share all of one's relevant data. I nevertheless take the norm's existence for granted in this post.

I wanted to see whether understanding the emergence of the communist norm in science could be important for understanding the development of AGI. In many ways, one might think the incentive structures around the development of AGI do (or will) parallel the incentive structures of academic science. Thus, one might think that looking at the incentive structures behind scientific research is a good starting point for looking at the incentive structures surrounding the development of AGI.

Just as the communist norm emerged in science, one can imagine the emergence of a similar ‘communist norm’ across research groups involved in AGI development, where research groups share all relevant research with one another. In the context of AGI, one might worry that sharing all research would speed up the development of capabilities alongside safety, potentially leading to a suboptimal scenario where capabilities improve faster than safety does. If the incentive structures surrounding the development of AGI are appropriately similar to the incentive structures of science, then the models offered by Heesen and Strevens give different analyses of what we should expect among research groups trying to build AGI.

In the end, I tentatively conclude that Strevens' model is likely to be more useful for modelling AGI development races, and suggest that a social contract where AGI developers agree to share all safety-relevant information is incentive-compatible.

2. Strevens’ Paper: The Communist Norm and The Social Contract

Strevens aims to provide a ‘Hobbesian vindication’ of the communist norm in science, assuming self-interested, credit-maximising scientists. That is, his paper aims to provide the following:

  • A transformation of the communist norm into a social contract which behaviourally mirrors the norm.
  • A rationale for signing the social contract.

Strevens’ model includes research programs, each with a discovery density over time. For any research group $F$, let $D_F$ be the event that $F$ would make the discovery, with probability given by $P(D_F) = \int_0^\infty f_F(t)\,dt$, where $f_F$ is $F$'s discovery density. A program’s discovery probability is subjunctive -- that is, it is the probability that a scientific research program would make the discovery, given enough time. This is well-defined in the model even if a scientific research program never does, in fact, make the discovery. We say that a program’s power is equal to its discovery probability.

From the pool of potential winners (that is, programs with a non-zero probability of making the discovery), the actual winner is determined by each program’s race-clinching probability. A program's race-clinching probability is the conditional probability that it makes the discovery first, given that the other program also would have made the discovery. Formally, the race-clinching probability for $F$ in a two-way race between $F$ and $G$ is given by $P(F_{\text{first}} \mid D_F \cap D_G)$, where $F_{\text{first}}$ is the event that, in a two-way race between $F$ and $G$, $F$ makes the discovery first, and $D_F \cap D_G$ is the event that both programs would make the discovery (Strevens calls such events 'live races').

It is assumed that the discovery events of the research programs (that is, whether each would eventually make the discovery) are stochastically independent (i.e., $P(D_F \cap D_G) = P(D_F)\,P(D_G)$). Strevens claims that this assumption is innocent for his purposes, which is to prove that information exchange does not alter race-clinching probabilities. This is because race-clinching probabilities are of the form $P(F_{\text{first}} \cap D_F \cap D_G)\,/\,P(D_F \cap D_G)$, and, when independence fails to hold (perhaps because the success of one program is inversely correlated with the success of another), the failure is likely to have roughly the same effect on the numerator as on the denominator, and so will have a negligible effect overall.
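To make the race-clinching notion concrete, here is a minimal Monte Carlo sketch. It is my own illustration, not Strevens' formalism: the exponential shape of the densities, the function names, and all parameter values are arbitrary choices. Each program's subjunctive discovery time is sampled (with some probability of never discovering), and the race-clinching probability is estimated by conditioning on 'live races'.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_discovery_times(p, mean_time, n):
    """Sample subjunctive discovery times for one program: with probability
    1 - p the program would never make the discovery (np.inf); otherwise the
    time to discovery is exponential with the given mean. The exponential
    shape and all parameter values are illustrative choices, not Strevens'."""
    times = rng.exponential(mean_time, n)
    times[rng.random(n) >= p] = np.inf
    return times

def race_clinching(times_f, times_g):
    """Estimate P(F discovers first | both programs would make the discovery)."""
    live = np.isfinite(times_f) & np.isfinite(times_g)  # Strevens' 'live races'
    return np.mean(times_f[live] < times_g[live])

# Two stochastically independent programs; F is the more powerful one.
f = sample_discovery_times(p=0.8, mean_time=10.0, n=1_000_000)
g = sample_discovery_times(p=0.5, mean_time=12.0, n=1_000_000)
print(race_clinching(f, g))  # F's race-clinching probability, ~0.55 here
```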

Research programs can engage in information exchange, which increases a program’s discovery probability by an amount proportional to the amount of information exchanged. The amount of information that a research program has to share is roughly proportional to its power. Strevens goes through the following ways information exchange might work:

(i) Inflation: information exchange could uniformly inflate each program’s discovery density (not necessarily by the same factor), transforming $f(t)$ into $c \cdot f(t)$ for some constant $c > 1$.

(ii) Advancement: information exchange could provide each research program with an equal advancement $a > 0$, transforming their discovery densities from $f(t)$ and $g(t)$ to $f(t + a)$ and $g(t + a)$ respectively.

(iii) Compression/Rarefaction: information exchange could compress (or rarefy) each program’s discovery density along the time axis, turning $f(t)$ into $c \cdot f(ct)$, by the same factor $c$ for both programs.

Conditional on our listed assumptions, cases (i)–(iii) can all be shown to have no effect on each program’s race-clinching probabilities. This is true whether the model involves a single time period or is extended to a multiple-stage model of discovery (Strevens 2017, Section 6).
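Building on the snippet above (same illustrative densities and caveats), here is a quick numerical check of this invariance: inflation, equal advancement, and equal compression all leave the estimated race-clinching probability unchanged up to sampling noise, even though they change discovery probabilities and expected discovery times.

```python
# Continuing from the sketch above (same illustrative densities):
f = sample_discovery_times(p=0.8, mean_time=10.0, n=1_000_000)
g = sample_discovery_times(p=0.5, mean_time=12.0, n=1_000_000)

# (i)  Inflation: each discovery probability is scaled up, not necessarily
#      by the same factor; the shape of each density is untouched.
f_i = sample_discovery_times(p=0.8 * 1.2, mean_time=10.0, n=1_000_000)
g_i = sample_discovery_times(p=0.5 * 1.6, mean_time=12.0, n=1_000_000)

# (ii) Advancement: both programs' discovery times move earlier by the same
#      amount (negative values are harmless: only the ordering matters).
f_a, g_a = f - 3.0, g - 3.0

# (iii) Compression: both programs' time axes are compressed by the same factor.
f_c, g_c = f / 2.0, g / 2.0

for label, (a, b) in [("baseline", (f, g)), ("inflation", (f_i, g_i)),
                      ("advancement", (f_a, g_a)), ("compression", (f_c, g_c))]:
    print(label, round(race_clinching(a, b), 3))
# All four estimates agree up to sampling noise, even though the discovery
# probabilities and expected discovery times differ across the scenarios.
```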

Strevens supplements his formal model with informal argumentation. The model shows that self-interested scientists should see positive value in sharing all information: mutual information exchange increases each program's discovery probability without changing its race-clinching probability, and the probability that no one makes the discovery at all goes down.

However, the results of his model do not show that sharing all information is uniquely optimal, rather than (for example) a system that asks higher-powered programs to share more information. Strevens anticipates this objection, arguing that there are only a small number of practical contracts which could be ‘implemented at a reasonable cost given the scale’, and claims that: (a) sharing everything is better than sharing nothing, and (b) contracts which are more complicated (e.g., share equal amounts of information) are less likely to be implemented given their practical costs (such as deciding what counts as an appropriate unit of information).

From the conjunction of his formal results and informal reasoning, Strevens takes himself to have shown that self-interested scientists have an incentive to sign a contract pledging them to share all information.

3. Heesen’s Paper: Communism and The Incentive to Share

Heesen offers a game-theoretic model of sharing in science, called ‘The Intermediate Results Game’. In this model, an intermediate stage is any stage of a research project which, when completed, yields a publishable result, and hence some amount of credit for the scientist.

We assume $n$ scientists compete to complete a research project with $k$ intermediate stages. Whenever a scientist completes an intermediate stage, they face a choice: publish the result, or keep it to themselves. Publishing benefits the scientist by giving them credit for that stage, as well as for any preceding stages that were unpublished. The credit for stage $j$ is $c_j$, with total credit for all stages given by $\sum_{j=1}^{k} c_j$. In addition to providing credit for the scientist, the result also benefits the community, as the publication becomes part of collective scientific knowledge; such collective knowledge can help other scientists make discoveries at later stages. If a scientist refrains from publishing, they increase the probability that they complete the next stage before their competitors, and may thereby be able to claim credit for more stages later.

Heesen also makes the assumption that the probability that scientist $i$ takes more than $t$ time units to complete stage $j$ is $e^{-\lambda_{ij} t}$, on the basis of empirical evidence concerning scientists’ productivity, which can be approximately fitted by a Poisson distribution. This justifies the assumption that waiting times are exponential, as it is equivalent to assuming that scientists’ productivity is a nonstationary Poisson process. The parameter $\lambda_{ij}$ is to be interpreted as the speed at which a scientist works; $1/\lambda_{ij}$ is the expected time scientist $i$ takes to complete stage $j$. We then make the following assumption:

Assumption 1: The speed parameters and the credit rewards have the following property: for every scientist $i$, and for each pair of stages $j < j'$: $\lambda_{ij}\, c_j \geq \lambda_{ij'}\, c_{j'}$.

That is, either the credit for each stage is proportional to its difficulty, or earlier stages are awarded more credit than later ones relative to their difficulty.
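On this reading of Assumption 1, a small helper can check whether a given credit/speed profile satisfies it. This is a sketch of my interpretation above (credit per unit of expected difficulty, $c_j \lambda_{ij}$, non-increasing in the stage index), not code from Heesen's paper.

```python
def satisfies_assumption_1(lams, credits):
    """Check the reading of Assumption 1 sketched above (my reconstruction):
    for each scientist i, credit per unit of expected difficulty, i.e.
    c_j * lambda_ij, is non-increasing in the stage index j.
    lams[i][j] is scientist i's speed on stage j; credits[j] is stage j's credit."""
    for lam_i in lams:
        ratios = [c * lam for c, lam in zip(credits, lam_i)]
        if any(earlier < later for earlier, later in zip(ratios, ratios[1:])):
            return False
    return True

# Equal credit for equally difficult stages satisfies the assumption...
print(satisfies_assumption_1([[1.0, 1.0]], credits=[0.5, 0.5]))  # True
# ...but a final stage that is no harder yet carries most of the credit does not.
print(satisfies_assumption_1([[1.0, 1.0]], credits=[0.1, 0.9]))  # False
```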

Granting Assumption 1, we get the following two results:

Theorem 1 (Heesen, 2017): In the intermediate results game with $n$ scientists and $k$ stages under perfect information, there is a unique backwards induction solution, in which scientists share information at all time periods. Moreover, there are no other behaviourally distinct equilibria in pure or mixed strategies.
Theorem 2 (Heesen, 2017): In the intermediate results game with $n$ scientists and $k$ stages under imperfect information, there is a unique, strict equilibrium, in which scientists share information at every information set.

To say that two equilibria are behaviourally distinct is to say that scientists make different decisions at nodes that are actually reached. Thus, from Heesen’s model we get the following conclusion: scientists have a credit incentive to share intermediate results, even though holding onto results gives one an advantage in completing the next stage. The conclusion of this model is therefore (at least in its original context) fairly optimistic: in a wide variety of situations, we do not need additional enforcement mechanisms to encourage scientific sharing, as sharing is what we should expect by default from rational, credit-maximising scientists.
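To see the flavour of these results, here is a deliberately simplified two-scientist, two-stage instance of the intermediate results game. It is my own toy version, not Heesen's general model: speeds are equal across scientists and stages, scientist B always publishes immediately, and A either publishes immediately or withholds stage 1 until completing stage 2. With evenly split credit, immediate sharing earns A more expected credit than withholding.

```python
import random

def simulate(a_withholds, credits=(0.5, 0.5), trials=200_000):
    """Toy 2-scientist, 2-stage instance of the intermediate results game
    (my simplification, not Heesen's general model). Both scientists work at
    equal speed, so by memorylessness every race is a fair coin flip.
    B always publishes immediately; A either publishes immediately or
    withholds stage 1 until completing stage 2. Returns A's average credit."""
    total = 0.0
    for _ in range(trials):
        stage = {"A": 1, "B": 1}           # stage each scientist is working on
        published = {1: False, 2: False}   # has anyone published this stage yet?
        a_withheld = []                    # stages A has completed but not published
        a_credit = 0.0
        while not published[2]:
            winner = random.choice("AB")   # equal speeds: fair exponential race
            s = stage[winner]
            if winner == "B":
                published[s] = True                            # B publishes at once
                stage["B"] = s + 1
                stage["A"] = max(stage["A"], s + 1)            # A can now build on it
                a_withheld = [j for j in a_withheld if j > s]  # scooped results are worthless
            elif a_withholds and s == 1:
                a_withheld = [1]                               # A sits on the result
                stage["A"] = 2
            else:
                for j in a_withheld + [s]:                     # publish everything held back
                    if not published[j]:
                        published[j] = True
                        a_credit += credits[j - 1]
                a_withheld = []
                stage["A"] = s + 1
                stage["B"] = max(stage["B"], s + 1)            # B can now build on it
        total += a_credit
    return total / trials

# With evenly split credit, immediate sharing earns A more than withholding
# (roughly 0.50 vs 0.44), matching the flavour of Heesen's theorems.
print(simulate(a_withholds=False), simulate(a_withholds=True))
```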

4. Heesen, Strevens, and AGI Development

We now look at how these models can be used to model AGI development races, and raise questions about the extent to which such models can be used for this purpose.

4.1. Heesen’s Model

In Heesen’s model, the two theorems we looked at made use of the assumption that the credit for each stage is either proportional to its difficulty or weighted towards earlier stages relative to their difficulty. I think this assumption is likely to fail in AGI development, where there is likely to be a particularly large benefit to being the team that completes the final step.

The failure of Assumption 1 is most obvious in hard takeoff scenarios, where one final insight leads to a large qualitative leap in intelligence. In this case, the ‘scientist’ (more likely, a research team in our scenario) who finished the last stage would get a huge share of the overall value, by being the team that creates the AGI. This may be because the final step is more difficult, but there is no guarantee that this will be true. It is also plausible that Assumption 1 would fail in a variety of soft takeoff scenarios. Even there, it seems plausible to expect most of the value (or 'credit' in the original model) to be captured towards the end of the research project, as capabilities continue improving. That alone would be enough to falsify Assumption 1, as teams who complete later stages would gain a disproportionate share of the overall value purely because those stages are later (and not because they are more difficult). I thus think there are good reasons to believe that, in the context of AGI development, we should not expect sharing by default.
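The toy simulator from the end of Section 3 illustrates the point (with all the caveats noted there, and with my hypothetical credit numbers): shifting most of the credit to the final stage flips the comparison, so that withholding beats sharing.

```python
# Reusing the toy simulator from the end of Section 3 (same caveats apply):
for credits in [(0.5, 0.5), (0.1, 0.9)]:
    share = simulate(a_withholds=False, credits=credits)
    withhold = simulate(a_withholds=True, credits=credits)
    print(credits, round(share, 3), round(withhold, 3))
# (0.5, 0.5): sharing ~0.50 beats withholding ~0.44.
# (0.1, 0.9): withholding ~0.59 beats sharing ~0.50 -- once most of the value
# sits at the final stage, the credit incentive to share intermediate results
# disappears.
```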

The failure of Heesen’s model to apply to many cases of interest might be seen as positive, if we were initially worried that something like the communist norm would naturally emerge in scenarios where credit is awarded for intermediate results. Obtaining credit for intermediate results seems plausible in the context of developing AGI, where, presumably, one would gain some outside prestige by contributing (for example) novel mathematical results along the way to developing AGI. However, the failure of Heesen's model to be useful for (a sufficiently wide variety of) our purposes likely means that, if we want to incentivise the sharing of safety techniques, we cannot expect this to emerge by default. The need to work on proposals to properly incentivise safety is (for most of us) unsurprising.

4.2. Strevens’ Model

As with Heesen’s model, some of the assumptions only questionably carry over to the case of AGI development. I focus here on the structure Strevens thinks information sharing should take. Crucially, under Strevens' model, race-clinching probabilities do change under information sharing if the exchange compresses or advances one program’s discovery density more than another’s. In other words, Strevens' main result (that information sharing leaves race-clinching probabilities unchanged) fails to hold when information sharing speeds up one research program's progress more than another's.
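Using the illustrative sampler from Section 2 (same hypothetical parameters as before), the effect is easy to see: compressing only one program's discovery times changes the race-clinching probabilities, unlike the equal transformations considered earlier.

```python
# Reusing the illustrative sampler from Section 2: compressing only F's
# discovery times (an unequal gain from information exchange) changes the
# race-clinching probability, unlike the equal transformations above.
f = sample_discovery_times(p=0.8, mean_time=10.0, n=1_000_000)
g = sample_discovery_times(p=0.5, mean_time=12.0, n=1_000_000)
print(round(race_clinching(f, g), 3))        # ~0.55 before sharing
print(round(race_clinching(f / 2.0, g), 3))  # ~0.71 when only F is sped up
```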

In the context of AGI development, it seems plausible that one research program will gain more from an information exchange than another. However, one might think that, even if this is true, each research program will, ex ante, have no reason to think it is the higher-powered program. Anecdotally, I find this somewhat implausible: I imagine that successful groups tend to be led by people who believe they are more likely to succeed than average, and so a contract which relies on their actively not believing this seems unlikely to emerge naturally.

I think the considerations above undercut the case for the communist norm emerging as a game-theoretic equilibrium in the case of AGI development, or as a contract independently drawn up by AGI developers. I do not think this conclusion should bother us too much, as it is (at best) dubious whether total information sharing would be socially optimal, given that it would also be likely to speed up the development of AI capabilities.

However, I don’t think my criticisms of Strevens' model above threaten the attractiveness of a contracted commitment to share all safety-relevant information. This would be a local version of the communist norm, rather than (as is allegedly the case in science) a global one. While some exchanges of ‘safety-relevant information’ might help another team improve its capabilities (hence changing the race-clinching probabilities), it seems unlikely that any research team would believe that sharing only safety-relevant information would alter its ex ante estimates of the relative race-clinching probabilities. Assuming each research group has a strictly positive concern for safety, each team is therefore incentivised to commit to a safety-specific communist norm.

Strevens' model licenses a further optimistic conclusion concerning the development of AI capabilities relative to safety: if developers signed a contract to share all safety-relevant information, then, since sharing capabilities-relevant information is not (in general) credit-maximising in this setting, we should be less concerned that developers will, of their own accord, decide to engage in free sharing of non-safety-relevant information. Although there were doubts about other parts of Strevens’ model, its basic setup as a prisoner’s dilemma seems to capture the case of AGI development better than Heesen’s model does.

If I am right in my interpretation of Strevens’ model, then there are still possible situations in which research teams would be incentivised to renege on their promises, for example where sharing a piece of safety-relevant information would also improve the capabilities of another team, making it comparatively less likely that the sharing team will be the first to develop AGI. The possibility of such cases speaks to the need to look for mechanisms that can effectively enforce adherence to a contract to share all safety-relevant information, or (potentially) for further research on robust mechanisms of cooperation.

One possible enforcement mechanism would be to have firms commit to a public statement, as discussed by Christiano. If firms all signed a contract agreeing to make such a public statement (including commitments to share all relevant safety information), this might provide a way of enforcing commitment without the need for resource-intensive outside policing.

5. Conclusions and Further Caveats

Looking at these two models has given me some insight into AGI development races. In particular, I feel like I am now more confident in the following claims:

  • An unrestricted version of the communist norm is unlikely to emerge endogenously in the course of AGI development.
  • A contract whereby all AGI developers agree to sign a pledge to share all safety-relevant information is incentive-compatible.

I am unsure whether these claims are obvious to those more familiar with the macrostrategy literature, and would be thankful to commenters who can point to examples of these claims in existing work. Moreover, I should stress that, although these updates are in a positive direction, my views remain reasonably tentative.

I end with a caveat. This piece has discussed a variant of the ‘communist norm’ applied to the development of AGI. I have suggested that the emergence of something like science's communist norm (that is, a norm to share all relevant data) in AGI development could be bad, but have mostly uncritically suggested that we could implement a safety-specific communist norm, where competing developers share all information relevant to safety research. I don't think I'm saying anything too controversial here: as Bostrom (2017) says, ‘openness about values, goals, and governance structures is generally welcome’.

That said, I feel compelled to mention that there is at least some tentative evidence that, in some network structures, less information can be better. For instance, Zollman (2007, 2010) finds that, among rational Bayesian agents who communicate with each other over a network, sparsely connected networks can sometimes epistemically outperform more densely connected ones. While I am not convinced that such results are sufficiently robust to changes in parameter values (see, for example, Rosenstock et al. (2017)), a review of related results would be useful before committing to a contract to share all safety-relevant information.


Planned summary for the Alignment Newsletter:

This post summarizes two papers that provide models of why scientific research tends to be so open, and then applies them to the development of powerful AI systems. The first models science as a series of discoveries, in which the first academic group to reach a discovery gets all the credit for it. It shows that for a few different models of info-sharing, info-sharing helps everyone reach the discovery sooner, but doesn't change the probabilities for who makes the discovery first (called _race-clinching probabilities_): as a result, sharing all information is a better strategy than sharing none (and is easier to coordinate on than the possibly-better strategy of sharing just some information).
However, this theorem doesn't apply when info sharing compresses the discovery probabilities _unequally_ across actors: in this case, the race-clinching probabilities _do_ change, and the group whose probability would go down is instead incentivized to keep information secret (which then causes everyone else to keep their information secret). This could be good news: it suggests that actors are incentivized to share safety research (which probably doesn't affect race-clinching probabilities) while keeping capabilities research secret (thereby leading to longer timelines).
The second paper assumes that scientists are competing to complete a k-stage project, and whenever they publish, they get credit for all the stages they completed that were not yet published by anyone else. It also assumes that earlier stages have a higher credit-to-difficulty ratio (where difficulty can be different across scientists). It finds that under this setting scientists are incentivized to publish whenever possible. For AI development, this seems not to be too relevant: we should expect that with powerful AI systems, most of the "credit" (profit) comes from the last few stages, where it is possible to deploy the AI system to earn money.

Planned opinion:

I enjoyed this post a lot; the question of openness in AI research is an important one, which depends both on the scientific community and on industry practice. The scientific community is extremely open, and the second paper especially seems to capture well the reason why. In contrast, industry is often more secret (plausibly due to <@patents@>(@Who owns artificial intelligence? A preliminary analysis of corporate intellectual property strategies and why they matter@)). To the extent that we would like to change one community in the direction of the other, a good first step is to understand their incentives so that we can then try to change those incentives.