Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

davidad

There should be two thresholds on compute graph size:
1. the Frontier threshold, beyond which oversight during execution is mandatory
2. the Horizon threshold, beyond which execution is forbidden by default
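As a minimal sketch of how the two thresholds split runs into three regimes, the hypothetical helper `classify_run` below maps a compute graph size to a regime. It assumes sizes are already expressed in the standard-FLOP units defined later in this proposal, and it reads “beyond” as strictly greater than. The threshold values in the usage line are placeholders, not proposed numbers.

```python
from enum import Enum


class Regime(Enum):
    BELOW_FRONTIER = "no mandatory oversight during execution"
    OVERSIGHT_REQUIRED = "oversight during execution is mandatory"
    FORBIDDEN_BY_DEFAULT = "execution is forbidden unless an exception is granted"


def classify_run(size_sflop: float, frontier: float, horizon: float) -> Regime:
    """Place a compute graph (in standard FLOPs) relative to the two thresholds.

    Assumption: "beyond" is read as strictly greater than the threshold.
    """
    if not 0 < frontier < horizon:
        raise ValueError("expected 0 < frontier < horizon")
    if size_sflop > horizon:
        return Regime.FORBIDDEN_BY_DEFAULT
    if size_sflop > frontier:
        return Regime.OVERSIGHT_REQUIRED
    return Regime.BELOW_FRONTIER


# Placeholder thresholds, chosen only for illustration.
print(classify_run(3e26, frontier=1e26, horizon=1e27).name)  # OVERSIGHT_REQUIRED
```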
Oversight during execution:
1. should be carried out by state and/or international inspectors who specialize in evaluating frontier training runs
  1. Individuals who are employed as such inspectors should not have any past or present conflict of interest with the organization whose runs they evaluate.
  2. However, it is beneficial if these individuals have pertinent experience and knowledge of frontier AI.
2. should include, but not be limited to, “dangerous capabilities evaluations” at various points during training
3. should be allocated a fixed fraction of the total compute budget for the training run
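One way to picture items 2 and 3 together, sketched under assumptions the proposal does not make: the inspectors' share is carved out of (not added on top of) the total budget, and the “various points during training” are evenly spaced training-compute milestones, the last one at the end of training. The function `oversight_plan` and its parameters are hypothetical.

```python
def oversight_plan(total_sflop: float, oversight_fraction: float, n_evals: int):
    """Split a run's budget between training and inspector evaluations.

    Returns (training_sflop, per_eval_sflop, milestones), where each milestone is
    the cumulative training compute at which an evaluation is scheduled.
    Assumptions: the oversight share comes out of the total budget, and
    evaluations are evenly spaced, with the last one at the end of training.
    """
    if not 0 < oversight_fraction < 1:
        raise ValueError("oversight_fraction must be strictly between 0 and 1")
    if n_evals < 1:
        raise ValueError("need at least one evaluation")
    oversight_sflop = total_sflop * oversight_fraction
    training_sflop = total_sflop - oversight_sflop
    per_eval_sflop = oversight_sflop / n_evals
    milestones = [training_sflop * (i + 1) / n_evals for i in range(n_evals)]
    return training_sflop, per_eval_sflop, milestones


# Illustrative numbers only: a 5% oversight share and four evaluations.
training, per_eval, milestones = oversight_plan(1e26, oversight_fraction=0.05, n_evals=4)
```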
Inspectors should be empowered to pause a training run at any time if they see evidence that the model is becoming dangerous relative to the safety precautions being taken.
1. There should be due process for the organization executing the training run to appeal for permission to continue.
2. Levels of safety precautions should be referenced to a (mostly not-yet-written) body of standards for cybersecurity (e.g. of the model weights), formal verification, determinism, etc. in the AI training run context.
The two compute-graph size thresholds should be changed:
1. gradually upwards, on an explicit schedule that provides for a slight increase every calendar day.
  1. This avoids the potential shock of sudden increases in capabilities at the end of a “pause”.
2. downwards in response to new discoveries that impact compute-efficiency, via a very rapid but nonetheless legalistically formalized process that requires a vote from a neutral international board of experts.
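The two adjustment paths can be combined into a single schedule. The sketch below goes beyond what the proposal specifies: it assumes the daily increase is a constant multiplicative factor, and that each board-approved downward revision divides the threshold by a fixed divisor from its effective day onward. `Adjustment`, `threshold_on`, and the doubling-every-two-years rate in the usage lines are all illustrative. The same function would be applied separately to the Frontier and Horizon thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjustment:
    """Hypothetical record of a downward revision voted by the expert board."""
    effective_day: int  # days since the schedule's start date
    divisor: float      # > 1, e.g. 2.0 if a discovery roughly halves the compute needed


def threshold_on(day: int, base: float, daily_growth: float,
                 adjustments: list[Adjustment]) -> float:
    """Threshold value on a given day: scheduled daily growth, reduced by board revisions."""
    if daily_growth < 1.0:
        raise ValueError("the scheduled path only moves upwards")
    value = base * daily_growth ** day
    for adj in adjustments:
        if adj.divisor <= 1.0:
            raise ValueError("downward revisions must use a divisor > 1")
        if adj.effective_day <= day:
            value /= adj.divisor
    return value


# Illustrative rate: doubling roughly every two years (730 days).
growth = 2 ** (1 / 730)
revisions = [Adjustment(effective_day=100, divisor=2.0)]
frontier_day_365 = threshold_on(365, base=1e26, daily_growth=growth, adjustments=revisions)
```

Because the growth is applied every day rather than in steps, there is no single date on which the permitted size jumps, which matches the stated goal of avoiding a shock at the end of a “pause”.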
Execution of compute graphs exceeding the Horizon threshold may be permitted,
1. if the case for the adequacy of their safety mechanisms is convincing to a supermajority of the expert board,
2. and with unanimous assent of the duly appointed representatives of the international parties to these rules.
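Both conditions must hold at once. The proposal does not fix the size of the supermajority, so the two-thirds default in this hypothetical helper is an assumption, as is representing each party's representative by a single yes/no.

```python
from fractions import Fraction


def horizon_exception_granted(board_yes: int, board_size: int,
                              party_assent: list[bool],
                              supermajority: Fraction = Fraction(2, 3)) -> bool:
    """True only if a board supermajority AND every party's representative agree.

    Assumption: "supermajority" means at least `supermajority` of all board seats.
    """
    if board_size <= 0 or not 0 <= board_yes <= board_size:
        raise ValueError("invalid board vote count")
    if not party_assent:
        raise ValueError("no parties listed")
    board_convinced = Fraction(board_yes, board_size) >= supermajority
    unanimous = all(party_assent)
    return board_convinced and unanimous


print(horizon_exception_granted(7, 9, [True, True, False]))  # False: one party withholds assent
```

Using `Fraction` keeps the vote-share comparison exact, so a board of 9 with exactly 6 votes in favour is not misjudged by floating-point rounding.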
Compute graph size should be measured:
1. accounting for all computations in the causal history of the proposed execution back to before September 2021,
  1. including, in particular:
    1. Base models from which fine-tuning is taking place
    2. Models which may have been used to generate some of the training data
  2. For the avoidance of doubt, this accounting should recursively aggregate transitive inputs.
  3. Safe harbor: if all constants, parameters, and variables in the compute graph are either initialized to random Gaussians, or to concatenations (in a random order) of byte-sequences which can be proven to have existed before September 2021, then no further accounting for inputs is necessary.
2. in “standard FLOPs”, which are defined using a standardized set of coefficients,
  1. For example,
    1. 1 NF4 multiplication = 0.7 standard FLOP
    2. 1 FP8 multiplication = 1.0 standard FLOP
    3. 1 FP16 multiplication = 1.2 standard FLOP
    4. 1 FP32 multiplication = 1.3 standard FLOP
    5. 1 FP64 multiplication = 1.33 standard FLOP
  2. These coefficients are negotiable, but should ideally be based on some evidence, such as comparative analyses of perplexity scores achievable with training runs that vary only in numerical format.
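Converting raw operation counts into standard FLOPs then reduces to a weighted sum. This sketch uses the example coefficients listed above, which the proposal itself treats as negotiable. It counts only multiplications, since those are the only operations the example list covers; how additions or other operations should be weighted is left open. `standard_flops` is a hypothetical name.

```python
# Example coefficients from the list above (negotiable per the proposal).
STANDARD_FLOP_PER_MULTIPLICATION = {
    "NF4": 0.7,
    "FP8": 1.0,
    "FP16": 1.2,
    "FP32": 1.3,
    "FP64": 1.33,
}


def standard_flops(multiplications: dict[str, float]) -> float:
    """Weight per-format multiplication counts by their agreed coefficients."""
    total = 0.0
    for fmt, count in multiplications.items():
        if fmt not in STANDARD_FLOP_PER_MULTIPLICATION:
            raise KeyError(f"no agreed coefficient for format {fmt!r}")
        if count < 0:
            raise ValueError("multiplication counts cannot be negative")
        total += STANDARD_FLOP_PER_MULTIPLICATION[fmt] * count
    return total


# A mixed-precision run: 1e24 FP8 and 5e23 FP16 multiplications.
print(standard_flops({"FP8": 1e24, "FP16": 5e23}))  # about 1.6e24
```

Rejecting unknown formats, rather than silently defaulting them to a coefficient of 1.0, reflects the idea that each format's weight should be agreed on and evidence-based.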
Nothing in this proposal should be construed as minimizing or dismissing the misuse risks of models below the Frontier threshold.
1. However, it is suggested that evaluations of such models can safely take place after a training run has been executed, as long as the model outputs are not yet put in contact with humans or with vulnerable cyberinfrastructure.
Requirements for the following are out of scope of this proposal, and left for future work:
1. licensing (of developers, users, inspectors, etc.)
2. registration (of training plans, deployment plans, etc.)
3. tracking and tracing (of high-throughput training hardware, e.g. H100s and H800s)
4. verification (of compute graph size, especially below-but-near the Frontier threshold, using cryptographic proofs)

“For the avoidance of doubt, this accounting should recursively aggregate transitive inputs.”

What does this mean?

Suppose Training Run Z is a finetune of Model Y, and Model Y was the output of Training Run Y, which was itself a finetune of Foundation Model X produced by Training Run X (all of which happened after September 2021). This means that both Training Run Y (i.e. the compute used to produce one of the inputs to Training Run Z) and Training Run X (a “recursive” or “transitive” dependency) count additively against the size limit for Training Run Z.
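The same accounting can be sketched as a traversal of the dependency graph. This sketch adds several assumptions to the text. Runs are identified by unique names. A shared ancestor (for example one foundation model underlying two finetunes that both feed a later run) is counted once rather than once per path. A run that qualifies for the safe harbor is treated as having no inputs that need accounting. Following the measurement rules, `inputs` should also include models used to generate training data. `Run` and `total_sflop` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Run:
    name: str                  # assumed unique per run
    own_sflop: float           # compute of this run alone, in standard FLOPs
    inputs: list[Run] = field(default_factory=list)  # base models, data-generating models, ...
    safe_harbor: bool = False  # True if only random Gaussians / provably pre-Sept-2021 bytes


def total_sflop(run: Run) -> float:
    """Compute graph size of `run`: its own compute plus all transitive inputs.

    Each distinct ancestor is counted once; the inputs of a safe-harbor run are
    not traversed, since no further accounting is required for them.
    """
    seen: set[str] = set()
    stack = [run]
    total = 0.0
    while stack:
        current = stack.pop()
        if current.name in seen:
            continue
        seen.add(current.name)
        total += current.own_sflop
        if not current.safe_harbor:
            stack.extend(current.inputs)
    return total


# The X -> Y -> Z example from the text, with made-up sizes.
x = Run("Training Run X", own_sflop=1e25)
y = Run("Training Run Y", own_sflop=1e23, inputs=[x])
z = Run("Training Run Z", own_sflop=1e22, inputs=[y])
print(total_sflop(z))  # about 1.011e25
```

Here the finetune's own compute (1e22) is dwarfed by the inherited compute of Training Run X, which is exactly why the transitive rule matters: finetuning a large foundation model cannot be used to stay under a threshold.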

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

26

Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

26