Review
- There should be two thresholds on compute graph size:
- the Frontier threshold, beyond which oversight during execution is mandatory
- the Horizon threshold, beyond which execution is forbidden by default
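As a rough illustration of how the two thresholds partition training runs, here is a minimal sketch. The names `Regime` and `classify` are hypothetical, and treating "beyond" as strictly greater than the threshold is an assumption; the proposal does not specify boundary handling or concrete threshold values.

```python
from enum import Enum


class Regime(Enum):
    """Hypothetical labels for the three regimes implied by the two thresholds."""
    BELOW_FRONTIER = "no mandatory oversight during execution"
    OVERSIGHT_REQUIRED = "oversight during execution is mandatory"
    FORBIDDEN_BY_DEFAULT = "execution is forbidden by default"


def classify(size: float, frontier: float, horizon: float) -> Regime:
    """Classify a compute graph of the given size (in standard FLOPs).

    Assumption: a run is "beyond" a threshold only when strictly greater.
    """
    if frontier > horizon:
        raise ValueError("Frontier threshold must not exceed Horizon threshold")
    if size > horizon:
        return Regime.FORBIDDEN_BY_DEFAULT
    if size > frontier:
        return Regime.OVERSIGHT_REQUIRED
    return Regime.BELOW_FRONTIER
```

The threshold values passed in are placeholders; in the proposal they change over time, so a real check would look them up for the date of execution.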
- Oversight during execution:
- should be carried out by state and/or international inspectors who specialize in evaluating frontier training runs
- Individuals who are employed as such inspectors should not have any past or present conflict of interest with the organization whose runs they evaluate.
- However, it is beneficial if these individuals have pertinent experience and knowledge of frontier AI.
- should include, but not be limited to, “dangerous capabilities evaluations” at various points during training
- should be allocated a fixed fraction of the total compute budget for the training run
- Inspectors should be empowered to pause a training run at any time if they see evidence that the model is becoming dangerous relative to the safety precautions being taken.
- There should be due process for the organization executing the training run to appeal for permission to continue.
- Levels of safety precautions should be referenced to a (mostly not-yet-written) body of standards for cybersecurity (e.g. of the model weights), formal verification, determinism, etc. in the AI training run context.
- The two compute-graph size thresholds should be changed:
- gradually upwards, on an explicit schedule that provides for a slight increase every calendar day.
- This avoids the potential shock of sudden increases in capabilities at the end of a “pause”.
- downwards in response to new discoveries that impact compute-efficiency, via a very rapid but nonetheless legalistically formalized process that requires a vote from a neutral international board of experts.
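One way to picture the adjustment rule is a schedule that compounds a small daily increase and applies any board-approved reductions from their effective dates. This is only a sketch under stated assumptions: the class name `ThresholdSchedule`, the geometric form of the daily increase, and the multiplicative form of the reductions are all hypothetical, since the proposal fixes none of them.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ThresholdSchedule:
    base_value: float    # threshold on start_date, in standard FLOPs (placeholder)
    start_date: date
    daily_growth: float  # hypothetical fractional increase per calendar day
    # (effective_date, factor) pairs; each factor in (0, 1] models one
    # downward adjustment approved by the expert board (assumed multiplicative).
    reductions: list[tuple[date, float]] = field(default_factory=list)

    def value_on(self, day: date) -> float:
        """Return the threshold in force on the given calendar day."""
        days = (day - self.start_date).days
        if days < 0:
            raise ValueError("day precedes the start of the schedule")
        # Slight increase every calendar day, compounded.
        value = self.base_value * (1.0 + self.daily_growth) ** days
        # Apply every reduction that has already taken effect.
        for effective, factor in self.reductions:
            if day >= effective:
                value *= factor
        return value
```

Because the increase is applied every day, the threshold never jumps upwards; the only discontinuities are the downward adjustments.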
- Execution of compute graphs exceeding the Horizon threshold may be permitted,
- if the case for the adequacy of their safety mechanisms is convincing to a supermajority of the expert board,
- and with unanimous assent of the duly appointed representatives of the international parties to these rules.
- Compute graph size should be measured:
- accounting for all computations in the causal history of the proposed execution back to before September 2021,
- including, in particular:
- Base models from which fine-tuning is taking place
- Models which may have been used to generate some of the training data
- For the avoidance of doubt, this accounting should recursively aggregate transitive inputs.
- Safe harbor: if all constants, parameters, and variables in the compute graph are initialized either to random Gaussian samples or to concatenations (in a random order) of byte sequences that can be proven to have existed before September 2021, then no further accounting for inputs is necessary.
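The recursive accounting described above can be sketched as a traversal of the causal history. The `Computation` type and `total_flops` function are hypothetical names. Counting a shared ancestor only once (for example, a base model that is both fine-tuned and used to generate training data) is an assumption; the proposal does not say how shared inputs are to be counted.

```python
from dataclasses import dataclass, field


@dataclass
class Computation:
    name: str
    own_flops: float  # standard FLOPs spent by this computation itself
    inputs: list["Computation"] = field(default_factory=list)
    # True if every initial value qualifies for the safe harbor, in which
    # case no further accounting for inputs is needed.
    safe_harbor: bool = False


def total_flops(root: Computation) -> float:
    """Aggregate standard FLOPs over the transitive inputs of `root`.

    Assumption: each distinct ancestor is counted once, even if it is
    reachable through several paths.
    """
    seen: set[int] = set()
    total = 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        total += node.own_flops
        if not node.safe_harbor:
            stack.extend(node.inputs)
    return total
```

For instance, a fine-tuning run whose inputs are a base model and a data-generating model that itself depends on the same base model would be charged for the base model only once under this assumption.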
- in “standard FLOPs”, which are defined using a standardized set of coefficients,
- For example,
- 1 NF4 multiplication = 0.7 standard FLOP
- 1 FP8 multiplication = 1.0 standard FLOP
- 1 FP16 multiplication = 1.2 standard FLOP
- 1 FP32 multiplication = 1.3 standard FLOP
- 1 FP64 multiplication = 1.33 standard FLOP
- These coefficients are negotiable, but should ideally be based on some evidence, such as comparative analyses of perplexity scores achievable with training runs that vary only in numerical format.
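To make the conversion concrete, here is a minimal sketch that applies the example coefficients listed above to per-format multiplication counts. The function name `standard_flops` and the input layout are hypothetical, and the coefficients are the illustrative, negotiable values from the list, not settled figures. Only multiplications are covered, since the examples do not give coefficients for other operations.

```python
# Example coefficients from the list above: standard FLOPs per multiplication.
STANDARD_FLOP_COEFFICIENTS = {
    "NF4": 0.7,
    "FP8": 1.0,
    "FP16": 1.2,
    "FP32": 1.3,
    "FP64": 1.33,
}


def standard_flops(multiplications: dict[str, float]) -> float:
    """Convert per-format multiplication counts into standard FLOPs."""
    total = 0.0
    for fmt, count in multiplications.items():
        if fmt not in STANDARD_FLOP_COEFFICIENTS:
            raise KeyError(f"no coefficient defined for numerical format {fmt!r}")
        total += STANDARD_FLOP_COEFFICIENTS[fmt] * count
    return total
```

Under these coefficients, a run that mixes formats is charged format by format, so moving work to a lower-precision format reduces the measured size only by the ratio of the coefficients.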
- Nothing in this proposal should be construed as minimizing or dismissing the misuse risks of models below the Frontier threshold.
- However, it is suggested that evaluations of such models can safely take place after a training run has been executed, as long as the model outputs are not yet put in contact with humans or with vulnerable cyberinfrastructure.
- Requirements for the following are out of scope of this proposal, and left for future work:
- licensing (of developers, users, inspectors, etc.)
- registration (of training plans, deployment plans, etc.)
- tracking and tracing (of high-throughput training hardware, e.g. H100s and H800s)
- verification (of compute graph size, especially below-but-near the Frontier threshold, using cryptographic proofs)