A corollary of Sutton's Bitter Lesson is that solutions to AI safety should scale with compute.[1]
Let's consider a few examples of research directions that aim at this property:
[O]ur proposed method has the advantage that, with more and more compute, it converges to the correct prediction . . . . In other words, more computation means better and more trustworthy answers[.] — Bengio et al. (2025)[2]
In ARC's current thinking, the plan is not to have a clear line of separation between things we need to explain and things we don't. Instead, the loss function of explanation quality should capture how important various things are to explain, and the explanation-finding algorithm is given a certain compute budget to build up the explanation of the model behavior by bits and pieces. — Matolcsi (2025)
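Both proposals have the same anytime shape: spend more compute, get an answer that is (in expectation) no worse and usually better. As a toy illustration of that property only (plain self-normalized importance sampling, not Bengio's amortized Bayesian oracle and not ARC's explanation-finding objective), here is a sketch of an estimator whose answer converges to the true posterior quantity as the compute budget grows:

```python
import math
import random

def posterior_mean(loglik, prior_sampler, budget):
    """Self-normalized importance sampling with the prior as proposal.

    `budget` is the number of hypothesis samples; more compute means the
    estimate converges to the true posterior mean (at the usual
    O(1/sqrt(budget)) Monte Carlo rate).
    """
    thetas = [prior_sampler() for _ in range(budget)]
    weights = [math.exp(loglik(t)) for t in thetas]  # likelihood of the observed data
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, thetas)) / total

# Toy model: coin with unknown bias theta ~ Uniform(0, 1); we observed 7 heads in 10 flips.
def loglik(theta, heads=7, flips=10):
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)

random.seed(0)
for budget in (10, 100, 10_000):
    print(budget, round(posterior_mean(loglik, random.random, budget), 3))
# Estimates approach the exact posterior mean 8/12 ~ 0.667 as the budget grows.
```

The point is not the particular algorithm; it is the monotone relationship between compute spent and how much trust you can place in the output.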
For these procedures to work, we want their idealized limits to be safe. In a recent talk, Irving suggests we should understand the role of theory as analyzing these limiting properties (assuming sufficient resources like compute and data, and potentially some simplifying assumptions), and the role of empirics as checking whether the conditions required by the theory actually hold in practice (e.g., has learning converged? Are the assumptions reasonable?).[3]
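On the empirical side, the checks Irving gestures at can be quite mundane. A minimal sketch of the "has learning converged?" precondition (the helper name and thresholds here are my own, not from the talk): only treat a limiting-case argument as applicable once the loss has stopped improving beyond a tolerance.

```python
import math

def has_converged(losses, window=100, tol=1e-3):
    """Crude check of a precondition that limiting-case arguments assume:
    the trailing-window average loss is no longer improving (relative to
    the previous window) by more than a `tol` fraction."""
    if len(losses) < 2 * window:
        return False  # not enough history to say anything yet
    prev = sum(losses[-2 * window:-window]) / window
    curr = sum(losses[-window:]) / window
    return prev - curr < tol * max(abs(prev), 1e-12)

# Example: a loss curve that decays to a plateau around 0.1.
losses = [0.1 + 0.9 * math.exp(-0.02 * t) for t in range(1000)]
print(has_converged(losses))        # True: the run has effectively plateaued
print(has_converged(losses[:300]))  # False: earlier in the run it was still improving
```

Checks like this don't prove the theory applies; they just tell you whether the run is even in the regime the theory talks about.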
From this perspective, the role of theory is to answer questions like:
Taking a step back, obtaining understanding and guarantees about idealized limits has always been the aim of theoretical approaches to AI safety:
The main update is that we now have a rough sense of the kinds of systems AGI might consist of, and this enables a different kind of theory strategy: instead of studying the ideals that emerge from a list of desiderata, you can study the limiting/convergence properties of the protocols we are currently using to develop and align AGI.
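To make the second strategy concrete, here is one limiting property we already understand well. It is a standard result about KL-regularized policy optimization (offered as an illustration, not something claimed by the sources above): if we train against a reward model $r$ with a KL penalty of strength $\beta$ toward a reference policy $\pi_{\text{ref}}$, then in the limit of perfect optimization the trained policy is

$$\pi^{*}(y \mid x) \;=\; \frac{1}{Z(x)}\, \pi_{\text{ref}}(y \mid x)\, \exp\!\left(\frac{r(x, y)}{\beta}\right),$$

where $Z(x)$ is a normalizing constant. Read as a limiting property of a protocol we currently use, it says that once optimization stops being the bottleneck, everything rests on the reward model and the reference policy: more compute cannot fix a misspecified $r$.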
In practice, we probably need both approaches. As we continue to scale up compute, data, model size, and the RL feedback loop, we might find ourselves suddenly very close to the limits, wishing we had known better what lay in store.
[1] Ren et al. (2024) make a related point: AI safety research should aim at problems that aren't already solved by scaling. My point is about what solutions to those problems should look like.
[2] Bengio's proposal involves a very particular technical vision for using GFlowNets to implement an (approximate, amortized) Bayesian Oracle that he believes will scale competitively (compared to the inference costs of the agent being monitored).
[3] Irving's talk is about scalable oversight, but the framing applies much more broadly.