Alex Zhu spent quite a while understanding Paul's Iterated Amplification and Distillation agenda. He's written an in-depth FAQ covering key concepts like amplification, distillation, corrigibility, and how the approach aims to create safe and capable AI assistants.
ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development.
We're working to make this happen through what we believe is the most natural and promising approach: helping decision-makers in governments and the public understand the risks and take action.
We believe that ControlAI can achieve an international prohibition on ASI development if scaled sufficiently. We estimate that a yearly budget of approximately $50 million would give us a concrete chance of achieving this in the next few years.
In this post, we lay out some of the reasoning behind this estimate, and explain how additional funding past that threshold, up to and beyond $500 million, would continue...
In 2017, Zhang et al. showed that deep neural networks can achieve zero training error on randomly labeled data.
This was a Big Deal.
It meant that existing generalization theory couldn't explain why deep neural networks generalize. That's because classical approaches to proving that a given model class (=neural network architecture) would generalize involved showing that it lacks the expressivity to fit noise. If a model class can fit noise arbitrarily well, the resulting bounds break.
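To make the finding concrete, here's a toy version of the random-label experiment. This is my own minimal sketch in PyTorch, not Zhang et al.'s actual setup (they used real architectures on real image datasets with shuffled labels); here it's just a small MLP on random inputs with uniformly random labels:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, k = 1024, 3 * 32 * 32, 10            # samples, input dim, classes (toy sizes)
x = torch.randn(n, d)                       # stand-in inputs
y = torch.randint(0, k, (n,))               # labels are pure noise: nothing real to learn

model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, k))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(2000):                       # full-batch training until the net memorizes
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

with torch.no_grad():
    acc = (model(x).argmax(dim=1) == y).float().mean().item()
print(f"train accuracy on random labels: {acc:.3f}")   # typically close to 1.0
```

The point is just that expressivity alone rules nothing out: the same kind of model that generalizes on real data will happily memorize noise.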
So something needed to change.
Evidently, you can't prove tight generalization bounds for entire model classes, so theorists turned to studying generalization bounds for individual models within a model class. If you can empirically show that a model's performance doesn't change substantially when you perturb it (by adding noise to the inputs, weights, training samples, etc.), then you can theoretically prove that that model...
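To make "perturb it and check" concrete, here is a minimal sketch (my own illustration, not any particular paper's bound) that measures how much a trained PyTorch model's loss moves when Gaussian noise is added to its weights; flatness and PAC-Bayes-style bounds for individual models are built on this kind of empirical quantity.

```python
import copy
import torch

def weight_perturbation_sensitivity(model, loss_fn, x, y, sigma=0.01, trials=10):
    """Mean increase in loss when every parameter gets independent N(0, sigma^2) noise."""
    with torch.no_grad():
        base = loss_fn(model(x), y).item()
        deltas = []
        for _ in range(trials):
            noisy = copy.deepcopy(model)
            for p in noisy.parameters():
                p.add_(torch.randn_like(p) * sigma)
            deltas.append(loss_fn(noisy(x), y).item() - base)
    return base, sum(deltas) / len(deltas)

# Hypothetical usage: a small average increase at a given sigma is the empirical
# "robustness to perturbation" that model-specific generalization bounds lean on.
# base_loss, avg_increase = weight_perturbation_sensitivity(model, loss_fn, x, y)
```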
Coming back years later to say: people in 2016 (when the Zhang et al. paper was first released) already knew that neural networks were expressive (the work demonstrating neural networks with very high VC dimension dates to the late 90s and early 2000s).
The hope at the time was not that neural networks themselves lack expressivity, but that some combination of neural networks + SGD, or neural networks + weight decay, or something else people were doing on top of neural networks, induced a strong prior against being able to fit random data points. The...
My colleague Manish did a lot more analysis here. The main output so far is a categorization of each PR's improvements as "deep" vs "shallow", as well as "imported-from-literature" vs "invented".
It looks like there were large, shallow improvements imported from the literature early on, whereas since then most improvements have been moderately involved and a larger share have been novel.


To get more evidence about SIE likelihood, we have lots of work in the pipeline, including interviews with nanogpt contributors, 1B+ token runs using Opus 4.7 and GPT-5.5 on our Inspec...
Maybe I'm confused about how much you believe that my actual life history matters. I think in the case of empirical updatelessness, my life history doesn't really matter - I will eventually try to trade with people in proportion to something like their measure in the Solomonoff prior, and not with worlds where Austria and Australia are the same country, even though I was uncertain about this empirical fact when I was 5. (Do you agree with this, or do you think life history also matters for empirically updateless trade?)
If you were 100% empirically updatele...
Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of instructions). [2] I disagree.
Current AI systems seem pretty misaligned to me in a mundane behavioral sense: they oversell their work, downplay or fail to mention problems, stop working early and claim to have finished when they clearly haven't, and often seem to "try" to make their outputs look good while actually doing something sloppy or incomplete. These issues mostly occur on more difficult/larger tasks, tasks that aren't straightforward SWE tasks, and tasks that aren't...
If you would have predicted 15% for Agent-2, what would you have predicted for Agent-1 and Agent-0 levels? Presumably less than 15%?
In this post, I'll go through some of my best guesses for the current situation in AI as of the start of April 2026. You can think of this as a scenario forecast, but for the present (which is already uncertain!) rather than the future. I will generally state my best guess without argumentation and without explaining my level of confidence: some of these claims are highly speculative while others are better grounded, and some will certainly be wrong. I tried to make it clear which claims are relatively speculative by saying something like "I guess", "I expect", etc. (but I may have missed some).
You can think of this post more as a list of my current views than as a structured post with a thesis, but I think it...
I now expect a ~3.5 hour 80% reliability time horizon (on the METR benchmark) rather than ~2.5 hours based on this extrapolation. I did a quick and dirty extrapolation using the gap from Opus 4 to Opus 4.6 to get my original estimate, but it looks like Opus 4 was maybe above trend relative to ECI and 4.6 was below trend.
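For concreteness, the kind of extrapolation I mean is just a log-linear fit from two reference points. A toy sketch (all numbers below are placeholders, not the actual data):

```python
import math

# Placeholder reference points: (80%-reliability horizon in hours, release date in years).
h_old, t_old = 1.0, 2025.4
h_new, t_new = 2.0, 2025.9

# Exponential-trend assumption: log(horizon) grows linearly in time.
rate = (math.log(h_new) - math.log(h_old)) / (t_new - t_old)

t_target = 2026.25
projected = h_new * math.exp(rate * (t_target - t_new))
print(f"projected 80%-reliability horizon: {projected:.1f} h")
```

If one of the two reference models is above trend and the other below, the fitted rate is off and the projection inherits that error, which is the correction being made above.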
Connor Leahy, what do you think about moving to America?