The Solomonoff prior is a mathematical formalization of Occam's razor: it assigns probabilities to observations according to the simplicity of the programs that generate them. However, the simplest programs that predict our observations well might be programs that simulate universes containing intelligent agents who try to influence the predictions. This makes the Solomonoff prior "malign": its predictions can be swayed by the preferences of the beings it simulates.
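For concreteness, the prior over bit strings is usually written (for a universal prefix machine $U$) as

$$M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},$$

i.e., every program $p$ whose output begins with $x$ contributes weight $2^{-\ell(p)}$, so short programs dominate; the worry above is that some of those short programs contain agents.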
We often talk about ensuring control, which in the context of this doc refers to preventing AIs from being able to cause existential problems, even if the AIs attempt to subvert our countermeasures. To better contextualize control, I think it's useful to discuss the main threats. I'll focus my discussion on threats induced by misalignment that could plausibly increase existential risk.
While control is often associated with preventing security failures, I'll discuss other issues like sabotaging safety-critical work. I'll list and explain my prioritization of threats. As part of this, I'll argue that rogue internal deployments—cases where the AI is running within the company's datacenter but with control measures disabled—can be substantially worse than self-exfiltration—the AI stealing its weights and running them on an external server—as it might...
Good point, this is an error in my post.
As AI systems get more capable, avoiding deferring to them on more and more decisions becomes increasingly uncompetitive and infeasible. Further, once systems are sufficiently capable, control becomes infeasible. [1] Thus, one of the main strategies for handling AI risk is fully (or almost fully) deferring to AIs on managing these risks. Broadly speaking, when I say "deferring to AIs" [2] I mean having these AIs do virtually all of the work of developing more capable and aligned successor AIs, managing exogenous risks, and making strategic decisions. [3] If we plan to defer to AIs, I think it's safest...
you may already be familiar with my ideas
Yep, I was thinking about outputs from you (in addition to outputs from others) while writing this.
Given that human experts seem to have systematic biases (e.g., away from ensembling), wouldn't the ensemble also exhibit such biases? Have you taken this into account when saying "My guess is that the answer is yes"?
Yes, I'm including this sort of consideration when I say that what top human experts do would probably be sufficient.
Your point around companies not wanting to ensemble and wanting to use their own epistemic...
Not sure if this is already well known around here, but apparently AI companies are heavily subsidizing their subscription plans if you use their own IDEs/CLIs. (It's discussed in various places but I had to search for it.)
I realized this after trying Amp Code. They give out a $10 daily free credit, which can easily be used up in 1 or 2 prompts, e.g., "review this code base, fix any issues found". (They claim to pass API costs through to customers with no markup, so this seems like a good proxy for actual API costs.) But with even a $19.99 subscription...
In this post, I describe a simple model for forecasting when AI will automate AI development. It is based on the AI Futures model (AIFM), but is more understandable and robust, and has deliberately conservative assumptions.
At current rates of compute growth and algorithmic progress, this model's median prediction is >99% automation of AI R&D in late 2032. Most simulations result in a 1000x to 10,000,000x increase in AI efficiency and 300x-3000x research output by 2035. I therefore suspect that existing trends in compute growth and automation will still produce extremely powerful AI on "medium" timelines, even if the full coding automation and superhuman research taste that drive the AIFM's "fast" timelines (superintelligence by ~mid-2031) don't happen.
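To make the shape of this kind of forecast concrete, here is a minimal toy Monte Carlo sketch. It is not the model described in the post: the functional forms, parameter values, and the `simulate_one` helper are hypothetical placeholders, chosen only to illustrate how compounding algorithmic progress plus an automation feedback loop can be simulated.

```python
# Toy Monte Carlo sketch of a compute-growth + automation-feedback forecast.
# NOT the post's model: every functional form and parameter below is a
# hypothetical placeholder, used only to illustrate the structure.
import numpy as np

rng = np.random.default_rng(0)

def simulate_one(years: int = 10, steps_per_year: int = 12) -> tuple[float, float]:
    """Run one trajectory; return (efficiency multiplier, automation fraction)."""
    efficiency = 1.0  # cumulative algorithmic-efficiency multiplier
    # Baseline monthly progress rate, drawn per simulation (hypothetical values).
    base_growth = rng.lognormal(mean=np.log(1.05), sigma=0.02)
    automation = 0.0
    for _ in range(years * steps_per_year):
        # Hypothetical automation curve: higher efficiency -> more of AI R&D automated.
        automation = 1.0 - 1.0 / (1.0 + 0.02 * efficiency ** 0.5)
        # Amdahl-style uplift from automating a fraction of research labor.
        uplift = 1.0 / (1.0 - 0.9 * automation)
        # Progress compounds faster as uplift grows.
        efficiency *= base_growth ** uplift
    return efficiency, automation

results = np.array([simulate_one() for _ in range(1_000)])
print(f"median efficiency multiplier: {np.median(results[:, 0]):.0f}x")
print(f"median automation fraction:   {np.median(results[:, 1]):.0%}")
```

A real version would calibrate the growth rate and automation curve to observed compute and algorithmic-progress trends rather than to these made-up constants.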
I am skeptical that uplift measurements actually give much tighter confidence intervals.
I think we might already have evidence against the longer timelines (the 2049 timelines you get from a doubling-difficulty growth factor of 1.0). Basically, the model needs to pass a backtest against where uplift was in 2025 versus now; if there's significant uplift now but there wasn't before 2025, that implies the automation curve is steep.
Suppose uplift at the start of 2026 was 1.6x as in the AIFM's median (which would be 37.5% automation if you make my simplifying assumptions). If we also ...
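(For reference, the 1.6x-to-37.5% conversion is what you get if the simplifying assumption is an Amdahl-style relation between uplift and the automated fraction $a$ of research labor; I'm stating that relation here rather than it being spelled out above:

$$\text{uplift} = \frac{1}{1 - a} \quad\Longrightarrow\quad a = 1 - \frac{1}{1.6} = 0.375.)$$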
In a recent post, Cole Wyeth makes a bold claim:
. . . there is one crucial test (yes this is a crux) that LLMs have not passed. They have never done anything important.
They haven't proven any theorems that anyone cares about. They haven't written anything that anyone will want to read in ten years (or even one year). Despite apparently memorizing more information than any human could ever dream of, they have made precisely zero novel connections or insights in any area of science[3].
...An anecdote I heard through the grapevine: some chemist was trying to synthesize some chemical. He couldn't get some step to work, and tried for a while to find solutions on the internet. He eventually asked an LLM. The LLM gave a very
I do, and I’m shifting in that direction.