Promoted to curated: I really liked this post for its combination of reporting negative results and communicating a deeper shift in response to them, while seeming pretty well-calibrated about the extent of the update. I would have already been excited about curating this post without the latter, but it felt like an additional good reason.

Reframing AI Safety as a Neverending Institutional Challenge

Oliver Habryka, 24d

No worries!

You did say it would be premised on either "inevitable or desirable for normal institutions to be eventually lose control". In some sense I do think this is "inevitable", but only in the same sense that past "normal human institutions" lost control.

We now have the internet and widespread democracy, so almost all governmental institutions have needed to change how they operate. Future technological change will force similar changes. But I don't put any value in the literal existence of our existing institutions; what I care about is whether our institutions are going to make good governance decisions. I am saying that the development of systems much smarter than current humans will change those institutions, very likely within the next few decades, making most concerns about present institutional challenges obsolete.

Of course something that one might call "institutional challenges" will remain, but I do think there really will be a lot of buck-passing that will happen from the perspective of present day humans. We do really have a crunch time of a few decades on our hands, after which we will no longer have much influence over the outcome.

Reframing AI Safety as a Neverending Institutional Challenge

Oliver Habryka, 24d

I don't think I understand. It's not about human institutions losing control "to a small regime". It's just about most coordination problems being things you can solve by being smarter. You can do that in high-integrity ways, probably with much higher integrity and less harmful effects than how we've historically overcome coordination problems. I de facto don't expect things to go this way, but my opinions here are not at all premised on it being desirable for humanity to lose control?

Reframing AI Safety as a Neverending Institutional Challenge

Oliver Habryka, 25d

This IMO doesn't really make any sense. If we get powerful AI, and we can either control it, or ideally align it, then the gameboard for both global coordination and building institutions completely changes (and of course if we fail to control or align it, the gameboard is also flipped, but in a way that removes us completely from the picture).

Does anyone really think that by the time you have systems vastly more competent than humans, we will still face the same coordination problems and institutional difficulties as we have right now?

It does really look like there will be a highly pivotal period of at most a few decades. There is a small chance humanity decides to very drastically slow down AI development for centuries, but that seems pretty unlikely, and also not clearly beneficial. That means it's not a neverending institutional challenge, it's a challenge that lasts a few decades at most, during which humanity will be handing off control to some kind of cognitive successor which is very unlikely to face the same kinds of institutional challenges as we are facing today.

That handoff is not purely a technical problem, but a lot of it will be. At the end of the day, whether your successor AI systems/AI-augmented-civilization/uplifted-humanity/intelligence-enhanced-population will be aligned with our preferences over the future has a lot of highly technical components.

Yes, there will be a lot of social problems, but the size and complexity of the problems are finite, at least from our perspective. It does appear that humanity is at the cusp of unlocking vast intelligence, and after you do that, you really don't care very much about the weird institutional challenges that humanity is currently facing, most of which can clearly be overcome by being smarter and more competent.

On the Rationality of Deterring ASI

Oliver Habryka, 1mo

Promoted to curated: I have various pretty substantial critiques of this work, but I do overall think this is a pretty great effort at crossing the inferential distance from people who think AGI will be a huge deal and potentially dangerous, to the US government and national security apparatus.

The thing I feel most unhappy about is that the document follows a pattern that Situational Awareness also had: it keeps framing various things that it wants to happen as "inevitable", while also arguing that they are a good idea, in a way that felt to me like it was trying too hard to create some kind of self-fulfilling prophecy.

But overall, I feel like this document speaks with surprising candor and clarity about many things that have been left unsaid in many circumstances. I particularly appreciated that it explicitly covers conventional ballistic escalation as part of a sabotage strategy for datacenters. Relevant quotes:

Should these measures falter, some leaders may contemplate kinetic attacks on datacenters, arguing that allowing one actor to risk dominating or destroying the world are graver dangers, though kinetic attacks are likely unnecessary. Finally, under dire circumstances, states may resort to broader hostilities by climbing up existing escalation ladders or threatening non-AI assets. We refer to attacks against rival AI projects as "maiming attacks."

I also particularly appreciated this proposed policy for how to handle AIs capable of recursive self-improvement:

In the near term, geopolitical events may prevent attempts at an intelligence recursion. Looking further ahead, if humanity chooses to attempt an intelligence recursion, it should happen in a controlled environment with extensive preparation and oversight—not under extreme competitive pressure that induces a high risk tolerance.

METR: Measuring AI Ability to Complete Long Tasks

Oliver Habryka, 1mo

"Research engineers I talk to already report >3x speedups from AI assistants"

Huh, I would be extremely surprised by this number. I program most days, in domains where AI assistance is particularly useful (frontend programming with relatively high churn), and I am definitely not anywhere near a 3x total speedup. Maybe a 1.5x, maybe a 2x on good weeks, but definitely not a 3x. A >3x in any domain would be surprising, and my guess is that the speedup generalizes less well to research engineering code than to churn-heavy frontend development.

How AI Takeover Might Happen in 2 Years

Oliver Habryka, 2mo

Promoted to curated: I think concrete, specific scenarios for how things might go with AI are among the most helpful tools for getting people to start forming their own models of how this whole AI thing might go. Being specific is good, grounding things in concrete observable consequences is good. Somewhat sticking your neck out and making public predictions is good.

This is among the best entries I've seen in this genre, and I hope there will be more. Thank you for writing it!

How might we safely pass the buck to AI?

Oliver Habryka, 2mo

Seems good!

FWIW, at least in my mind this is in some sense approximately the only and central core of the alignment problem, and so having it left unaddressed feels confusing. It feels a bit like making a post about how to make a nuclear reactor where you happen to not say anything about how to prevent the uranium from going critical, but you did spend a lot of words on how to make the cooling towers, the color of the bikeshed next door, and how to translate the hot steam into energy.

Like, it's fine, and I think it's not crazy to think there are other hard parts, but it felt quite confusing to me.

How might we safely pass the buck to AI?

Oliver Habryka, 2mo

"To the extent the tool just gets gamed, you can iterate until you find detection tools that are more robust (or find ways of training against detection tools that don't game them so hard)."

How do you iterate? You mostly won't know whether you just trained away your signal, or actually made progress. The inability to iterate is kind of the whole central difficulty of this problem.

(To be clear, I do think there are some methods of iteration, but it's a very tricky kind of iteration where you need constant paranoia about whether you are fooling yourself, and that makes it very different from other kinds of scientific iteration.)

How might we safely pass the buck to AI?

Oliver Habryka, 2mo

"Ajeya gave 15% to AGI before 2036, with little of that in the first few years after her report; maybe she'd have said 10% between 2025 and 2036."

Just because I was curious, here is the most relevant chart from the report:

This is not a direct probability estimate (since it's about probability of affordability), but it's probably within a factor of 2. Looks like the estimate by 2030 was 7.72% and the estimate by 2036 was 17.36%.
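
For anyone who wants to sanity-check the arithmetic, here is a minimal Python sketch. It uses only the numbers quoted in this thread (the 15% figure from the quoted comment and the two chart readings). Reading "within a factor of 2" as a band from p/2 to 2p is my assumption about what that phrase means here, and the variable and function names are just illustrative.

```python
# Cumulative chart readings quoted in the comment above, in percent.
by_2030 = 7.72
by_2036 = 17.36

# Mass the chart places in the window between 2030 and 2036 (percentage points).
between_2030_and_2036 = round(by_2036 - by_2030, 2)


def factor_of_two_band(p):
    """Assumed reading of 'within a factor of 2': anywhere from p/2 to 2p."""
    return (round(p / 2, 2), round(p * 2, 2))


low, high = factor_of_two_band(by_2036)

# The "15% before 2036" figure from the quoted comment.
quoted_by_2036 = 15.0
quoted_is_in_band = low <= quoted_by_2036 <= high

print(between_2030_and_2036)
print(low, high)
print(quoted_is_in_band)
```

Under that reading, the quoted 15% sits inside the band around the chart's 2036 value, so the two numbers are at least not in tension with each other.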