Disclaimer: While some participants and organizers of this exercise work in industry, no proprietary info was used to inform these scenarios, and they represent the views of their individual authors alone.

Overview

In the vein of What 2026 Looks Like and the AI Timelines discussion, we recently hosted a scenario forecasting workshop. Participants first wrote a 5-stage scenario forecasting what will happen between now and ASI. Then, they reviewed, discussed, and revised scenarios in groups of 3. The discussion was guided by forecasts like “If I were to observe this person’s scenario through stage X, what would my median ASI timeline be?”.

Instructions for running the workshop, including notes on what we would do differently, are available here. We’ve put 6 shared scenarios from our workshop in a publicly viewable folder here.

Motivation

Writing scenarios may help to:

  1. Clarify views, e.g. by realizing an abstract view is hard to concretize, or realizing that two views you hold don’t seem very compatible.
  2. Surface new considerations, e.g. realizing a subquestion is more important than you thought, or that an actor might behave in a way you hadn’t considered.
  3. Communicate views to others, e.g. clarifying what you mean by “AGI”, “slow takeoff”, or the singularity.
  4. Register qualitative forecasts, which can then be compared against reality. This has advantages and disadvantages vs. more resolvable forecasts (though scenarios can include some resolvable forecasts as well!).

Running the workshop

Materials and instructions for running the workshop, including notes on what we would do differently, are available here.

The schedule for the workshop was as follows:

Session 1 involved writing a 5-stage scenario forecasting what will happen between now and ASI.

Session 2 involved reviewing, discussing, and revising scenarios in groups of 3. The discussion was guided by forecasts like “If I were to observe this person’s scenario through stage X, what would my median ASI timeline be?”, with analogous questions for p(disempowerment) and p(good future).

Session 3 was freeform discussion and revision within groups, followed by a brief feedback session.

Workshop outputs and learnings

6 people (3 anonymous, 3 named) have agreed to share their scenarios. We’ve put them in a publicly viewable folder here.

We received overall positive feedback: nearly all 23 people who filled out the feedback survey said it was a good use of their time. In general, people found the writing portion more valuable than the discussion. Based on this and other feedback, we’ve included ideas for improving similar future workshops in our instructions for organizers. It’s possible that a workshop focused much more on writing relative to discussion would be more valuable.

Speaking for myself (as Eli), I think it was mostly valuable as a forcing function to get people to do an activity they had wanted to do anyway. Scenario writing seems like a good thing for people to spend marginal time on (especially if they find it fun or energizing). It seems worthwhile to experiment with the format, in the ways we suggest above or in other ways people are excited about. It feels like there might be something nearby that is substantially more valuable than our initial pilot.

Comments

Romeo Dean and I ran a slightly modified version of this format for members of AISST, and we found it a very useful and enjoyable activity!

We first gathered to do 2 hours of reading and discussing, and then we spent 4 hours switching between silent writing and discussing in small groups.

The main changes we made were:

  1. We removed the part where people estimate probabilities of ASI and doom happening by the end of each other’s scenarios.
  2. We added a formal benchmark forecasting part for 7 benchmarks, using private Metaculus questions (forecasting values as of Jan 31, 2025):
    1. GPQA
    2. SWE-bench
    3. GAIA
    4. InterCode (Bash)
    5. WebArena
    6. Number of METR tasks completed
    7. Elo on LMSys arena relative to GPT-4-1106

We think the first change made it better, but in hindsight we would have reduced the number of benchmarks to around 3 (GPQA, SWE-bench, and LMSys Elo), or given participants much more time.
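
For anyone running a similar exercise who wants to score the benchmark forecasts once the questions resolve, here is a minimal sketch of comparing point forecasts against resolved values. The benchmark names echo the list above, but all numbers and the absolute-error scoring rule are illustrative assumptions, not the workshop’s actual forecasts or resolutions.

```python
# Minimal sketch for scoring point forecasts against resolved benchmark
# values. All numbers below are illustrative placeholders, not the
# workshop's actual forecasts or resolutions.

def absolute_error(forecast: float, resolved: float) -> float:
    """Absolute difference between a point forecast and the resolved value."""
    return abs(forecast - resolved)

# Each entry: (benchmark, forecast value, value resolved at Jan 31, 2025).
forecasts = [
    ("GPQA accuracy", 0.55, 0.60),                  # placeholder numbers
    ("SWE-bench % resolved", 0.30, 0.35),           # placeholder numbers
    ("LMSys Elo delta vs GPT-4-1106", 40.0, 55.0),  # placeholder numbers
]

for name, forecast, resolved in forecasts:
    print(f"{name}: forecast={forecast}, resolved={resolved}, "
          f"abs error={absolute_error(forecast, resolved):.2f}")
```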