AI ALIGNMENT FORUM
AXRP - the AI X-risk Research Podcast

Transcripts of AXRP episodes.

AXRP Episode 1 - Adversarial Policies with Adam Gleave

AXRP Episode 2 - Learning Human Biases with Rohin Shah

AXRP Episode 3 - Negotiable Reinforcement Learning with Andrew Critch

AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger

AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy

AXRP Episode 6 - Debate and Imitative Generalization with Beth Barnes

AXRP Episode 7 - Side Effects with Victoria Krakovna

AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell

AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

AXRP Episode 10 - AI’s Future and Impacts with Katja Grace

AXRP Episode 11 - Attainable Utility and Power with Alex Turner

AXRP Episode 12 - AI Existential Risk with Paul Christiano

AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo

AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy

AXRP Episode 15 - Natural Abstractions with John Wentworth

AXRP Episode 16 - Preparing for Debate AI with Geoffrey Irving

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong

AXRP Episode 19 - Mechanistic Interpretability with Neel Nanda

AXRP Episode 20 - ‘Reform’ AI Alignment with Scott Aaronson

AXRP Episode 21 - Interpretability for Engineers with Stephen Casper

AXRP Episode 22 - Shard Theory with Quintin Pope

AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu

AXRP Episode 24 - Superalignment with Jan Leike

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld

AXRP Episode 26 - AI Governance with Elizabeth Seger

AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt

AXRP Episode 28 - Suing Labs for AI Risk with Gabriel Weil

AXRP Episode 29 - Science of Deep Learning with Vikrant Varma

AXRP Episode 30 - AI Security with Jeffrey Ladish

AXRP Episode 31 - Singular Learning Theory with Daniel Murfet

AXRP Episode 32 - Understanding Agency with Jan Kulveit

AXRP Episode 33 - RLHF Problems with Scott Emmons

AXRP Episode 34 - AI Evaluations with Beth Barnes

AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization

AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics

AXRP Episode 37 - Jaime Sevilla on Forecasting AI

AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems

AXRP Episode 38.1 - Alan Chan on Agent Infrastructure

AXRP Episode 38.2 - Jesse Hoogland on Singular Learning Theory

AXRP Episode 39 - Evan Hubinger on Model Organisms of Misalignment

AXRP Episode 38.3 - Erik Jenner on Learned Look-Ahead

AXRP Episode 38.4 - Shakeel Hashim on AI Journalism

AXRP Episode 38.5 - Adrià Garriga-Alonso on Detecting AI Scheming

AXRP Episode 38.6 - Joel Lehman on Positive Visions of AI

AXRP Episode 38.7 - Anthony Aguirre on the Future of Life Institute

AXRP Episode 38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future

AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability