AI ALIGNMENT FORUM
Books of LessWrong

A Moderate Update to your Artificial Priors

95 ARC's first technical report: Eliciting Latent Knowledge

Paul Christiano, Mark Xu, Ajeya Cotra

3y

72

70 Fun with +12 OOMs of Compute

Daniel Kokotajlo

4y

45

111 What 2026 looks like

Daniel Kokotajlo

3y

29

84 Ngo and Yudkowsky on alignment difficulty

Eliezer Yudkowsky, Richard Ngo

3y

53

75 Another (outer) alignment failure story

Paul Christiano

4y

25

93 What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

4y

49

3y

19

54 Finite Factored Sets

Scott Garrabrant

4y

70

49 Selection Theorems: A Program For Understanding Agents

3y

24

72 My research methodology

Paul Christiano

4y

35

61 larger language models may disappoint you [or, an eternally unfinished draft]

3y

7

57 Comments on Carlsmith's “Is power-seeking AI an existential risk?”

3y

11

64 EfficientZero: How It Works

3y

2

30 Specializing in Problems We Don't Understand

4y

0