AI ALIGNMENT FORUM

Nicholas Schiefer

Posts

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (118 karma, 1y, 69 comments)

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research (122 karma, 2y, 14 comments)

Engineering Monosemanticity in Toy Models (37 karma, 2y, 4 comments)

ELK Proposal - Make the Reporter care about the Predictor’s beliefs (6 karma, 3y, 0 comments)
