AI ALIGNMENT FORUM
Paul Colognese

Posts

Paul Colognese's Shortform (karma 2, 2y, 0 comments)

High-level interpretability: detecting an AI's objectives (karma 29, 2y, 0 comments)

Aligned AI via monitoring objectives in AutoGPT-like systems (karma 16, 2y, 0 comments)

Decision Transformer Interpretability (karma 34, 2y, 6 comments)

Auditing games for high-level interpretability (karma 18, 2y, 0 comments)

Deception?! I ain’t got time for that! (karma 15, 3y, 0 comments)
