AI ALIGNMENT FORUM
Maxime Riché

Sequences

Evaluating the Existence Neutrality Hypothesis - Introductory Series

Posts

1 · Maxime Riché's Shortform · 10mo

Wikitag Contributions

Sycophancy

Comments

We need a Science of Evals

Maxime Riché · 1y · 30

FYI, the "Evaluating Alignment Evaluations" project of the current AI Safety Camp is studying and characterizing alignment (propensity) evaluations. We hope to contribute to the science of evals, and we will contact you next month. (Somewhat deprecated project proposal)