AI ALIGNMENT FORUM
Verification
• Applied to "Compact Proofs of Model Performance via Mechanistic Interpretability" by Jason Gross, 6mo ago
• Applied to "Formal verification, heuristic explanations and surprise accounting" by Mo Putera, 6mo ago
• Applied to "Alignment with argument-networks and assessment-predictions" by Tor Økland Barstad, 2y ago
• Applied to "Making it harder for an AGI to 'trick' us, with STVs" by Tor Økland Barstad, 2y ago
• Created by Tor Økland Barstad at 2y