Alignment Research Center (ARC)
Dakara v1.6.0 Dec 30th 2024 GMT (-4) LW 1
skunnavakkam v1.5.0 Dec 21st 2024 GMT LW 1
skunnavakkam v1.4.0 Dec 21st 2024 GMT (-13) LW 1
Applied to Concrete Methods for Heuristic Estimation on Neural Networks by Oliver Daniels 4mo ago
changed name from Alignment Research Center to Alignment Research Center (ARC)
Raymond Arnold v1.3.0 Oct 23rd 2024 GMT LW 2
Applied to A bird's eye view of ARC's research by Raymond Arnold 5mo ago
Applied to Low Probability Estimation in Language Models by Ruben Bloom 5mo ago
Applied to Estimating Tail Risk in Neural Networks by Mark Xu 6mo ago
Applied to Why is there an alignment problem? by InfiniteLight 1y ago
Applied to Paul Christiano on Dwarkesh Podcast by ESRogs 1y ago
Applied to ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks by Magdalena Wache 2y ago
Applied to AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu by DanielFilan 2y ago
Applied to Why "AI alignment" would better be renamed into "Artificial Intention research" by Daniel Böttger 2y ago
Applied to ARC is hiring theoretical researchers by Zack M. Davis 2y ago
Applied to The Goal Misgeneralization Problem by Madhusudhan Pathak 2y ago
Applied to More information about the dangerous capability evaluations we did with GPT-4 and Claude. by Lawrence Chan 2y ago
Applied to Prizes for matrix completion problems by Mark Xu 2y ago
Mark Xu v1.2.0 May 1st 2023 GMT (+18/-4) LW 2