AI ALIGNMENT FORUM

Alignment Research Center (ARC)

Dakara v1.6.0 Dec 30th 2024 GMT (-4) LW1

skunnavakkam v1.5.0 Dec 21st 2024 GMT LW1

skunnavakkam v1.4.0 Dec 21st 2024 GMT (-13) LW1

Applied to Concrete Methods for Heuristic Estimation on Neural Networks by Oliver Daniels 4mo ago

Changed name from Alignment Research Center to Alignment Research Center (ARC)

Raymond Arnold v1.3.0 Oct 23rd 2024 GMT LW2

Applied to A bird's eye view of ARC's research by Raymond Arnold 5mo ago

Applied to Low Probability Estimation in Language Models by Ruben Bloom 5mo ago

Applied to Estimating Tail Risk in Neural Networks by Mark Xu 6mo ago

Applied to Why is there an alignment problem? by InfiniteLight 1y ago

Applied to Paul Christiano on Dwarkesh Podcast by ESRogs 1y ago

Applied to ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks by Magdalena Wache 2y ago

Applied to AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu by DanielFilan 2y ago

Applied to Why "AI alignment" would better be renamed into "Artificial Intention research" by Daniel Böttger 2y ago

Applied to ARC is hiring theoretical researchers by Zack M. Davis 2y ago

Applied to The Goal Misgeneralization Problem by Madhusudhan Pathak 2y ago

Applied to More information about the dangerous capability evaluations we did with GPT-4 and Claude. by Lawrence Chan 2y ago

Applied to Prizes for matrix completion problems by Mark Xu 2y ago

Mark Xu v1.2.0 May 1st 2023 GMT (+18/-4) LW2