AI ALIGNMENT FORUM
Anthropic (org)
Contributors: Ruben Bloom (2), Multicore (0)
Anthropic is an AI organization. Not to be confused with anthropics.
Posts tagged Anthropic (org)
Anthropic's Core Views on AI Safety, by Zac Hatfield-Dodds (2y; tag relevance 4; karma 63; 13 comments; Review)
Why I'm joining Anthropic, by Evan Hubinger (2y; tag relevance 1; karma 48; 2 comments)
Toy Models of Superposition, by Evan Hubinger (2y; tag relevance 3; karma 33; 2 comments)
Concrete Reasons for Hope about AI, by Zac Hatfield-Dodds (2y; tag relevance 1; karma 33; 0 comments)
Transformer Circuits, by Evan Hubinger (3y; tag relevance 2; karma 67; 3 comments)
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, by Zac Hatfield-Dodds (1y; tag relevance 1; karma 110; 12 comments; Review)
Introducing Alignment Stress-Testing at Anthropic, by Evan Hubinger (1y; tag relevance 1; karma 95; 19 comments)
EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024, by Stephen Casper (7mo; tag relevance 1; karma 69; 7 comments)
Anthropic: Three Sketches of ASL-4 Safety Case Components, by Zach Stein-Perlman (2mo; tag relevance 2; karma 42; 18 comments)
Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy, by Zac Hatfield-Dodds (1y; tag relevance 1; karma 46; 0 comments)
Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust, by Zac Hatfield-Dodds (1y; tag relevance 1; karma 44; 5 comments; Review)
Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic), by Lawrence Chan (2y; tag relevance 1; karma 30; 0 comments)
Anthropic's updated Responsible Scaling Policy, by Zac Hatfield-Dodds (2mo; tag relevance 1; karma 25; 0 comments)
How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA?, by Owain Evans (3y; tag relevance 1; karma 18; 1 comment)
Frontier Model Security, by Matthew "Vaniver" Gray (1y; tag relevance 1; karma 15; 1 comment)