AI ALIGNMENT FORUM
Anthropic (org)
• Applied to Cicadas, Anthropic, and the bilateral alignment problem by kromem, 10h ago
• Applied to EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 by Stephen Casper, 1d ago
• Applied to Anthropic: Reflections on our Responsible Scaling Policy by Zac Hatfield-Dodds, 3d ago
• Applied to Anthropic AI made the right call by bhauth, 1mo ago
• Applied to OMMC Announces RIP by Adam Scholl, 2mo ago
• Applied to Vaniver's thoughts on Anthropic's RSP by Gunnar Zarncke, 4mo ago
• Applied to Introducing Alignment Stress-Testing at Anthropic by Gunnar Zarncke, 4mo ago
• Applied to On Anthropic’s Sleeper Agents Paper by Gunnar Zarncke, 4mo ago
• Applied to Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation by Soroush Pour, 6mo ago
• Applied to Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy by Zac Hatfield-Dodds, 7mo ago
• Applied to Comparing Anthropic's Dictionary Learning to Ours by Robert_AIZI, 7mo ago
• Applied to Measuring and Improving the Faithfulness of Model-Generated Reasoning by HenningB, 8mo ago
• Applied to Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust by Zac Hatfield-Dodds, 8mo ago
• Applied to Towards Monosemanticity: Decomposing Language Models With Dictionary Learning by Zac Hatfield-Dodds, 8mo ago
• Applied to Amazon to invest up to $4 billion in Anthropic by RobertM, 8mo ago
• Applied to AI Awareness through Interaction with Blatantly Alien Models by Vojtech Kovarik, 10mo ago
• Applied to Frontier Model Forum by RobertM, 10mo ago
• Applied to Frontier Model Security by RobertM, 10mo ago