AI ALIGNMENT FORUM
Alignment Jam
Written by Esben Kran, last updated 16th May 2023
This tag lists the posts that have come out of the Alignment Jam hackathons.
Posts tagged Alignment Jam
0
53
We Found An Neuron in GPT-2
Joseph Miller, Clement Neo
2y
0
0
52
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1
Stefan Heimersheim, Marius Hobbhahn
2y
0
1
38
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2
Stefan Heimersheim, Marius Hobbhahn
2y
0
0
13
How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!
Stefan Heimersheim
2y
0
0
14
Robustness of Model-Graded Evaluations and Automated Interpretability
Simon Lermen, viluon
2y
2
0
7
Finding Deception in Language Models
Esben Kran, Archana Vaidheeswaran
7mo
0