AI ALIGNMENT FORUM
Alignment Jam
• Finding Deception in Language Models, by Esben Kran, 3mo ago
• Computational Mechanics Hackathon (June 1 & 2), by Nora_Ammann, 6mo ago
• Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon, by Jason Hoelscher-Obermaier, 7mo ago
• Towards AI Safety Infrastructure: Talk & Outline, by Paul Bricman, 10mo ago
• Tips, tricks, lessons and thoughts on hosting hackathons, by gergogaspar, 1y ago
• Robustness of Model-Graded Evaluations and Automated Interpretability, by Esben Kran, 1y ago
• How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!, by Esben Kran, 1y ago
• We Found An Neuron in GPT-2, by Esben Kran, 1y ago
• Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2, by Stefan Heimersheim, 1y ago
• Results from the AI testing hackathon, by Esben Kran, 1y ago
• Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1, by Esben Kran, 2y ago
• Superposition and Dropout, by Esben Kran, 2y ago
• Identifying semantic neurons, mechanistic circuits & interpretability web apps, by Esben Kran, 2y ago
• Results from the interpretability hackathon, by Esben Kran, 2y ago
• Dropout can create a privileged basis in the ReLU output model, by Esben Kran, 2y ago
Esben Kran, v1.0.0, May 16th 2023 GMT
This lists the posts that have come from the Alignment Jam hackathons.
Created by Esben Kran, 2y ago