AI ALIGNMENT FORUM · Tags

Alignment Jam

• Applied to Finding Deception in Language Models by Esben Kran 3mo ago
• Applied to Computational Mechanics Hackathon (June 1 & 2) by Nora_Ammann 6mo ago
• Applied to Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon by Jason Hoelscher-Obermaier 7mo ago
• Applied to Towards AI Safety Infrastructure: Talk & Outline by Paul Bricman 10mo ago
• Applied to Tips, tricks, lessons and thoughts on hosting hackathons by gergogaspar 1y ago
• Applied to Robustness of Model-Graded Evaluations and Automated Interpretability by Esben Kran 1y ago
• Applied to How-to Transformer Mechanistic Interpretability—in 50 lines of code or less! by Esben Kran 1y ago
• Applied to We Found An Neuron in GPT-2 by Esben Kran 1y ago
• Applied to Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2 by Stefan Heimersheim 1y ago
• Applied to Results from the AI testing hackathon by Esben Kran 1y ago
• Applied to Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1 by Esben Kran 2y ago
• Applied to Superposition and Dropout by Esben Kran 2y ago
• Applied to Identifying semantic neurons, mechanistic circuits & interpretability web apps by Esben Kran 2y ago
• Applied to Results from the interpretability hackathon by Esben Kran 2y ago
• Applied to Dropout can create a privileged basis in the ReLU output model. by Esben Kran 2y ago

Esben Kran, v1.0.0, May 16th 2023 GMT

This tag lists the posts that came out of the Alignment Jam hackathons.


Created by Esben Kran 2y ago