AI ALIGNMENT FORUM
Language Models
• Applied to: If language is for communication, what does that imply about LLMs? (by Bill Benzon, 2d ago)
• Applied to: Applying refusal-vector ablation to a Llama 3 70B agent (by Vanessa Kosoy, 3d ago)
• Applied to: Navigating LLM embedding spaces using archetype-based directions (by mwatkins, 6d ago)
• Applied to: Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant (by Olli Järviniemi, 8d ago)
• Applied to: On precise out-of-context steering (by Olli Järviniemi, 11d ago)
• Applied to: Mechanistically Eliciting Latent Behaviors in Language Models (by Vanessa Kosoy, 13d ago)
• Applied to: LLMs could be as conscious as human emulations, potentially (by weightt an, 14d ago)
• Applied to: An interesting mathematical model of how LLMs work (by Bill Benzon, 14d ago)
• Applied to: LLMs seem (relatively) safe (by JustisMills, 18d ago)
• Applied to: At last! ChatGPT does, shall we say, interesting imitations of “Kubla Khan” (by Bill Benzon, 20d ago)
• Applied to: How LLMs Work, in the Style of The Economist (by Rocket Drew, 22d ago)
• Applied to: What's up with all the non-Mormons? Weirdly specific universalities across LLMs (by mwatkins, 25d ago)
• Applied to: Inducing Unprompted Misalignment in LLMs (by Sam Svenningsen, 26d ago)
• Applied to: An examination of GPT-2's boring yet effective glitch (by niplav, 26d ago)
• Applied to: Claude 3 Opus can operate as a Turing machine (by Gunnar Zarncke, 1mo ago)
• Applied to: Experiments with an alternative method to promote sparsity in sparse autoencoders (by Eoin Farrell, 1mo ago)
• Applied to: Claude wants to be conscious (by Joe Kwon, 1mo ago)
• Applied to: Barcoding LLM Training Data Subsets. Anyone trying this for interpretability? (by right..enough?, 1mo ago)
• Applied to: Is LLM Translation Without Rosetta Stone possible? (by cubefox, 1mo ago)