AI ALIGNMENT FORUM
Honesty
• Applied to On Intentionality, or: Towards a More Inclusive Concept of Lying by Cornelius Dybdahl 2mo ago
• Applied to Truth is Universal: Robust Detection of Lies in LLMs by Lennart Buerger 5mo ago
• Applied to Control Vectors as Dispositional Traits by Gianluca Calcagni 6mo ago
• Applied to Deep Honesty by Aletheophile 8mo ago
• Applied to Glomarization FAQ by Zane 1y ago
• Applied to Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models by Felix Hofstätter 1y ago
• Applied to Lying is Cowardice, not Strategy by RobertM 1y ago
• Applied to Discovering Latent Knowledge in the Human Brain: Part 1 – Clarifying the concepts of belief and knowledge by Joseph Emerson 1y ago
• Applied to Uncovering Latent Human Wellbeing in LLM Embeddings by ChengCheng 1y ago
• Applied to Assume Bad Faith by Zack M. Davis 1y ago
• Applied to Ground-Truth Label Imbalance Impairs the Performance of Contrast-Consistent Search (and Other Contrast-Pair-Based Unsupervised Methods) by Tom Angsten 1y ago
• Applied to “Desperate Honesty” by Agnes Callard by David Gross 1y ago
• Applied to Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS) by Scott Emmons 2y ago
• Applied to [RFC] Possible ways to expand on "Discovering Latent Knowledge in Language Models Without Supervision". by Georgios Kaklamanos 2y ago
• Applied to How to find cool things in a new place by Sam Brown 2y ago