AI ALIGNMENT FORUM

Honesty

• Applied to On Intentionality, or: Towards a More Inclusive Concept of Lying by Cornelius Dybdahl 4mo ago
• Applied to Truth is Universal: Robust Detection of Lies in LLMs by Lennart Buerger 7mo ago
• Applied to Control Vectors as Dispositional Traits by Gianluca Calcagni 7mo ago
• Applied to Deep Honesty by Aletheophile 9mo ago
• Applied to Glomarization FAQ by Zane 1y ago
• Applied to Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models by Felix Hofstätter 1y ago
• Applied to Lying is Cowardice, not Strategy by RobertM 1y ago
• Applied to Discovering Latent Knowledge in the Human Brain: Part 1 – Clarifying the concepts of belief and knowledge by Joseph Emerson 1y ago
• Applied to Uncovering Latent Human Wellbeing in LLM Embeddings by ChengCheng 1y ago
• Applied to Assume Bad Faith by Zack M. Davis 1y ago
• Applied to Ground-Truth Label Imbalance Impairs the Performance of Contrast-Consistent Search (and Other Contrast-Pair-Based Unsupervised Methods) by Tom Angsten 2y ago
• Applied to “Desperate Honesty” by Agnes Callard by David Gross 2y ago
• Applied to Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS) by Scott Emmons 2y ago
• Applied to [RFC] Possible ways to expand on "Discovering Latent Knowledge in Language Models Without Supervision". by Georgios Kaklamanos 2y ago
• Applied to How to find cool things in a new place by Sam Brown 2y ago