AI ALIGNMENT FORUM

Sycophancy

• Applied to SAE features for refusal and sycophancy steering vectors by neverix 2mo ago
• Applied to Evaluating LLaMA 3 for political sycophancy by alma.liezenga 3mo ago
• Applied to Two new datasets for evaluating political sycophancy in LLMs by alma.liezenga 3mo ago
• Applied to Sycophancy to subterfuge: Investigating reward tampering in large language models by Raymond Arnold 6mo ago
• Applied to Antagonistic AI by Xybermancer 10mo ago
• Applied to Steering Llama-2 with contrastive activation additions by Alex Turner 1y ago
• Applied to Reducing sycophancy and improving honesty via activation steering by Maxime Riché 1y ago

Created by Maxime Riché 1y ago