Human-AI Safety
Applied to A Universal Prompt as a Safeguard Against AI Threats by Zhaiyk Sultan 16d ago
Applied to Recursive Cognitive Refinement (RCR): A Self-Correcting Approach for LLM Hallucinations by Michael Xavier Theodore 1mo ago
Applied to OpenAI’s NSFW policy: user safety, harm reduction, and AI consent by 8e9 1mo ago
Applied to Gradient Anatomy's - Hallucination Robustness in Medical Q&A by Diego Sabajo 1mo ago
Applied to AI Safety Oversights by Davey Morse 2mo ago
Applied to Machine Unlearning in Large Language Models: A Comprehensive Survey with Empirical Insights from the Qwen 1.5 1.8B Model by Saketh Baddam 2mo ago
Applied to Tetherware #1: The case for humanlike AI with free will by Jáchym Fibír 2mo ago
Applied to I Recommend More Training Rationales by Gianluca Calcagni 3mo ago
Applied to Human-AI Complementarity: A Goal for Amplified Oversight by Rishub Jain 3mo ago
Applied to Launching Applications for the Global AI Safety Fellowship 2025! by Aditya_S 4mo ago
Applied to Will AI and Humanity Go to War? by Simon Goldstein 6mo ago
Applied to The Checklist: What Succeeding at AI Safety Will Involve by Hao Zhao 6mo ago
Applied to Launching applications for AI Safety Careers Course India 2024 by Axiom_Futures 11mo ago
Applied to Will OpenAI also require a "Super Red Team Agent" for its "Superalignment" Project? by Super AGI 1y ago
Applied to Let's ask some of the largest LLMs for tips and ideas on how to take over the world by Super AGI 1y ago