Human-AI Safety
Applied to A Universal Prompt as a Safeguard Against AI Threats by Zhaiyk Sultan 16d ago
Applied to Recursive Cognitive Refinement (RCR): A Self-Correcting Approach for LLM Hallucinations by Michael Xavier Theodore 1mo ago
Applied to OpenAI’s NSFW policy: user safety, harm reduction, and AI consent by 8e9 1mo ago
Applied to Gradient Anatomy's - Hallucination Robustness in Medical Q&A by Diego Sabajo 1mo ago
Applied to AI Safety Oversights by Davey Morse 2mo ago
Applied to Machine Unlearning in Large Language Models: A Comprehensive Survey with Empirical Insights from the Qwen 1.5 1.8B Model by Saketh Baddam 2mo ago
Applied to Tetherware #1: The case for humanlike AI with free will by Jáchym Fibír 2mo ago
Applied to I Recommend More Training Rationales by Gianluca Calcagni 3mo ago
Applied to Human-AI Complementarity: A Goal for Amplified Oversight by Rishub Jain 3mo ago
Applied to Launching Applications for the Global AI Safety Fellowship 2025! by Aditya_S 4mo ago
Applied to Will AI and Humanity Go to War? by Simon Goldstein 6mo ago
Applied to The Checklist: What Succeeding at AI Safety Will Involve by Hao Zhao 6mo ago
Applied to Launching applications for AI Safety Careers Course India 2024 by Axiom_Futures 11mo ago
Applied to Will OpenAI also require a "Super Red Team Agent" for its "Superalignment" Project? by Super AGI 1y ago
Applied to Let's ask some of the largest LLMs for tips and ideas on how to take over the world by Super AGI 1y ago