AI ALIGNMENT FORUM
Iterated Amplification
• Applied to "A proposal for iterated interpretability with known-interpretable narrow AIs" by Peter Berggren, 23d ago
• Applied to "Making LLMs safer is more intuitive than you think: How Common Sense and Diversity Improve AI Alignment" by Jeba Sania, 1mo ago
• Applied to "Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned?" by RobertM, 7mo ago
• Applied to "AIS 101: Task decomposition for scalable oversight" by Charbel-Raphael Segerie, 2y ago
• Applied to "Should AutoGPT update us towards researching IDA?" by Ruben Bloom, 2y ago
• Applied to "Is there a ML agent that abandons it's utility function out-of-distribution without losing capabilities?" by Christopher King, 2y ago
• Applied to "Notes on OpenAI’s alignment plan" by RobertM, 2y ago
• Applied to "Can you force a neural network to keep generalizing?" by Q Home, 2y ago
• Applied to "Ought will host a factored cognition “Lab Meeting”" by jungofthewon, 2y ago
• Applied to "Surprised by ELK report's counterexample to Debate, IDA" by Evan R. Murphy, 3y ago
• Applied to "Iterated Distillation-Amplification, Gato, and Proto-AGI [Re-Explained]" by Gabe M, 3y ago
• Applied to "Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios" by Evan R. Murphy, 3y ago
• Applied to "HCH and Adversarial Questions" by Ruben Bloom, 3y ago
• Applied to "My Overview of the AI Alignment Landscape: A Bird's Eye View" by Neel Nanda, 3y ago
• Applied to "Is iterated amplification really more powerful than imitation?" by Chantiel, 3y ago
• Applied to "Garrabrant and Shah on human modeling in AGI" by Rob Bensinger, 4y ago
• Applied to "Thoughts on Iterated Distillation and Amplification" by Waddington, 4y ago
• Applied to "Mapping the Conceptual Territory in AI Existential Safety and Alignment" by Jack Koch, 4y ago
• Applied to "Three AI Safety Related Ideas" by Joe Collman, 4y ago
• Applied to "Imitative Generalisation (AKA 'Learning the Prior')" by Beth Barnes, 4y ago
• Applied to "Debate update: Obfuscated arguments problem" by Beth Barnes, 4y ago
• Applied to "Meta-execution" by niplav, 4y ago
• Applied to "Security amplification" by niplav, 4y ago
• Applied to "Reliability amplification" by niplav, 4y ago
• Applied to "Techniques for optimizing worst-case performance" by niplav, 4y ago
• Applied to "Model splintering: moving from one imperfect model to another" by Jérémy Perret, 4y ago
• Applied to "Directions and desiderata for AI alignment" by Jérémy Perret, 4y ago
• Applied to "What are the differences between all the iterative/recursive approaches to AI alignment?" by Jérémy Perret, 4y ago
• Applied to "The reward engineering problem" by Jérémy Perret, 4y ago
• Applied to "Thoughts on reward engineering" by Jérémy Perret, 4y ago
• Applied to "My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda" by Chi Nguyen, 4y ago
• Applied to "Disagreement with Paul: alignment induction" by Multicore, 4y ago
• Applied to "A guide to Iterated Amplification & Debate" by Rafael Harth, 4y ago
• Applied to "What's wrong with these analogies for understanding Informed Oversight and IDA?" by Multicore, 4y ago
• Applied to "Synthesizing amplification and debate" by Alex Turner, 5y ago
• Applied to "A general model of safety-oriented AI development" by Multicore, 5y ago
• Applied to "Machine Learning Projects on IDA" by Multicore, 5y ago
• Applied to "Amplification Discussion Notes" by Multicore, 5y ago
• Applied to "Relaxed adversarial training for inner alignment" by DanielFilan, 5y ago
• Applied to "RAISE is launching their MVP" by Multicore, 5y ago
• Applied to "Capability amplification" by Mark Xu, 5y ago