AI ALIGNMENT FORUM
Wikitags
Waluigi Effect
Applied to Seven sources of goals in LLM agents by Seth Herd, 2mo ago
Applied to Interview with Robert Kralisch on Simulators by WillPetillo, 8mo ago
Applied to Antagonistic AI by Xybermancer, 1y ago
Applied to Assessment of AI safety agendas: think about the downside risk by Roman Leventov, 1y ago
Applied to Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation by Soroush Pour, 1y ago
Applied to Thoughts on the Waluigi Effect by Steve Byrnes, 2y ago
Applied to Remarks 1–18 on GPT (compressed) by Steve Byrnes, 2y ago
Applied to Super-Luigi = Luigi + (Luigi - Waluigi) by Steve Byrnes, 2y ago
Applied to The Waluigi Effect (mega-post) by Steve Byrnes, 2y ago
Steve Byrnes, v1.0.0, Jul 4th 2023 GMT (+2087)
Created by Steve Byrnes, 2y ago