AI ALIGNMENT FORUM
Wikitags
Waluigi Effect
Applied to Seven sources of goals in LLM agents by Seth Herd, 2mo ago
Applied to Interview with Robert Kralisch on Simulators by WillPetillo, 8mo ago
Applied to Antagonistic AI by Xybermancer, 1y ago
Applied to Assessment of AI safety agendas: think about the downside risk by Roman Leventov, 1y ago
Applied to Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation by Soroush Pour, 1y ago
Applied to Thoughts on the Waluigi Effect by Steve Byrnes, 2y ago
Applied to Remarks 1–18 on GPT (compressed) by Steve Byrnes, 2y ago
Applied to Super-Luigi = Luigi + (Luigi - Waluigi) by Steve Byrnes, 2y ago
Applied to The Waluigi Effect (mega-post) by Steve Byrnes, 2y ago
Steve Byrnes, v1.0.0, Jul 4th 2023 GMT (+2087)
Created by Steve Byrnes, 2y ago