AI ALIGNMENT FORUM

Tripwire

Dakara v1.2.0, Dec 30th 2024 GMT (+8/-24) LW1

Applied to Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive by Justausername 2y ago

Applied to Shutdown-Seeking AI by Simon Goldstein 2y ago

Applied to [FICTION] ECHOES OF ELYSIUM: An Ai's Journey From Takeoff To Freedom And Beyond by Super AGI 2y ago

Applied to Mr. Meeseeks as an AI capability tripwire by Eric Zhang 2y ago

Multicore v1.1.0, Jul 18th 2021 GMT LW0

Applied to Superintelligence 13: Capability control methods by plex 4y ago

Applied to Corrigibility thoughts III: manipulating versus deceiving by plex 4y ago

Applied to Any work on honeypots (to detect treacherous turn attempts)? by plex 4y ago

Applied to Corrigibility thoughts II: the robot operator by plex 4y ago

Applied to Corrigibility thoughts I: caring about multiple things by plex 4y ago

Applied to Implications of Quantum Computing for Artificial Intelligence Alignment Research by plex 4y ago

plex v1.0.0, Jul 17th 2021 GMT (+151) LW1

Created by plex at 4y