AI ALIGNMENT FORUM Tags

Mesa-Optimization

• Applied to Finding Backward Chaining Circuits in Transformers Trained on Tree Search by abhayesian 1d ago
• Applied to Inner Optimization Mechanisms in Neural Nets by ProgramCrafter 17d ago
• Applied to The Human's Role in Mesa Optimization by silentbob 20d ago
• Applied to Visualizing neural network planning by Nevan Wichers 20d ago
• Applied to Measuring Learned Optimization in Small Transformer Models by Jonathan Bostock 2mo ago
• Applied to Understanding mesa-optimization using toy models by tilmanr 2mo ago
• Applied to Counting arguments provide no evidence for AI doom by Quintin Pope 3mo ago
• Applied to The Inner Alignment Problem by Jakub Halmeš 3mo ago
• Applied to Satisficers want to become maximisers by JenniferRM 9mo ago
• Applied to Mesa-Optimization: Explain it like I'm 10 Edition by brook 9mo ago
• Applied to Runaway Optimizers in Mind Space by silentbob 10mo ago
• Applied to Disincentivizing deception in mesa optimizers with Model Tampering by martinkunev 11mo ago
• Applied to Challenge proposal: smallest possible self-hardening backdoor for RLHF by Christopher King 11mo ago
• Applied to Simple experiments with deceptive alignment by Andreas_Moe 1y ago
• Applied to Consequentialism is in the Stars not Ourselves by Cinera Verinia 1y ago