Transparency for Generalizing Alignment from Toy Models — AI Alignment Forum