What are some non-purely-sampling ways to do deep RL? — AI Alignment Forum