A scheme to credit hack policy gradient training — AI Alignment Forum