Improved regret bound for DRL — AI Alignment Forum