All of Khanh Nguyen's Comments + Replies

Nice work! I agree with many of the opinions. Nevertheless, the survey is still missing several key citations. 

Personally, I have made several contributions to RLHF:

  1. Reinforcement learning for bandit neural machine translation with simulated human feedback (Nguyen et al., 2017) is the first paper that shows the potential of using noisy rating feedback to train text generators. 
  2. Interactive learning from activity description (Nguyen et al., 2021) is one of the first frameworks for learning from descriptive language feedback with theoretical guarante
...
Stephen Casper
Thanks, we will consider adding each of these. We appreciate that you took a look and took the time to suggest them!