User Comment Replies — AI Alignment Forum

Hello Matthew,

I'm Mislav, one of the team members who worked on this project. Thank you for your thoughtful comment.

Yes, you understood correctly what we did. We wanted to check whether human preferences are "learned by default" by comparing the performance of two human preference predictors: one trained only on the environment data, and one trained on the RL agent's internal state.

As for your question about environments, I agree with you. There are probably some environments (like the gridworld environment we used) where the human pr...

All of Mislav Jurić's Comments + Replies