RLHF

Reinforcement Learning from Human Feedback (RLHF) is a machine learning and alignment technique in which the model's training signal is derived from human evaluations of the model's outputs, rather than from labeled data or a ground-truth reward signal.
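
One common way to turn human evaluations into a training signal is to collect pairwise comparisons and fit a reward model to them. The sketch below illustrates that idea under stated assumptions: the definition above does not specify pairwise comparisons, a Bradley-Terry preference model, or a linear reward, and the two features and the toy data are hypothetical, invented purely for illustration.

```python
import math


def fit_reward_weights(preferences, n_features, lr=0.5, epochs=200):
    """Fit a linear reward r(x) = w . x from (chosen, rejected) feature pairs.

    Uses stochastic gradient ascent on the Bradley-Terry log-likelihood,
    where P(chosen beats rejected) = sigmoid(r(chosen) - r(rejected)).
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in preferences:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            p_chosen = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of log(sigmoid(margin)) w.r.t. w is (1 - p_chosen) * diff.
            for i in range(n_features):
                w[i] += lr * (1.0 - p_chosen) * diff[i]
    return w


def reward(w, features):
    """Score one output (given as a feature vector) with the learned reward."""
    return sum(wi * xi for wi, xi in zip(w, features))


# Hypothetical toy data: each output is summarized by two made-up features,
# (helpfulness, rudeness). Each pair is (output the human preferred, output rejected).
toy_preferences = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.3], [0.1, 0.2]),
]

weights = fit_reward_weights(toy_preferences, n_features=2)
```

In a full RLHF pipeline, a learned reward like this would typically stand in for the missing ground-truth reward while the model is optimized with a reinforcement learning algorithm; that optimization step is not shown here.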
