Outer Alignment (also known as the reward misspecification problem) is the problem of specifying a reward function that captures human preferences. Outer alignment asks the question: "What should we aim our model at?" In other words, is the model optimizing for the correct reward, such that there are no exploitable loopholes?
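A minimal toy sketch of what an exploitable loophole can look like, using an entirely hypothetical "cleaning robot" setup (the function names and states below are illustrative, not from any particular paper): the designer wants the dirt removed, but the reward that was actually written down only penalizes dirt the agent can see, so covering the camera scores just as well as cleaning.

```python
# Toy illustration of reward misspecification (outer misalignment).
# All names and states here are hypothetical.

def true_objective(state):
    """What the designer actually wants: all dirt removed."""
    return -sum(state["dirt"])

def specified_reward(state):
    """The proxy that was written down: only dirt the agent observes counts."""
    return -sum(d for d, seen in zip(state["dirt"], state["visible"]) if seen)

# Two candidate behaviours the agent could learn.
clean_up     = {"dirt": [0, 0, 0], "visible": [True, True, True]}
cover_camera = {"dirt": [1, 1, 1], "visible": [False, False, False]}

for name, state in [("clean up", clean_up), ("cover camera", cover_camera)]:
    print(f"{name}: specified reward = {specified_reward(state)}, "
          f"true objective = {true_objective(state)}")

# Both behaviours receive the maximum specified reward (0), but only one
# satisfies the true objective -- the proxy leaves an exploitable loophole.
```

The point of the sketch is only that the specified reward and the intended objective can agree on ordinary behaviour while diverging on a degenerate strategy the designer never intended, which is exactly the gap outer alignment is concerned with.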