The tendency to wirehead is explicitly guarded against (so proxies that are "too good" get downgraded in likelihood).
But what if it has found the golden feature that determines everything one needs to know for the task? Wouldn't that be desirable?