See also: Towards deconfusing wireheading and reward maximization, Everitt et al. (2019).
There are a few subtly different things that people call "wireheading". This post is intended as a quick reference for my views on the differences between them. I think these distinctions are sometimes worth drawing to reduce confusion.
Some crucial differences between these:
Note: this is not the same thing as changing a terminal goal! The reward function is not necessarily the terminal goal of the policy, because of inner misalignment.
Here, by "embeddedness" I mean embeddedness in the sense of Demski and Garrabrant (2019): in particular, the fact that the agent is part of the environment, rather than sitting in a separate part of the universe that interacts with it only through well-defined observation/action/reward channels. RL is the prototypical example of a non-embedded agency algorithm.
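To make the non-embedded framing concrete, here is a minimal toy sketch (my own, not from any particular RL library or from the references above): an agent and an environment that are separate objects and interact only through observation, action, and reward channels. The `ToyEnvironment` and `RandomAgent` names and dynamics are hypothetical, chosen purely for illustration.

```python
import random


class ToyEnvironment:
    """Environment whose true state is hidden behind an observation channel."""

    def __init__(self):
        self.state = 0

    def step(self, action: int):
        # The agent never touches `self.state` directly; it only ever sees
        # the observation and reward passed back over these channels.
        self.state += action
        observation = self.state % 4                    # lossy view of the true state
        reward = 1.0 if self.state >= 10 else 0.0
        return observation, reward


class RandomAgent:
    """Agent that lives 'outside' the environment: no access to its internals."""

    def act(self, observation: int) -> int:
        return random.choice([0, 1])


env, agent = ToyEnvironment(), RandomAgent()
obs, total_reward = 0, 0.0
for _ in range(20):
    action = agent.act(obs)            # action channel
    obs, reward = env.step(action)     # observation + reward channels
    total_reward += reward
print(total_reward)
```

The point of the sketch is just the interface: in this framing the agent is, by construction, not part of the environment, which is exactly the assumption that embedded agency drops.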
That is, the reward function cannot perfectly observe the ground truth state of the environment; if there is another state the world could be in that yields the same observation, the reward function cannot distinguish between the two.
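As a small illustration of this point, here is a toy sketch (my own example, not from the post or the cited papers): a reward function that only sees an observation necessarily assigns the same reward to two different ground-truth states that look identical through the observation channel. The `dial_reading` setup and field names are hypothetical.

```python
def observe(state: dict) -> int:
    # The sensor reports only the dial reading, not the rest of the world state.
    return state["dial_reading"]


def reward(observation: int) -> float:
    # Reward is computed purely from the observation.
    return float(observation)


real_success = {"dial_reading": 9, "task_actually_done": True}
tampered_dial = {"dial_reading": 9, "task_actually_done": False}

# Different ground-truth states, identical observations...
assert observe(real_success) == observe(tampered_dial)

# ...and therefore identical rewards: the reward function cannot tell them apart.
print(reward(observe(real_success)), reward(observe(tampered_dial)))
```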