Elizabeth — AI Alignment Forum

I think the sugar example is off. Sugar was not originally a proxy for vitamins, because sugar was rarer than vitamins. A taste for sugar was optimizing for calories, which at the time were heavily correlated with survival. If our ancestors had had access to Twinkies, they would have benefited from them. The problem isn't that we became better at hacking the sugar signal; it's that we evolved an open-ended preference for sugar, even though the utility curve eventually turns negative.

A potential replacement: we evolved to find bright, shiny colors in fruit attractive because that signified vitamins, and modern breeding techniques have completely hacked this.

I worry I'm being pedantic by bringing this up, but I think the difference between "hackable proxies" and "accurate proxies for which we mismodeled the underlying reality" is important.
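To make the distinction concrete, here is a minimal toy sketch. Everything in it is an assumption made up for illustration: the quadratic `utility_of_calories` curve, the calorie ranges, and the `Fruit` brightness and vitamin numbers are hypothetical and not measurements of anything. It only shows the shape of the two failure modes: in the first, the signal stays accurate but the value it tracks stops being good; in the second, the signal itself is decoupled from the value.

```python
from dataclasses import dataclass


def utility_of_calories(kcal: float) -> float:
    """Hypothetical utility curve: rises, peaks at 2500 kcal, then turns negative."""
    return kcal * (1.0 - kcal / 5000.0)


def sweetness_signal(kcal: float) -> float:
    """Accurate proxy (by assumption): sweetness tracks calories at every level."""
    return kcal


# Case 1: accurate proxy, mismodeled reality.
# An open-ended preference always picks the strongest signal on offer.
ancestral_options = [500.0, 1000.0, 2000.0]
modern_options = ancestral_options + [4000.0, 6000.0]

ancestral_choice = max(ancestral_options, key=sweetness_signal)
modern_choice = max(modern_options, key=sweetness_signal)

ancestral_payoff = utility_of_calories(ancestral_choice)  # still on the rising part
modern_payoff = utility_of_calories(modern_choice)        # past the peak, negative


# Case 2: hackable proxy.
@dataclass
class Fruit:
    name: str
    brightness: float  # the signal we evolved to like
    vitamins: float    # the thing the signal used to indicate


# In the wild (by assumption) brightness and vitamins move together.
wild_fruit = [Fruit("dull wild", 0.2, 0.2), Fruit("bright wild", 0.8, 0.8)]
# Breeding pushes brightness up without raising vitamins.
bred_fruit = Fruit("bred for colour", 1.0, 0.1)

wild_pick = max(wild_fruit, key=lambda f: f.brightness)
modern_pick = max(wild_fruit + [bred_fruit], key=lambda f: f.brightness)

print("case 1:", ancestral_choice, ancestral_payoff, modern_choice, modern_payoff)
print("case 2:", wild_pick.name, wild_pick.vitamins, modern_pick.name, modern_pick.vitamins)
```

In case 1 the signal never lies: the chosen option really does have the most calories, and the harm comes from the utility curve. In case 2 the signal is what breaks: the brightest fruit is no longer the most nutritious one.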

Goodhart Taxonomy

Elizabeth · 8y · 10

Okay, I think I disagree that extrapolating beyond the range of your data is Goodharting. I use the term for the narrower case where either the signal or the value stays in the trained range, but the two diverge sharply from each other. E.g., artificial sweeteners break the link between sweetness and calories.

I don't think this is quite isomorphic to the first paragraph, but it is highly related: I think of sweetness as a proxy for calories. Are you defining sweetness as a proxy for "good for me"?
