Value extrapolation partially resolves symbol grounding

by Stuart_Armstrong
12th Jan 2022
1 min read

Take the following AI, trained on videos of happy humans.

Since we know about AI wireheading, we know that there are at least two ways the AI could interpret its reward function[1]: either we want it to make more happy humans (or more humans happy), call this R1; or we want it to make more videos of happy humans, call this R2.

We would want the AI to learn to maximise R1, of course. But even without that, if it generates R1 as a candidate and applies suitably diminishing returns to all its candidate reward functions, then we will get a positive outcome: the AI may fill the universe with videos of happy humans, but it will also act to make us happy.
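
A minimal numerical sketch of that diminishing-returns point (my illustration, not from the post; the effort budget, the payoff rates, and the square-root transform are all assumptions chosen for concreteness):

```python
import numpy as np

# Toy sketch (illustrative assumptions): the AI splits a fixed effort budget
# between its two candidate reward interpretations.
#   R1: make actual humans happy     -> payoff grows as  1 * effort_1
#   R2: make videos of happy humans  -> payoff grows as 10 * effort_2 (the cheap proxy)
BUDGET = 100.0
effort_1 = np.linspace(0.0, BUDGET, 10_001)  # effort spent on real happy humans
effort_2 = BUDGET - effort_1                 # remaining effort spent on videos

r1 = 1.0 * effort_1
r2 = 10.0 * effort_2

# Maximising the raw sum: the optimum puts all effort into the cheap proxy R2.
raw = r1 + r2
print(effort_1[np.argmax(raw)])          # 0.0 -- nothing spent on real happy humans

# Applying a diminishing-returns transform (square root) to each candidate
# reward before summing: the optimum keeps a strictly positive share on R1.
diminishing = np.sqrt(r1) + np.sqrt(r2)
print(effort_1[np.argmax(diminishing)])  # ~9.1 -- some effort still goes to R1
```

The exact numbers are unimportant; the point is that a concave transform over each candidate reward makes "spend everything on the video proxy" strictly worse than also producing some genuinely happy humans.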

Thus solving value extrapolation will solve symbol grounding, at least in part.


  1. This is a massive over-simplification of what would be needed to define "happy" or anything similar.

Comments

johnswentworth:

That might work in a tiny world model with only two possible hypotheses. In a high-dimensional world model with exponentially many hypotheses, the weight on happy humans would be exponentially small.

Gordon Seidoh Worley:

This doesn't really seem like solving symbol grounding, partially or not, so much as an argument that it's a non-problem for the purposes of value alignment.