There's a famous story about Diogenes and Plato:

[...] when Plato gave the tongue-in-cheek definition of man as "featherless bipeds," Diogenes plucked a chicken and brought it into Plato's Academy, saying, "Behold! I've brought you a man," and so the Academy added "with broad flat nails" to the definition.

What Plato was (allegedly) doing was not providing a definition of man, but what I'd call a sufficient reference or a sufficient pointer. If I'm in ancient Athens and divide the obvious objects that I can see or think of into "featherless bipeds" and "not featherless bipeds", then "man" will match up with the first category.

Then Diogenes, acting like an AI, created something that fell within the sufficient pointer class but that was clearly not a man. The Academy then amended the pointer to add "with broad flat nails", patching it till it was sufficient again. Had there been a powerful AI around, or a god, or a meddling human with enough means and persistence, then they could have produced a "featherless-biped-with-broad-flat-nails" that was also not a human, making the pointer inadequate again.
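As a toy illustration of the story so far, here is a minimal Python sketch. Everything in it is a hypothetical stand-in rather than anything from the original anecdote: the `Thing` record, its attributes, the two pointer functions, and the tiny `athens` world are all assumptions made for the example. A pointer counts as "sufficient" for a world if it agrees with the ground truth on every object in that world, and adding one new object is enough to break it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    name: str
    feathers: bool
    legs: int
    broad_flat_nails: bool
    is_man: bool  # ground truth; the pointers below never look at this


def pointer_v1(t: Thing) -> bool:
    """Plato's pointer: featherless biped."""
    return not t.feathers and t.legs == 2


def pointer_v2(t: Thing) -> bool:
    """The Academy's patch: featherless biped with broad flat nails."""
    return pointer_v1(t) and t.broad_flat_nails


def is_sufficient(pointer, world) -> bool:
    """A pointer is sufficient for a world if it picks out exactly the men in it."""
    return all(pointer(t) == t.is_man for t in world)


# A deliberately tiny "ancient Athens" (hypothetical contents).
athens = [
    Thing("Socrates", feathers=False, legs=2, broad_flat_nails=True, is_man=True),
    Thing("chicken", feathers=True, legs=2, broad_flat_nails=False, is_man=False),
    Thing("dog", feathers=False, legs=4, broad_flat_nails=False, is_man=False),
]

# Diogenes enriches the world with a new object.
plucked = Thing("plucked chicken", feathers=False, legs=2,
                broad_flat_nails=False, is_man=False)

# A more capable adversary builds something that also defeats the patch.
construct = Thing("nailed construct", feathers=False, legs=2,
                  broad_flat_nails=True, is_man=False)

v1_before = is_sufficient(pointer_v1, athens)
v1_after = is_sufficient(pointer_v1, athens + [plucked])
v2_patched = is_sufficient(pointer_v2, athens + [plucked])
v2_after = is_sufficient(pointer_v2, athens + [plucked, construct])
```

Note that neither pointer changed between the "before" and "after" checks. Only the set of objects in the world did, which is the sense in which sufficiency is a property of the pointer and the world together.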

A lot of suggestions on AI safety are sufficient pointers. For example, take the idea that an AI should maximise "complexity". This comes, I believe, from the fact that, in our current world, the categories "is complex" and "is valuable to humans" match up a lot. It's a sufficient pointer. But along comes a Diogenes/AI with complexity as a goal, and now it enriches the set of objects in the world with complex-but-worthless things, breaking the "definition".

Therefore, a lot of the things that people say they value, or want AIs to preserve or maximise, should not be taken as claims that they value the specific thing named. Instead, each should be taken as a pointer to what they value in the current world, and the challenge is then to extend that to new maps and new territories.
