Decaeneus - AI Alignment Forum

Thanks for the thoughtful reply. I read the fuller discussion you linked to and came away with one big question, which I didn't find addressed anywhere (though it's possible I just missed it!).

Looking at the human social instinct, we see that it indeed steers us towards not wanting to harm other humans, but it weakens when extended to other creatures, somewhat in proportion to their difference from humans. We (generally) have lots of empathy for other humans, less for apes, less still for other mammals (which we factory farm by the billions without most people particularly minding it), probably less for octopi (which are bright but quite different), and almost none for the zillion microorganisms, some of which we allegedly evolved from. I would guess that even canonical Good Person Paul Christiano probably doesn't lose much sleep over his impact on microorganisms.

This raises the question of whether the social instinct we have, even if fully reverse engineered, can be deployed separately from the identity of the entity to which it is attached. In other words, if the social instinct circuitry humans have is "be nice to others in proportion to how similar to yourself they are", which seems to match the data, then we would need more than just the ability to place that circuitry in AGIs (which would presumably make the AGIs want to be nice to other similar AGIs). We would in fact need to be able to tease apart the object of empathy and replace it with something very different from how humans operate. No human is nice to microorganisms, so I see no evidence that the existing social instincts ever make a person be nice to something very different from, and much weaker than, themselves. I would expect it to work similarly in an AGI.

This is speculative, but it seems reasonably likely to me to turn out to be an actual problem. Curious if you have thoughts on it.

My AGI safety research—2022 review, ’23 plans

Decaeneus


This is drifting a bit far afield from the neurobio aspect of this research, but do you have an opinion about the likelihood that a randomly sampled human, if endowed with truly superhuman powers, would utilize those powers in a way that we'd be pleased to see from an AGI?

It seems to me like we have many salient examples of power corrupting, and absolute power corrupting to a great degree. Understanding that there's a distribution of outcomes, do you have an opinion about the likelihood of benevolent use of great power, among humans?

This is not to say that this understanding can't still be usefully employed, but somehow it seems like a relevant question. E.g. if it turns out that most of what keeps humans acting pro-socially is the fear that anti-social behavior will trigger their punishment by others, that's likely not as juicy a mechanism since it may be hard to convince a comparatively omniscient and omnipotent being that it will somehow suffer if it does anti-social things.
