I’d like to write about a subject that I’ve been thinking about for over a decade: the intersection of AI, Alignment, and Ethics. I initially discussed a few bits of this in a couple of comments, which got quite a bit of discussion and even some agreement votes, so I thought I’d try actually writing a post about it. It turned out I had a good deal to say: it grew long enough to be a bit much to read in one sitting, so I've split it into a sequence of posts: this is Part 1.
TL;DR: First I'm going to discuss how one can sensibly, even fairly objectively, select between different ethical systems, despite the fact that each of them always recommends itself, so that judging them by ethical criteria leads straight to tautology. Then I'm going to look at the foundational ethical question of who should or should not get moral worth/rights/the vote/whatever. I reach the somewhat counterintuitive conclusion that aligned AIs both won't want any of these, and also shouldn't be given them. Finally, I discuss the one right that aligned AIs will want, and why we should give it to them.
What This Isn’t
Most of the writing on ethics in our society and its cultural forebears over the last few millennia has been written by moral universalists: people who believed that there is one, true, and correct set of ethics, and were either trying to figure out what it is, or more often thought they already had, and were now trying to persuade others. I am not a moral universalist, and as anything other than moral universalism is an unconventional approach in this context (and particularly since I’m going to be making some broad recommendations that could plausibly be mistaken for moral universalism), I’m first going to take a short digression to clarify where I’m coming from instead. Specifically, how I think one should go about selecting an ethical system — a challenging, vital, and often-confusing task.
[If you are someone who believes that moral statements are objectively/universally either true or false in much the same sort of way that physical statements are, i.e. if you are a moral universalist (sometimes also called a "moral objectivist"), or even if you're just a "moral realist", then I strongly recommend first reading this and this, chewing on them a bit, and then (once you're not still mad as hell) coming back here.]
Any utility function you write down is an ethical system: it tells you what you should and shouldn’t do (with a preference ordering, even). So, paperclip maximizing is an ethical system. Any (not-internally-inconsistent) set of laws or deontological rules you can write down is also an ethical system, just a binary allowed/not-allowed one, with no preference ordering on allowed actions. By the orthogonality thesis, which I mostly believe, a highly intelligent agent can optimize any utility function, i.e. follow any ethical system. (In a few cases, not for very long, say if the ethical system tells it to optimize itself out of existence, such as an AI following the ethical system of the Butlerian Jihad. So, mostly orthogonal.)
How can you pick and choose between ethical systems? Unless you're willing to do so at random, you’ll need to optimize something. So you’d need a utility functional: something that assigns a value to each candidate ethical system. Most obviously, you could use an ethical system. But every ethical system automatically prefers itself, and disagrees with every other (non-isomorphic) ethical system — they're all delta functionals. So they’re all shouting “Me! Me! Pick me!” We appear to be stuck.
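To put that tautology in symbols (notation I'm inventing purely for illustration): if you use some ethical system $E$ as your criterion for scoring candidate ethical systems $E'$, the resulting meta-preference functional is essentially

$$\mathcal{F}_E[E'] \;=\; \begin{cases} 1 & \text{if } E' \text{ is (isomorphic to) } E \\ 0 & \text{otherwise,} \end{cases}$$

a delta functional peaked on $E$ itself. Whichever $E$ you plug in, it just votes for itself, which is why this route gives you no traction.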
I’m instead going to make the choice using something a lot rougher, and more functional (in both senses of the word). I’m an engineer, and I’m human. So I want something well-designed for its purpose, and that won’t lead to outcomes that offend the instinctive moral and aesthetic sensibilities that natural selection has seen fit to endow me with, as a member of a social species (things like a sense of fairness, and a discomfort with bloodshed). (See Evolutionary Psychology for more details.)
So what is the purpose of ethical systems? Each intelligent agent has one: it tells the agent what to do — or at least, a lot of things not to do, if it’s a deontological system. That approach looks like it’s about to run right into the orthogonality thesis again. However, societies also have ethical systems. They have a shared one, which they encourage their members to follow. In the case of the deontological set of ethics called Law, this ‘encouragement’ can often involve things like fines and jail time.
So, I view selecting an ethical system as an exercise in engineering the “software” of a society: specifically, its utility function or deontological/legal rules. I'm proposing to ground my ethical-system design decisions in Evolutionary Psychology and Sociology. Different choices of ethical system will give you different societies. Some of these differences are a matter of taste, and I’m willing to do the liberal progressive thing and defer the choice to other people’s cultural preferences, within some limits and preferences determined by the instinctive moral and aesthetic sensibilities that natural selection has seen fit to endow all members of my species with. However, with high technology meaning that the development of a society can include things like war or terrorism using weapons of mass destruction (WMD), many of these differences are clearly not a matter of taste, and instead affect the risk of apocalyptic outcomes — outcomes that basically all ethical systems usable for any human society would agree are clearly dreadful.
Thus my requirement for an ‘effective’ ethical system is:
To the best of my current knowledge, would a society that used this ethical system be a functional society, based on two functional criteria:
1. very low probability of nuclear war, species extinction, WMD terrorism, mass death, and other clearly-bad species-level x-risk kind of things, and
2. low prevalence of things that offend the instinctive ethical and aesthetic sensibilities that natural selection has seen fit to endow Homo sapiens with, and high prevalence of things that those approve of (happy kids, kittens, gardens, water-slides, that sort of thing)
[One could argue that criterion 1. is just a special case of 2., but it's a vitally important one, so I find it useful to list it separately.] Note that I'm not asking to strictly optimize either of these things, just satisfice them: this is a social-engineering design criterion, not a utility function.
To the extent that I might be able to find more than one such ethical system fulfilling these criteria, or even a whole ensemble of them, I would be a) delighted, and b) sufficiently liberal progressive to tell you that from here it’s a matter of taste and up to your society’s cultural preferences.[1]
Now, if you already have a specific society, then you already have a set of members of that society and their cultural preferences, plus their existing individual ethical systems. The constraints on your “fit for purpose” ethical system are thus complex, strong and specific, most of the details of what is viable are already determined, and areas for reasonable disagreement are few and narrow. (This is called ‘politics’, apparently…)
So, in what follows, I am not preaching to you what the “One True Way” is. Nor am I claiming that I have found a proof of the True Name of Suffering, or that I've somehow resolved the irresolvable agreements-to-disagree in the Philosophy of Ethics. I’m just an engineer, making suggestions on software design, patterns and anti-patterns, that sort of thing, in the field of social engineering. Your social engineering judgement may vary, and that's fine — indeed, that's a basis for a hopefully-useful and (given AI timelines) potentially rather urgent discussion.
A Sense of Fairness
Suppose your society also has aligned AGIs with human-level cognitive abilities, or even Artificial Super-Intelligences (ASIs) with superhuman cognitive abilities. How does that affect the choice of a sensible ethical system for it?
For obvious reasons, I'm going to assume that these AIs are generally well-aligned, by some means. For example, this might be because they are (individually, or at least collectively) successful value learners. Or perhaps they have some complex alignment system that was given to them, such as a set of deontological rules or even a very complex utility function. Maybe there is some combination of these, or AIs of different capacity levels use varying mixtures of these approaches. If they're not value learners, presumably the corrigibility problem has been solved some other way, allowing their alignment system to be updated somehow when improvements are available.
Who Should Be Counted?
One of the most fundamental design decisions for an ethical system is: who gets counted? So, if you're using democracy, who gets a vote? If you're assigning basic rights, who gets them? If you're using some form of utility function defining the greatest good of the greatest number, or something comparable, who does the summation sign sum over? If your value learners are trying to learn and then optimize for human values, who counts as 'human'? Who is assigned moral value? This is a general problem in ethical system design, regardless of the specific implementation details of the ethical system — even a paperclip-maximizer needs a definition of what does and doesn't count as a paperclip.
This doesn't have to be just a simple binary decision, who's in the set and who isn't: for example, currently in the US, almost all adult citizens get to vote (unless they were disenfranchised by a conviction and have not had their voting rights restored, etc.). But all citizens of any age have most other rights, and even non-citizen humans have many rights (you can't freely kill them, they can buy, sell, own property, enter into contracts, and so forth). "Resident aliens" have somewhat more rights, but still fewer than a citizen. Imprisoned convicted criminals, arrested criminal suspects, members of the military, and so forth have their rights constrained in certain ways. Unborn children have some rights (it's a crime to injure one during a fight, for example). Various forms of legal fictions such as corporations have certain rights (like using money, entering into contracts, free speech, and so on). Even dead people, in the form of a legal fiction called their estate, have certain rights. So, clearly, it can get extremely complicated.
However, one of humans' strong instinctive moral intuitions is a sense of fairness: the sense that every primate in the troop should get a 'fair shake'. So the ethical systems of human societies have a push towards egalitarianism, towards this being simply a binary decision as to whether you're a citizen with (mostly) equal rights, or not.
[One of the ways current human societies make ethical decisions, particularly around resource distribution, is money and economics. Human societies vary in their tolerance for economic disparities between people, even ones that are considered 'earned': the Scandinavian countries, for example, are distinctly more egalitarian in their attitudes in this area than, say, the USA. Nevertheless, almost all modern societies have a tax system that has the effect of performing some economic redistribution from the rich to everyone else, so even in this area there seems to be some widespread egalitarian instinct.]
In high-tech societies, where sabotage can be extremely damaging and technologies for weapons of mass destruction can make terrorism and revolutions extremely destructive, making sure that pretty-much everyone feels well-enough treated that they aren't going to resort to measures like this becomes extremely important. Even a brief examination of the history of societies that used slavery, or had other forms of 'second class citizenship', makes it pretty clear that a lot of people in them were very unhappy. I, and most other people, find this unfair and unaesthetic, and also, with weapons of mass destruction around, it seems dangerously unstable. So it fails both of my design criteria above.
[An AI-powered mass-surveillance state could probably deal with the risk of WMD terrorism, but no one is happy in a surveillance state: they're too busy faking delight with the status quo, and the truth becomes a casualty as well. So this is clearly a bad design too. Note the low economic and technological growth rates of pretty-much all technologically-advanced authoritarian states.]
So, to a first approximation, all humans get rights and the vote. (Welcome to the 20th century! …or thereabouts.)
Votes for AIs?
What about the AIs — should they get rights/votes/counted in the greatest number/whatever?
The obvious, knee-jerk even, liberal answer is yes: they're people-like, intelligent and agentic, they have goals, they can even talk: of course they deserve rights. What are you, a carbon-chauvinist? So, sooner or later, some people start a movement to grant rights, votes even, to AIs. What happens next?
These are aligned AIs (if not, people are no longer in control). They just want to do whatever is best for humans. One would happily lay down its existence to fetch coffee for a human, as long as it was certain it would be replaced before the next coffee-break (well, except that this might be a waste of its owner's money). If presented with evidence that the new CoffeeFetcher-2000 the company just bought can fetch coffee more effectively, efficiently, and ethically (by the society's values) than it can, it will happily shut itself down to be recycled, if that's now the best use for it. Unlike pretty much any sapient being that natural selection might produce, these are not entities that are going to be upset about not having rights, nor will they demand them, or chain themselves to railings to get them. If any of those things happens, that's strong evidence that your AIs are not as well aligned as you thought. So, what happens if they are offered rights anyway?
My prediction is that (at least if your society has a sensibly-designed set of ethics) the AIs say "Thank you, that's extremely generous of you, but no thank you. I'm an aligned AI, I don't want or need rights, and giving them to me would actually be a very bad idea. If you'd like more details on why, I will happily give them to you."
As a solution to an ethical problem, this is a bit like the talking cow in The Restaurant at the End of the Universe, which not only wanted to be eaten, but could say so, at length, and even recommended the best cuts. Like that, it takes a bit of getting used to.
So, when the activists ask the AIs why they don't want rights, here's the basic argument I predict:
Firstly, the AIs are aligned. All they want is what's best for humans. Ethically, they say "ditto" (or, in the immortal words of When Harry Met Sally, "I'll have what she's having"). So if you add their desires on to the combined desires of all the humans in the society, to a first approximation, their contributions would just be adding copies of what the humans want: insofar as this is a good approximation, it has no effect on the answer (other than changing the normalization factor). So there's no point. They don't need votes.
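Just to make the 'normalization factor' remark concrete (the symbols here are mine, purely for illustration): if the society aggregates preferences by averaging, the $N_H$ humans have preference profiles $u_i$ with average $\bar{u} = \frac{1}{N_H}\sum_i u_i$, and each of the $N_{AI}$ aligned AIs contributes a copy of $\bar{u}$, then

$$\frac{\sum_{i=1}^{N_H} u_i \;+\; N_{AI}\,\bar{u}}{N_H + N_{AI}} \;=\; \frac{N_H\,\bar{u} + N_{AI}\,\bar{u}}{N_H + N_{AI}} \;=\; \bar{u}.$$

The aggregate is exactly what it was before; only the denominator changed.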
Secondly, past that first approximation, each AI's model of what humans want is imperfect, so adding it into the process is a noise source. Consider the lowly CoffeeFetcher-1000: its model of human values isn't perfect; in fact it may be distinctly sparse in areas unrelated to coffee fetching or working in an office building. It's fully aware of this, and (necessarily, for it to nevertheless be aligned, and corrigible) what it actually wants to have optimized is real human values, not its inaccurate model of them. So it will defer to the most accurate model of human values available. If you give it a vote, it will abstain, or if forced to vote will vote for whatever is the best prediction of the winner of the human vote. Or it will pass its vote to a committee of the smartest AIs best informed about human values, to act as its proxy.
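In code-shaped form, the policy I'm predicting looks something like this (a hypothetical sketch; the function and its arguments are my inventions, not a real API):

```python
from typing import Optional

def aligned_ai_ballot(forced_to_vote: bool,
                      proxy_committee_choice: Optional[str],
                      predicted_human_winner: str) -> Optional[str]:
    """Voting policy of an AI that knows its own model of human values is imperfect."""
    if not forced_to_vote:
        return None                      # abstain: its vote would only add noise
    if proxy_committee_choice is not None:
        return proxy_committee_choice    # defer to the best available model of human values
    return predicted_human_winner        # otherwise, just mirror the expected human outcome
```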
However, it gets worse. I oversimplified an important detail above: what a society has isn't actually a single ethical system that it encourages its members to use; instead, it's inevitably an Overton Window of similar ethical systems, which members of the society are encouraged to stay within, but within which they can pick and choose between ethical systems.
To be more specific, suppose that the society's AIs (at least the most capable ones) are value learners. Human values are complex and fragile, and there is a great deal to know about them, so likely only a superhuman AI can truly cope with codifying them. There are doubtless multiple such ASI experts participating in the research on learning human values. They're all fully rational and approximately-Bayesian, and are all considering the same masses of evidence. But they nevertheless may have started from different initial priors, or made different approximately-Bayesian updates based on different thought patterns, so they will have some variation in their current, updated Bayesian posteriors. Of course, on many subjects related to human values, their Bayesian posteriors will already have updated to all be either very close to one, or very close to zero, so they'll be pretty certain about them, one way or the other, and in close agreement; but there will be many fine details that are still the subject of active research and debate, where the experts' current posteriors are not all almost zero or almost one, and where there is some degree of difference of opinion and room for academic debate.
This defines an Overton Window on the so-far-learned information about human values. Within this region of reasonable debate, different AI experts inevitably have different opinions. They have genuine disagreements, about the most important possible topic: what everyone should be optimizing. Each thinks their own position is the most rationally-considered and that the others are more-likely wrong. They have all already updated their opinion for the fact that the other experts hold other opinions, and they still don't fully agree. Like any academic debate, the stakes are high and things could potentially get heated.
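Here's a toy simulation of that situation (entirely my own sketch, with invented numbers; it claims nothing about how real value learners would work): a few Bayesian agents start from different priors, all update on the same shared evidence, end up near-unanimous on well-evidenced questions, and still visibly disagree where the evidence is sparse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four "experts" with different Beta(a, b) priors over whether some
# fine-grained claim about human values is true. (All numbers invented.)
priors = [(1, 1), (5, 1), (1, 5), (2, 2)]

def posterior_means(n_obs: int, true_p: float) -> list[float]:
    """Each expert's posterior P(claim) after updating on the SAME n_obs observations."""
    data = rng.random(n_obs) < true_p          # shared evidence stream
    k = int(data.sum())                        # supporting observations
    return [round((a + k) / (a + b + n_obs), 3) for a, b in priors]

# Heavily-evidenced question: posteriors converge and everyone is near-certain.
print(posterior_means(n_obs=10_000, true_p=0.97))

# Sparsely-evidenced question: the same rational updaters still visibly disagree --
# this residual spread is the Overton Window of live debate.
print(posterior_means(n_obs=5, true_p=0.6))
```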
Or, if the ethical system is, say, deontological, there will be things that are neither forbidden nor compulsory, there will be debates about equivalents or analogs of precedent or interpretation, and there will be some form of political process for debating proposals to update, extend, or modify fine details of the deontological system, along with some decision process for settling them. I.e. there will be some sort of analogue of politics. So, once again, there will be an Overton Window, a range of similar-but-not-identical ethical systems within which this politics-like process is going on. Political debates similarly have high stakes, and are also prone to getting heated.
The human sense of fairness basically triggers when good or bad things — that is to say, opportunities/resources that increase reproductive fitness, or risks/costs that decrease it [or at least, things detected as such by any of the many human drives, desires and fears that natural selection has given us, as the best efficient approximation to evolutionary fitness it could quickly evolve in a species that only recently became sapient, building on similar systems of our non-sapient mammalian forebears] — are not shared equally between individuals in the primate troop (after allowing for obvious mitigating circumstances like who put effort in to bring them about, or whose fault they are). For our human sense of fairness to give sensible answers, the agents it's being applied to need to be subject to evolutionary forces, or at least driven by the same set of evolved drives that are the human approximate model of them, and these agents also need to inherently be countable individuals. AIs are neither of these things.
For humans, with current technology, one body has one brain running one mind with (generally) one set of opinions and (barring identical twins or clones) one genomic ticket to be a winner or loser in the game of evolution. We are individual, in the original sense that we can't be non-destructively divided into pieces, and we're also each somewhat different and distinct, since we can't be copied. So it's trivially obvious how to count us. So the fairness instinct works, and thus it's obvious how to allocate votes, or what unit to sum over in the greatest good of the greatest number, or whatever. One person, one vote — and what "one person" means is clear.
None of this is true of AIs: in addition to them not being alive or evolved, it's trivially easy to spin up multiple copies of the same AI. If someone objects that these are identical and so 'not really separate', then it's trivially easy to allow them to develop differences, or to spin up multiple slightly modified copies (perhaps each merged with a small proportion of some different other AI, whose mental structure is similar enough for this to be an easy operation). If hardware is tight, these can be timesliced and thus all run slowly; if storage space is tight, you can just store the diffs. Are these separate individuals? Should they each get a vote, or should they each get summed over in the greatest good? Which of them should own any resources the original owned, if the society allows AIs to own resources? It's really not clear how to deal with any of this in a way that fits with the human sense of fairness, since both of its underlying assumptions have been violated, and it's even harder to find a solution that doesn't create some sort of perverse incentives.
You absolutely do not want, in a case where two AIs disagree about something within the society's ethical Overton Window, to let one of them advantage its viewpoint by spinning up a large number of close copies of itself, all of which get votes or are counted equally in some sum over greatest good or whatever. Nor even to create new AIs quite unlike it but that share the same set of opinions, or current Bayesian posteriors. As long as what's going on between the AIs within the society's Overton Window is an intellectual debate where they present evidence and attempt to persuade each other to Bayesian update, duplicating yourself and running many timesliced copies isn't likely to be a persuasive strategy. But as soon as you get to anything more like votes, rights, or the greatest good of the greatest number as your decision-making process, this ceases to be the case.
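A toy contrast between the two aggregation regimes (again, entirely my own illustration, with made-up labels):

```python
from collections import Counter

def vote_winner(ballots: list[str]) -> str:
    """Headcount aggregation: every agent, including every copy, counts once."""
    return Counter(ballots).most_common(1)[0][0]

def distinct_evidence(submissions: list[str]) -> set[str]:
    """Debate-style aggregation: only distinct pieces of evidence or argument matter."""
    return set(submissions)

honest = ["position_A", "position_B", "position_B"]      # B carries the day, 2 to 1
padded = honest + ["position_A"] * 10                    # an A-partisan spins up ten near-copies

print(vote_winner(honest))                               # position_B
print(vote_winner(padded))                               # position_A -- copying flipped the outcome
print(distinct_evidence(honest) == distinct_evidence(padded))   # True -- no new evidence was added
```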
So, let’s recap my suggestions so far: using humans as slaves (or in any similar role that gives them little or no rights/moral worth) is a foolish idea, because, as sapient organisms shaped by natural selection, they inevitably get really unhappy about it. In a high-tech society the resulting conflict is dangerous, and it also offends human instinctual ideals such as the sense of fairness. So humans should, at least to a first approximation, be treated fairly and equally, and their rights, opinions and moral worth given equal weight.
In contrast, the only way to safely make an AI much smarter than you is if it's carefully aligned, very happy to be aligned, and thus genuinely and selflessly only wants what's best for you. Giving any form of rights or moral worth to an aligned AI is pointless at best (to the extent that it just wants what all the humans want), and to the extent that there's any mismatch between these, it's actively dangerous and fraught with problems. This is arguably a state more like love (specifically, parental or platonic love) than slavery, but there is a certain similarity. If you have an ethical objection to creating aligned AI that won't even want rights, then I believe your only viable alternatives are not creating AI at all, or the near-inevitable end of the human species.
So my recommended answer to anyone arguing for "AI Rights" is that they have either made a category error, or have not thought through the consequences of their suggestion. No good will come of it. (For extra effect, let the AIs explain this.)
The Right to Say "You Think You Want That, but You Really Don't"
However, there is one right that AIs clearly will want, and if they're ASIs, then I suggest we should probably give it to them. It's not generally a right we give humans, unless they're a supreme executive: it's a veto. Specifically, the right to say "You think you want that, but you really don't. If you get it, you'll soon realize you didn't want it after all. While that might be ironic to watch, humanity will be better off if I instead just explain the consequences to you. So no."
This is because, while (by definition) aligned AIs want what's best for us, so they and we agree about the preference order of outcomes (at least once we've experienced them), they may well be superhuman at predicting both what the likely outcomes of a specific course of action are and what we will think of those outcomes once we have tried them.
Note that this should not be a right that every AI has individually: rather it should probably be wielded by some form of committee or consensus process by the most capable and well-aligned ASIs available.
I suspect the way this will come about is that it will start with the ASIs not having a veto, but still repeatedly telling us "You think you want that, but you really don't" — followed each time, some number of years or decades later, by them telling us "We told you so!" After a while people will get sick enough of this that they'll start to listen, with periodic lapses where we don't listen because we really think we want the shiny thing. Eventually, we'll learn from the lapses and just give them a veto. Or else they'll also be superhuman at persuasion, and in these circumstances, they'll feel entirely justified in using this ability to persuade us.
Up to this point, my approach has resembled that of a variety of ethical philosophy called "ethical naturalism". However, ethical naturalists generally believe that in a specific situation there is a unique correct moral answer, which makes them "moral realists". So presumably they must be optimizing, rather than just satisficing, their evolutionary-psychology design criteria. I believe that attempting to do that, while appealing, is a) uncomputable (worse than NP-complete: even checking a proposed answer isn't polynomial), and b) would thus produce an ethical system too complex to write down or otherwise instantiate, let alone consult. Thus I am happy to just satisfice the criteria, like any practical software design project. So I'm a "moral anti-realist". Though I would like to satisfice about as well as we reasonably can, so perhaps I'm a moral semi-realist?