Review

The second risk that follows from the VCP concerns the obstruction of legitimate value change. In what follows, I will consider the mechanisms by which legitimate value change might come to be impeded, and whether and when this might constitute an active source of moral harm. 

Is preventing legitimate value change a genuine risk?

At first sight, one may doubt whether the obstruction of legitimate value change poses a genuine risk. Accepting that some cases of value change are unproblematic and legitimate does not necessarily imply that the prevention or impediment of such changes is, or is always, morally blameworthy. In other words, one may doubt that preventing certain types of value change actively constitutes a moral harm. In what follows, I will try to argue that it does, at least under certain circumstances.

First, consider a society which systematically or arbitrarily prevents people from embarking on aspirational pursuits that have the potential to cause legitimate value changes. Such a society would, on most accounts, be considered to manifest a problematic form of 'social authoritarianism' on the grounds that it undermines individual freedom in significant ways. Freedom, here, can be understood in (at least) two ways. First, we may refer to 'republican freedom' (see e.g. Pettit, 2001), namely freedom from arbitrary applications of power. On this account, structures of governance must be constrained (via, e.g., the rule of law) so as to prevent the arbitrary use of power. Alternatively or in addition, we may be concerned with the protection of a second type of freedom, namely the freedom of self-determination. For example, Charles Taylor (1985), building on a long tradition of political thought including thinkers such as Rousseau and Mill, argues for a notion of freedom that captures ‘the extent that one has effectively determined oneself and the shape of one’s life’ (p. 213). Insofar as the ability to pursue legitimate value changes is a manifestation of the right to self-determination, processes that systematically undermine this freedom (by impeding legitimate value changes) are morally problematic.

This argument can be further supported by observing that the freedom of value self-determination is reflectively stable under, e.g., Kant’s categorical imperative (Kant, 1785/2001), Rawls' veil of ignorance (Rawls, 2001) or Mill's meta-inductive argument[1] for liberty (Mill, 1859/2002). As for the first, the categorical imperative holds, roughly, that we should act only on maxims that we could will everyone to act on. As such, if I want to protect my own ability to undergo legitimate value change (e.g., in the form of an aspirational pursuit), I should also want to protect your ability to do the same. Rawls' veil of ignorance captures the idea that a just society is one designed under the premise that we are ignorant of what concrete position or attributes we will come to hold in that society. Similarly, the argument goes that, if I were to design the structures of a society without knowing what position I will come to occupy in it, I will want to create a society in which everyone’s freedom of value self-determination, including the ability to undergo legitimate value changes, is protected. Finally, Mill’s meta-inductive argument for liberty states that, since we have many times before been wrong about what we consider morally right or wrong, or about what is valuable to us, we should, while in part pursuing our current best understanding of value and morality, also preserve a ‘sphere of liberty’ which protects our ability to change our minds in the future. While Mill himself made this argument for free speech in particular, later commentators have argued for an interpretation that extends the same argument into a justification for autonomy and the revisability of one’s life plans more generally (see, e.g., Fuchs, 2001; Bilgrami, 2015). As such, actors, institutions or processes that hinder or impede legitimate value change lack reflective endorsability, and pose a genuine risk worthy of moral concern. 

Mechanisms that undermine legitimate value change ('value collapse')

I will now proceed to discuss the mechanisms by which people’s ability to pursue self-determined value change can come to be impeded. The account I provide is importantly inspired by Nguyen’s account of ‘value collapse’ (2020), which I generalise to the context at hand. In short, Nguyen describes value collapse as a phenomenon whereby the hyper-explication of values (such as the use of metrics, or other ways of quantifying and/or simplifying values) worsens our epistemic attitude towards our values and the world. It is this deterioration of our epistemic attitudes that ultimately results in a weakening of our capacity for value self-determination. How exactly does value collapse occur? 

First, Nguyen observes that our values shape our attention. For example, if I value my health, I will differentially pay attention to things that I believe are healthy for me (e.g., sports, food, supplements). If I value curiosity, I am more likely to look for and notice this trait in others. Next, once we adopt explicit operationalisations of our values—e.g., by using metrics—our boundaries of attention become more sharply defined. Having explicated what we do care about also clarifies what we do not care about. For example, the introduction of GPA scores in education focuses the attention of students and educators on those aspects of their education that are well captured by those scores. Unfortunately, explicated values typically aren’t able to capture all of what we care about. As such, while there are numerous good reasons for using them (more on this later), their introduction comes at the cost of directing attention away from things which aren’t well captured by the explication, even if some of those are things we do in fact reflectively care about. In the case of GPA scores, for example, these might be things like wisdom, intellectual curiosity or a sense of civic duty. 

Given our imperfect ability to explicate what we value, we rely on a healthy ‘error metabolism’ (a term taken from Wimsatt (2007)). Error metabolism is the idea that, if we are not able to capture something perfectly on the first try—which is the norm for bounded and error-prone creatures like humans trying to navigate a complex world—we rely on ways to notice and integrate mistakes or deviations, and thereby iteratively improve on our initial attempt. However, and this is the core of the problem, because our attention mediates what we notice, perceive and thus can come to learn about, the imposition of narrower and sharper boundaries of attention weakens our ability to sustain this error metabolism. If I am convinced that the only form of literature worth reading is Greek Mythology, and I never venture outside that genre, I might never come to enjoy the pleasures of Magical Realism, French Existentialism or 18th-century ‘Sturm und Drang’. Or, returning to the case of GPA scores: because wisdom and intellectual curiosity are not well captured by them, the worry is that the introduction of those scores threatens to crowd out or undermine our shared ability to recognise and promote those latter, more subtle values. 

In summary, explicated values, by rigidifying our boundaries of attention, tend to weaken our ability to notice, integrate and ultimately enact that which is valuable but isn’t already captured by the explicit value statement.[2] As a result, the explicated values tend to become ‘sticky’ in the sense of coming to dominate the individual's practical reasoning, while those values that aren’t well captured by the explication tend to get neglected and—potentially—lost. As such, the dynamic of value collapse undermines a person’s ability for self-determined value change.
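To make this dynamic concrete, here is a minimal toy simulation in Python. The activities, the numbers and the whole two-dimensional representation of 'value' are my own illustrative assumptions, not part of Nguyen's account: an agent repeatedly reallocates its attention in proportion to how well each activity scores on an explicit metric, while an unmeasured dimension of value receives no weight at all.

```python
# Toy sketch of 'value collapse' (illustrative numbers only): attention follows
# an explicit metric, and whatever the metric fails to capture receives ever
# less attention -- and hence ever less corrective feedback.

activities = {
    # activity: (score on the explicit metric, unmeasured value, e.g. curiosity)
    "exam_prep":    (0.9, 0.1),
    "seminar":      (0.6, 0.5),
    "open_reading": (0.2, 0.9),
}

# Start with attention spread evenly across activities.
attention = {a: 1 / len(activities) for a in activities}

for step in range(20):
    # Attention is redistributed purely in proportion to the explicit metric;
    # the unmeasured dimension never enters the update rule.
    weights = {a: attention[a] * (0.5 + metric)
               for a, (metric, _unmeasured) in activities.items()}
    total = sum(weights.values())
    attention = {a: w / total for a, w in weights.items()}

# Nearly all attention ends up on the metric-friendly activity, so the agent
# rarely encounters the experiences that would reveal the unmeasured value of
# 'open_reading' -- the error metabolism for that dimension withers.
for name, share in sorted(attention.items(), key=lambda kv: -kv[1]):
    print(f"{name:>12}: attention share = {share:.3f}")
```

The point of the sketch is not the particular numbers but the feedback structure: nothing in the update rule ever allows the unmeasured dimension to re-enter the picture.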

It is worth noting that the explication of values typically involves simplifying them, i.e., stripping away (potentially important) detail, nuance and contextuality. Theodore Porter (1996), in his work on the history of quantification, notes that quantified knowledge focuses on some invariant kernel from which various context-sensitive nuances have been stripped. Importantly, quantification is not simply a bad thing; on the contrary, quantification allows information to travel between contexts and to be aggregated easily. For example, without quantifying students' performance in the form of grades, it would be difficult to compare and aggregate a student’s performance across contexts as varied as maths, literature, sports and history. However, quantification also has costs: improvements in portability and aggregability come at the price of a loss of nuance, subtlety and context sensitivity. As such, I am not arguing that we should always avoid quantifying or explicating what we care about. Instead, I argue that it is important to pay attention to the way explication tends to impoverish both our understanding of values and our ability to alter, develop or refine our values over time. Only when we are aware of these trade-offs can we make informed choices about when and how much we want to rely on mechanisms of value explication. 

Various other cases of value collapse are already readily observable. We have already mentioned the academic context, where metrics such as GPA or citation numbers may threaten richer notions of wisdom, intellectual curiosity or truth seeking. Other examples include health metrics (e.g., step count, BMI or calories burnt), which may fail to capture fuzzier and more personalised notions of health, wellbeing or performance; or, in the context of social media, the number of likes or watch time, which may override thicker and more prosocial notions of, e.g., learning, connection or aesthetic value. And so on.

...in the case of (advanced) AI systems

However, the question we are most interested in, in the context of this essay, is: what does this concern look like when extrapolated to the context of advanced AI systems?

Generally speaking, advanced AI systems will intensify this very effect. They do so in two main ways. First, in the case of processes that already rely on value explication, advanced AI systems will be able to optimise for a given set of explicated values more strongly, thereby further weakening the error metabolism of said process. To exemplify this tendency, let us consider the use of AI systems for processing job applications. An application-processing AI will (unless endowed with some mechanism to counteract this tendency) optimise its performance based on whatever values the firm has already been able to identify and capture (e.g., in the form of evaluative criteria). If there are features of an application dossier which would be interesting to the company, but fall outside of the current evaluative scheme, the AI evaluator will be insensitive to those features. Importantly, the AI evaluator will be more effective at this than a human evaluator—both at identifying applicants that fit the evaluative criteria and at ignoring applicants that do not. Furthermore, a human evaluator might gain novel insights about what they care about during the application process (e.g., new insights about how to identify suitable candidates, or about what role specification the company should ideally be hiring for in the first place). In contrast, relying on AI systems to do this work will (again, by default) tend to undercut those possibilities, thereby reducing the scope for outcomes and insights that are serendipitous relative to the defined evaluative scheme. While initially AI systems might only be used for pre-processing job applications, they will increasingly come to determine more of the decision-making process, and the likelihood of detecting errors in the current specification will decrease accordingly. What could have been an open-ended process of value development and refinement on the part of the human evaluator, or the firm as a whole, becomes a closed and convergent process reifying whatever notion of value we started out with. This is not to deny that optimising for a given (even if imperfect) set of assumptions can be pragmatically justified; the concern is that, if unchecked, this will over time lead to a deterioration of our ability to ever improve on our existing assumptions or evaluative criteria. 
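As a minimal sketch of this screening dynamic (the rubric, weights and applicant details below are invented for illustration and not drawn from any real system): an automated scorer ranks applications strictly by a fixed, explicit rubric, so information outside the rubric contributes nothing to the ranking and never prompts a revision of the rubric itself, whereas a human reviewer at least has a chance of noticing and flagging such information.

```python
from dataclasses import dataclass, field

@dataclass
class Application:
    name: str
    scores: dict                      # features the rubric measures, e.g. GPA
    off_rubric_notes: list = field(default_factory=list)  # things the rubric does not measure

# A fixed, explicit rubric: the only values the automated evaluator can 'see'.
RUBRIC_WEIGHTS = {"gpa": 0.6, "years_experience": 0.4}

def automated_score(app: Application) -> float:
    """Score strictly by the explicated rubric; off-rubric information is invisible."""
    return sum(RUBRIC_WEIGHTS[k] * app.scores.get(k, 0.0) for k in RUBRIC_WEIGHTS)

def human_review(app: Application) -> tuple:
    """A human applies the same rubric, but may also surface off-rubric
    observations, which can later feed back into revising the rubric itself
    (the 'error metabolism')."""
    return automated_score(app), app.off_rubric_notes

applicants = [
    Application("A", {"gpa": 3.9, "years_experience": 2}),
    Application("B", {"gpa": 3.2, "years_experience": 3},
                off_rubric_notes=["built an open-source tool the team already uses"]),
]

# The automated ranking prefers A; B's off-rubric strength never registers.
ranked = sorted(applicants, key=automated_score, reverse=True)
print("Automated ranking:", [a.name for a in ranked])

for app in applicants:
    score, notes = human_review(app)
    if notes:
        print(f"Human reviewer flags for {app.name}: {notes}")
```

Nothing prevents the flagged note from eventually reshaping the rubric in the human loop; in the purely automated loop, that channel simply does not exist.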

A second way in which advanced AI systems will intensify the effects we can already make out today is that AI progress will allow similar processes to be deployed more broadly and across more areas of life. Consider, for example, George, who uses his ‘AI assistant’ (or a similar AI application) to think about what new hobby to pick up or what new community to join. If said system optimises its suggestions against some fixed set of values that George presumably already possesses, or if it optimises to make him more predictable, it reduces serendipity and interferes with George’s ability to identify, and come to value, truly novel ways of being. Highly generalised AI systems could come to have similar effects in virtually all areas of life. 
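A hypothetical sketch of two recommendation policies George's assistant might follow (the feature space, hobby names and the fixed 'profile' are all invented for illustration): one policy ranks options purely by similarity to George's current, explicated preference profile; the other reserves some probability for options far from that profile, preserving a channel for serendipitous discovery.

```python
import random

random.seed(7)

# George's current, explicated interests (an illustrative, fixed profile).
profile = {"outdoors": 0.8, "music": 0.1, "crafts": 0.1}

# Candidate hobbies, described in the same (made-up) feature space.
hobbies = {
    "hiking_club":    {"outdoors": 0.9, "music": 0.0, "crafts": 0.1},
    "trail_running":  {"outdoors": 0.8, "music": 0.0, "crafts": 0.0},
    "choir":          {"outdoors": 0.0, "music": 0.9, "crafts": 0.1},
    "pottery_studio": {"outdoors": 0.1, "music": 0.0, "crafts": 0.9},
}

def similarity(a, b):
    """Dot product as a crude match score between profile and hobby."""
    return sum(a[k] * b.get(k, 0.0) for k in a)

def exploit_only(profile, hobbies):
    """Recommend whatever best matches the current profile -- no serendipity."""
    return max(hobbies, key=lambda h: similarity(profile, hobbies[h]))

def with_exploration(profile, hobbies, epsilon=0.3):
    """With probability epsilon, suggest something far from the current profile,
    keeping open the possibility of discovering genuinely new values."""
    if random.random() < epsilon:
        return min(hobbies, key=lambda h: similarity(profile, hobbies[h]))
    return exploit_only(profile, hobbies)

print("Exploit-only assistant suggests:", exploit_only(profile, hobbies))
print("Exploration-aware assistant might suggest:", with_exploration(profile, hobbies))
```

Even this crude exploration term is, of course, no guarantee of legitimate value change; the point is only that a purely exploitative policy forecloses it by construction.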

If the above account of value collapse is right, expanding the reach of practices of hyper-explication threatens the rich and subtle values that we hold, both individually and as a society. In all of these cases, the better such systems become, the better they are at optimising for whatever has been explicated; and the more widespread their adoption, the more pervasive the described effect. Value collapse threatens the possibility of genuinely open-ended value exploration that lies at the core of aspirational pursuits and legitimate value change. As such, insofar as we care about protecting the possibility of legitimate value change, both individually and collectively, we need to think carefully about how the development and deployment of advanced AI will affect, interfere with and potentially substantially undermine this possibility. 

A final remark is necessary before we close. While throughout this essay I have emphasised the risks that arise in the context of AI and value malleability, I want to conclude by acknowledging that there is also an important possibility of legitimate, technology-assisted value change. In other words, it is worth asking whether we could design machines that amplify, rather than diminish, our ability to develop and refine our values. Considering a similar idea, Swierstra (2013) writes: ‘Specific technologies can modulate moral choices and decisions, by making some options more pressing or attractive, and others less so. But this doesn’t imply a loss of morality, or moral decay. It can easily lead to more or better morals, for instance by enlarging the moral community of stakeholders. In this way, the relation between technology and morality appears as a marriage between two partners who neither have nor desire absolute autonomy. Is it possible to say something about the quality of that marriage?’ While I question Swierstra’s use of ‘easily’ in the above extract, I agree that the possibility of AI systems augmenting individual self-determination and moral reasoning is an idea worth exploring further. Insofar as this essay is to contribute to a larger philosophical and technological project, that project is just as much concerned with avoiding risks as with realising progress and upside potential. To do so successfully, however, we first require significant further progress both in our understanding of the mechanisms of legitimate value change and in our ability to build AI systems that reliably act in the intended way. Furthermore, we should be attentive to the way the deck is stacked at present: while it is conceivable that, say, our future AI assistants will be highly reflective moral agents helping us to figure out reflectively endorsed values, the illegitimate exploitation of our value malleability arguably looms as a more salient and more imminent threat.  

  1. ^

    I am grateful to TJ for pointing me to this specific argument and the relevant discussion in the literature. 

  2. ^

    Two clarifications: I don’t claim that the explication of values always or invariably leads to a narrowing of attention. For my purposes, it is enough to argue that, in practice, this is what tends to happen. Furthermore, it may be possible to explicate values while strengthening error metabolism in some other way; if so, that might constitute a fully satisfying solution to the problem described here. The present claim is merely that, as things currently stand, and without careful and deliberate consideration, explication is typically accompanied by a weakened error metabolism.

Comments

commenting on this post because it's the latest in the sequence; i disagree with the premises of the whole sequence. (EDIT: whoops, the sequence posts in fact discuss those premises so i probably should've commented on those. ohwell.)

the actual, endorsed, axiomatic (aka terminal aka intrinsic) values we have are ones we don't want to change, ones we don't want to be lost or modified over time. what you call "value change" is change in instrumental values.

i agree that, for example, our preferences about how to organize the society we live in should change over time. but that simply means that our preferences about society aren't terminal values, and our terminal values on this topic are meta-values about how other (non-terminal) values should change.

these meta-values, and other terminal values, are values that we should not want changed or lost over time.

in actuality, people aren't coherent agents enough to have immutable terminal values; they have value drift and confusion about values and they don't distinguish (or don't distinguish well) between terminal and instrumental values in their mind.

but we should want to figure out what our axiomatic values are, and for those to not be changed at all. and everything else being instrumental to that, we do not have to figure out alignment with regards to instrumental values, only axiomatic values.