I reasonably often find myself grateful that some dysfunctional norms or epistemic practices will most likely become obsolete. It’s a bit scary to think about a world where the only solution is waiting for someone to snap out of it.
I've been thinking a lot about this lately, so I'm glad to see that it's on your mind too, although I think I may still be a bit more concerned about it than you are. Couple of thoughts:
What if our "deliberation" only made it as far as it did because of "competition", and that nobody or very few people knows how to deliberate correctly in the absence of competitive pressures? Basically, our current epistemic norms/practices came from the European Enlightenment, and they were spread largely via conquest or people adopting them to avoid being conquered or to compete in terms of living standards, etc. It seems that in the absence of strong competitive pressures of a certain kind, societies can quickly backslide or drift randomly in terms of epistemic norms/practices, and we don't know how to prevent this.
What's your expectation of the fraction of total potential value that will be lost due to people failing to deliberate correctly (e.g., failing to ever "snap out of it", or getting "persuaded" by bad memes and then asking their AIs to lock in their beliefs/values)? It seems to me that it's very large, easily >50%. I'm curious how others would answer this question as well.
Alice and Bob can try to have an agreement to avoid racing ahead or engaging in some kinds of manipulation, and analogously a broader society could adopt such norms or divide into communities with internal agreements of this form.
In a sane civilization, tons of people would already be studying how to make and enforce such agreements, e.g., how to define what kinds of behaviors count as "manipulation", and more generally what are good epistemic norms/practices and how to ensure that many people adopt such norms/practices. If this problem is solved, then maybe we don't need to solve metaphilosophy (in the technical or algorithmic sense), as far as preventing astronomical waste arising from bad deliberation. Unfortunately it seems there are approximately zero people working on either problem.
I would rate "value lost to bad deliberation" ("deliberation" broadly construed, and including easy+hard problems and individual+collective failures) as comparably important to "AI alignment." But I'd guess the total amount of investment in the problem is 1-2 orders of magnitude lower, so there is a strong prima facie case for longtermists prioritizing it.
Overall I think I'm quite a bit more optimistic than you are, and would prioritize these problems less than you would, but still agree directionally that these problems are surprisingly neglected (and I could imagine them playing more to the comparative advantages/interests of longtermists and the LW crowd than topics like AI alignment).
What if our "deliberation" only made it as far as it did because of "competition", and that nobody or very few people knows how to deliberate correctly in the absence of competitive pressures? Basically, our current epistemic norms/practices came from the European Enlightenment, and they were spread largely via conquest or people adopting them to avoid being conquered or to compete in terms of living standards, etc. It seems that in the absence of strong competitive pressures of a certain kind, societies can quickly backslide or drift randomly in terms of epistemic norms/practices, and we don't know how to prevent this.
This seems like a quantitative difference, basically the same as your question 2. "A few people might mess up and it's good that competition weeds them out" is the rosy view, "most everyone will mess up and it's good that competition makes progress possible at all" is the pessimistic view (or even further that everyone would mess up and so you need to frequently split groups and continue applying selection).
We've talked about this a few times but I still don't really feel like there's much empirical support for the kind of permanent backsliding you're concerned about being widespread. Maybe you think that in a world with secure property rights + high quality of life for everyone (what I have in mind as a prototypical decoupling) the problem would be much worse. E.g. maybe communist China only gets unstuck because of their failure to solve basic problems in physical reality. But I don't see much evidence for that (and indeed failures of property rights / threats of violence seem to play an essential role in many scenarios with lots of backsliding).
What's your expectation of the fraction of total potential value that will be lost due to people failing to deliberate correctly (e.g., failing to ever "snap out of it", or getting "persuaded" by bad memes and then asking their AIs to lock in their beliefs/values)? It seems to me that it's very large, easily >50%. I'm curious how others would answer this question as well.
There are some fuzzy borders here, and unclarity about how to define the concept, but maybe I'd guess 10% from "easy" failures to deliberate (say those that could be avoided by the wisest existing humans and which might be significantly addressed, perhaps cut in half, by competitive discipline) and a further 10% from "hard" failures (most of which I think would not be addressed by competition).
It seems to me like the main driver of the first 10% risk is the ability to lock in a suboptimal view (rather than a conventional deliberation failure), and so the question is when that becomes possible, what views towards it are like, and so on. This is one of my largest concerns about AI after alignment.
I am most inclined to intervene via "paternalistic" restrictions on some classes of binding commitments that might otherwise be facilitated by AI. (People often talk about this concern in the context of totalitarianism, whereas that seems like a small minority of the risk to me / it's not really clear whether a totalitarian society is better or worse on this particular axis than a global democracy.)
We’ve talked about this a few times but I still don’t really feel like there’s much empirical support for the kind of permanent backsliding you’re concerned about being widespread.
I'm not claiming direct empirical support for permanent backsliding. That seems hard to come by, given that we can't see into the far future. I am observing quite severe current backsliding. For example, explicit ad hominem attacks, as well as implicitly weighing people's ideas/arguments/evidence differently, based on things like the speaker's race and sex, have become the norm in local policy discussions around these parts. AFAICT, this originated from academia, under "standpoint epistemology" and related ideas.
On the other side of the political spectrum, several people close to me became very sure that "the election was stolen" due to things like hacked Dominion machines and that the military and/or Supreme Court was going to intervene in favor of Trump (to the extent that it was impossible for me to talk them out of these conclusions). One of them, who I had previously thought was smart/sane enough to entrust a great deal of my financial resources with, recently expressed concern for my life because I was going to get the COVID vaccine.
Is this an update for you, or have you already observed such things yourself or otherwise known how bad things have become?
There are some fuzzy borders here, and unclarity about how to define the concept, but maybe I’d guess 10% from “easy” failures to deliberate (say those that could be avoided by the wisest existing humans and which might be significantly addressed, perhaps cut in half, by competitive discipline) and a further 10% from “hard” failures (most of which I think would not be addressed by competition).
Given these numbers, it seems that you're pretty sure that almost everyone will eventually "snap out of" any bad ideas they get talked into, or they talk themselves into. Why? Is this based on some observations you've made that I haven't seen, or history that you know about that I don't? Or do you have some idea of a mechanism by which this "snapping out of" happens?
Here's an idea of how random drift of epistemic norms and practices can occur. Beliefs (including beliefs about normative epistemology) function in part as a signaling device, similar to clothes. (I forgot where I came across this idea originally, but a search produced a Robin Hanson article about it.) The social dynamics around this kind of signaling produces random drift in epistemic norms and practices, similar to random drift in fashion / clothing styles. Such drift coupled with certain kinds of competition could have produced the world we have today (i.e., certain groups happened upon especially effective norms/practices by chance and then spread their influence through competition), but may lead to disaster in the future in the absence of competition, as it's unclear what will then counteract future drift that will cause continued deterioration in epistemic conditions.
Another mechanism for random drift is technological change that disrupts previous epistemic norms/practices without anyone specifically intending to. I think we've seen this recently too, in the form of, e.g., cable news and social media. It seems like you're envisioning that future humans will deliberately isolate their deliberation from technological advances (until they're ready to incorporate those advances into how they deliberate), so in that scenario perhaps this form of drift will stop at some point, but (1) it's unclear how many people will actually decide to do that, and (2) even in that scenario there will still be a large amount of drift between the recent past (when epistemic conditions still seemed reasonably ok, although I had my doubts even back then) and whenever that isolation begins, which (together with other forms of drift) might never be recovered from.
As another symptom of what's happening (the rest of this comment is in a "paste" that will expire in about a month, to reduce the risk of it being used against me in the future):
I'm curious about how this interacts with space colonisation. The default path of efficient competition would likely lead to maximally fast space-colonisation, to prevent others from grabbing it first. But this would make deliberating together with other humans a lot trickier, since some space ships would go to places where they could never again communicate with each other. For things to turn out ok, I think you either need:
I'm curious whether you're optimistic about any of these options, or if you have something else in mind.
(Also, all of this assumes that defensive capabilities are a lot stronger than offensive capabilities in space. If offense is comparably strong, then we also have the problem that the cosmic commons might be burned in wars if we don't pause or reach some other agreement before space colonisation.)
I think I'm basically optimistic about every option you list.
(Also, all of this assumes that defensive capabilities are a lot stronger than offensive capabilities in space. If offense is comparably strong, then we also have the problem that the cosmic commons might be burned in wars if we don't pause or reach some other agreement before space colonisation.)
This seems like maybe the most likely single reason you need to sort everything out in advance, though the general consideration in favor of option value (and waiting a year or two being no big deal) seems even more important. I do expect to have plenty of time to do that.
I haven't thought about any of these details much because it seems like such an absurdly long subjective time before we leave the solar system, and so there will be huge amounts of time for our descendants to make bargains before then. I am much more concerned about destructive technologies that require strong coordination long before we leave. (Or about option value lost by increasing the computational complexity of your simulation and so becoming increasingly uncorrelated with some simulators.)
One reason you might have to figure these things out in advance is if you try to decouple competition from deliberation by doing something like secure space rights (i.e. binding commitments to respect property rights, have no wars ever, and divide up the cosmos in an agreeable way). It's a bit hard to see how we could understand the situation well enough to reach an agreeable compromise directly (rather than defining a mutually-agreeable deliberative process to which we will defer and which has enough flexibility to respond to unknown unknowns about colonization dynamics) but if it was a realistic possibility then it might require figuring a lot of stuff out sooner rather than later.
Thanks, computer-speed deliberation being a lot faster than space-colonisation makes sense. I think any deliberation process that uses biological humans as a crucial input would be a lot slower, though; slow enough that it could well be faster to get started with maximally fast space colonisation. Do you agree with that? (I'm a bit surprised at the claim that colonization takes place over "millennia" at technological maturity; even if the travelling takes millennia, it's not clear to me why launching something maximally-fast – that you presumably already know how to build, at technological maturity – would take millennia. Though maybe you could argue that millennia-scale travelling time implies millennia-scale variance in your arrival-time, in which case launching decades or centuries after your competitors doesn't cost you too much expected space?)
If you do agree, I'd infer that your mainline expectation is that we successfully enforce a worldwide pause before mature space-colonisation; since the OP suggests that biological humans are likely to be a significant input into the deliberation process, and since you think that the beaming-out-info schemes are pretty unlikely.
(I take your point that, as far as space-colonisation is concerned, such a pause probably isn't strictly necessary.)
I agree that biological human deliberation is slow enough that it would need to happen late.
By "millennia" I mostly meant that traveling is slow (+ the social costs of delay are low, I'm estimating like 1/billionth of value per year of delay). I agree that you can start sending fast-enough-to-be-relevant ships around the singularity rather than decades later. I'd guess the main reason speed matters initially is for grabbing resources from nearby stars under whoever-gets-their-first property rights (but that we probably will move away from that regime before colonizing).
I do expect to have strong global coordination prior to space colonization. I don't actually know if you would pause long enough for deliberation amongst biological humans to be relevant. So on reflection I'm not sure how much time you really have as biological humans. In the OP I'm imagining 10+ years (maybe going up to a generation) but that might just not be realistic.
Probably my single best guess is that some (many?) people would straggle out over years or decades (in the sense that relevant deliberation for controlling what happens with their endowment would take place with biological humans living on earth), but that before that there would be agreements (reached at high speed) to avoid them taking a huge competitive hit by moving slowly.
But my single best guess is not that likely and it seems much more likely that something else will happen (and even that I would conclude that some particular other thing is much more likely if I thought about it more).
Current human deliberation and discourse are strongly tied up with a kind of resource gathering and competition, and because of this I don't have a good picture of how things will look after the two are decoupled, nor know how to extrapolate past performance (how well human deliberation worked in the past and present) into this future.
Currently, people's thinking and speech are in large part ultimately motivated by the need to signal intelligence, loyalty, wealth, or other "positive" attributes, which help to increase one's social status and career prospects, and attract allies and mates, which are of course hugely important forms of resources, and some of the main objects of competition among humans.
Once we offload competition to AI assistants, what happens to this motivation behind discourse and deliberation, and how will that affect discourse and deliberation itself? Can you say more about what you envision happening in your scenario, in this respect?
Planned summary for the Alignment Newsletter:
Under a [longtermist](https://forum.effectivealtruism.org/tag/longtermism) lens, one problem to worry about is that even after building AI systems, humans will spend more time competing with each other than figuring out what they want, which may then lead to their values changing in an undesirable way. For example, we may have powerful persuasion technology that everyone uses to persuade people to their line of thinking; it seems bad if humanity’s values are determined by a mix of effective persuasion tools, especially if persuasion significantly diverges from truth-seeking.
One solution to this is to coordinate to _pause_ competition while we deliberate on what we want. However, this seems rather hard to implement. Instead, we can at least try to _decouple_ competition from deliberation, by having AI systems acquire <@flexible influence@>(@The strategy-stealing assumption@) on our behalf (competition), and having humans separately thinking about what they want (deliberation). As long as the AI systems are competent enough to shield the humans from the competition, the results of the deliberation shouldn’t depend too much on competition, thus achieving the desired decoupling.
The post has a bunch of additional concrete details on what could go wrong with such a plan that I won’t get into here.
I think this exchange between Paul Christiano (author) and Wei Dai (commenter) is pretty important food for thought, for anyone interested in achieving a good future in the long run, and for anyone interested in how morality and society evolve more generally.
I view intent alignment as one step towards a broader goal of decoupling deliberation from competition.
Competition pushes us to become the kind of people and communities who can win a fight, to delegate to whichever kind of AI is available first, and to adopt whatever ideologies are most memetically fit.
Deliberation pushes us to become the kind of people and communities who we want to be, to delegate only when we trust an AI's judgment more than our own, and to adopt views that we really believe.
I think it’s likely that competition is going to accelerate and become more complex over the next 100 years, especially as AI systems begin to replace humans and compete on our behalf. I’m afraid that this may derail human deliberation and lead us to a place we don’t want to go.
Decoupling
I would like humans and humanity to have the time, space, and safety to grow and change in whatever way we decide — individually and collectively — that we want to.
You could try to achieve this by “pausing” competition. Alice and Bob could agree to stop fighting while they try to figure out what they want and work out their disagreements. But that’s a tall order — it requires halting not only military conflict, but any economic development that could put someone at an advantage later on. I don’t want to dismiss this kind of ambitious goal (related post), but I think it’s uncertain and long-term enough that you probably want a stop-gap solution.
An alternative approach is to “decouple” competition from deliberation. Alice and Bob keep competing, but they try to make sure that deliberation happens independently and the result isn’t affected by competition. (“Pausing” is the special case of decoupling where deliberation finishes before competition starts.)
In a world without AI, decoupling is possible to a limited extent. Alice and Bob can spend time competing while planning to deliberate later after the dust has settled (or have their descendants deliberate). But it’s inevitable that Alice and Bob will be different after competing with each other for many years, and so they are not completely decoupled.
Alignment and decoupling
Aligned AI may eventually make decoupling much easier. Instead of Alice and Bob competing directly, they may delegate to AI systems who will make money and fight wars and keep them safe. Once Alice and Bob have a clearer sense of what they want, they can direct their AI to use its influence appropriately. (This is closely related to the strategy stealing assumption.)
Eventually it doesn’t even matter if Alice and Bob participate in the competition themselves, since their personal contribution would be so small relative to their AIs. At that point it’s easy for Alice and Bob to spend their time deliberating instead of thinking about competition at all.
If their AI systems are competent enough to keep them safe and isolate them from the fallout from competition, then the outcome of their deliberation doesn’t depend much on the competition occurring in the background.
Misalignment and coupling
Misaligned AI could instead introduce a severe coupling. In the worst case, my best strategy to compete is to build and empower AI systems who want to compete, and my AI also ends up competing with me in the long run.
In the catastrophe scenario, we have relatively little control over how our society’s values evolve — we end up pursuing whatever kinds of goals the most competent AI systems typically pursue.
Discussions of alignment often drift to questions like “But what do we really want?” or “How do we handle humanity’s diverse and sometimes-conflicting desires?”
Those questions seem important and challenging, but I think it’s clear that the answers shouldn’t depend on whatever values are easiest to give AI. That is, we want to decouple the question “what should we do in light of uncertainty and disagreement?” from the question “what is the most effective AI design for making money?”
Appendix: a bunch of random thoughts
Persuasion and limits of decoupling
Persuasion often doesn’t fit cleanly into “deliberation” or “competition.”
On the one hand, talking to people is a critical part of deliberation:
On the other hand, the exact same kinds of interaction give scope for competition and manipulation:
Wei Dai has talked about many of these issues over the years on Less Wrong and this section is largely inspired by his comments or conversations with him.
I don’t think intent alignment addresses this problem, and I’m not sure there’s any clean answer. Some possible approaches:
Overall I expect this to be messy. It’s a place where I don’t expect it to be possible to fully decouple competition and deliberation, and I wish we had a better story about how to deliberate well in light of that.
Politics of decoupling
Although I think “Decoupling deliberation and competition” is a broadly desirable goal, any implementation will likely benefit some people at others’ expense (like many other efforts to improve the world). So I don’t ever expect it to be a politically clean project.
For example:
A double-edged sword
I think that competition currently serves an important sanity-check on our deliberation, and getting rid of it is scary (even if I’m excited on balance).
In an idealized decoupling, the resources someone ends up with don’t depend at all on how they deliberate. This can result in dedicating massive resources to projects that no one really likes. For example:
I reasonably often find myself grateful that some dysfunctional norms or epistemic practices will most likely become obsolete. It’s a bit scary to think about a world where the only solution is waiting for someone to snap out of it.
Competition isn’t a robust safeguard, and it certainly isn’t optimal. A careful deliberator would make early steps to ensure that their deliberation had the same kind of robustness conferred by competition — for example they would be on the lookout for any places where their choices would lead to them getting outcompeted “in the wild” and then think carefully about whether they endorse those choices anyway. But I’m afraid that most of us are below the bar where paternalistically forcing us to “keep ourselves honest” is valuable.
I don’t really have a settled view on these questions. Overall I still feel comfortable with decoupling, but I hope that we can collectively decide on some regime that captures some of the benefits of this kind of “competitive discipline” without the costs. For example, even in a mostly-decoupled world we could end up agreeing on different domains of “safe” competition (e.g. it feels much better for states to compete on “being a great place to live” than to fight wars), or imposing temporary paternalistic restrictions and relaxing them only once some reasonably high bar of competence is demonstrated.
The balance of power affects deliberation
Negotiation and compromise is an important part of deliberation, but it depends on the current state of competition.
Suppose that Alice and Bob start talking while they don’t know who is going to end up more influential. But they are talking slowly, in the way most comfortable to them, while competition continues to accelerate at the maximum possible rate. So before they can reach any agreement, it may be clear that one of them is vastly more influential.
Alice and Bob would have preferred to make an early agreement to treat each other with respect, back when they were both ignorant about who would end up with the power. But this opportunity is lost forever once they have seen the outcome.
Alice and Bob can try to avoid this by quickly reaching a compromise. But that seems hard, and having to make a precise agreement fast may take them far away from the deliberative process they would have preferred.
I don’t have any real story about coping with this problem, though I’m less worried about it than persuasion. Some possible (but pretty weird) approaches:
The singularity, the distant future, and the “long reflection”
In some ways my picture of the future is very aggressive/unusual. For example I think that we are likely to see explosive economic growth and approximate technological maturity within the next 50–100 years (and potentially much sooner).
But in other ways it feels like I have a much more “boring” picture of the future. I expect technology could radically transform the world on a timescale that would be disorienting to people, but for the most part that’s not how we want our lives to go in order to have the best chance of reaching the best conclusions about what to do in the long run. We do want some effects of technology — we would like to stop being so hungry and sick, to have a little bit less reason to be at each other’s throats, and so on — but we also want to be isolated from the incomprehensible, and to make some changes slowly and carefully.
So I expect there to be a very recognizable thread running through humanity’s story, where many of the humans alive today just continue being human and growing in a way that is familiar and comfortable, perhaps changing more quickly than we have in the past but never so quickly that we are at risk of losing our footing. This is not because that’s how to have the best life (which may well involve incomprehensible mind-alteration or hyper-optimized virtual reality or whatever). It’s because we still have a job to do.
The fact that you are able to modify a human to be much smarter does not mean that you need to, and indeed I think it’s important that you take that process slow. The kinds of moral change we are most familiar with and trust involve a bunch of people thinking and talking, gradually refining their norms and making small changes to their nature, raising new generations one after another.
During that time we have a lot to do to safeguard the process; to become more and more comfortable that it’s proceeding in a good direction even as we become wiser and wiser; to do lots of moral philosophy and political philosophy and psychology at every stage in case they provide clues about how to take the next step wisely. We can take the things that scare us or that we dislike about ourselves, and we can very gingerly remove or change them piece by piece. But I think it doesn’t have to be nearly as weird as people often imagine it.
Moreover, I think that the community of humans taking things slowly and living recognizable lives isn’t an irrelevant sideshow that anyone serious would ignore in favor of thinking about the crazy stuff AI is doing “out there” (or the hyper-optimized experiences some of our descendants may immerse themselves in). I think there’s a real sense in which it’s the main thread of the human story; it’s the thread that determines our future and gradually expands to fill the universe.
Put differently, I think people sometimes imagine abdicating responsibility to crazy AI systems that humans build. I think that will happen someday, but not when we can first build AI — indeed, it won’t happen until those AI systems no longer seem crazy.
In the weirdest cases, we decouple by building an AI that merely needs to think about what humans would want rather than deferring to any real flesh-and-blood humans. But even those cases are more like a change than an ending — we pack up our things from Earth and continue our story inside a homey simulation. And personally I don’t expect to do even that until everyone is good and ready for it, many years after it first becomes possible.