Do you have a mostly disjoint view of AI capabilities between the "extinction from loss of control" scenarios and "extinction by industrial dehumanization" scenarios? Most of my models for how we might go extinct in the next decade from loss of control scenarios require the kinds of technological advancement which make "industrial dehumanization" redundant, with highly unfavorable offense/defense balances, so I don't see how industrial dehumanization itself ends up being the cause of human extinction if we (nominally) solve the control problem, rather than a deliberate or accidental use of technology that ends up killing all humans pretty quickly.
Separately, I don't understand how encouraging human-specific industries is supposed to work in practice. Do you have a model for maintaining "regulatory capture" in a sustained way, despite having no economic, political, or military power by which to enforce it? (Also, even if we do succeed at that, it doesn't sound like we get more than the Earth as a retirement home, but I'm confused enough about the proposed equilibrium that I'm not sure that's the intended implication.)
Do you have a mostly disjoint view of AI capabilities between the "extinction from loss of control" scenarios and "extinction by industrial dehumanization" scenarios?
a) If we go extinct from a loss of control event, I count that as extinction from a loss of control event, accounting for the 35% probability mentioned in the post.
b) If we don't have a loss of control event but still go extinct from industrial dehumanization, I count that as extinction caused by industrial dehumanization caused by successionism, accounting for the additional 50% probability mentioned in the post, totalling an 85% probability of extinction over the next ~25 years.
c) If a loss of control event causes extinction via a pathway that involves industrial dehumanization, that's already accounted for in the previous 35% (and moreover I'd count the loss of control event as the main cause, because we have no control to avert the extinction after that point). I.e., I consider this a subset of (a): extinction via industrial dehumanization caused by loss of control. I'd hoped this would be clear in the post, from my use of the word "additional"; one does not generally add probabilities unless the underlying events are disjoint. Perhaps I should edit to add some more language to clarify this.
Do you have a model for maintaining "regulatory capture" in a sustained way
Yes: humans must maintain power over the economy, such as by sustaining the power (including regulatory capture power) of industries that care for humans, per the post. I suspect this requires a lot of technical, social, and sociotechnical work, with much of the sociotechnical work probably being executed or lobbied by industry, and being of greater causal force than either the purely technical (e.g., algorithmic) or purely social (e.g., legislative) work.
The general phenomenon of sociotechnical patterns (e.g., product roll-outs) dominating the evolution of the AI industry can be seen in the way ChatGPT-4 as a product has had more impact on the world — including via its influence on subsequent technical and social trends — than technical and social trends in AI and AI policy prior to ChatGPT-4 (e.g., papers on transformer models; policy briefings and think tank pieces on AI safety).
Do you have a model for maintaining "regulatory capture" in a sustained way, despite having no economic, political, or military power by which to enforce it?
No. Almost by definition, humans must sustain some economic or political power over machines to avoid extinction. The healthy parts of the healthcare industry are an area where humans currently have some terminal influence, as its end consumers. I would like to sustain that. As my post implies, I think humanity has around a 15% chance of succeeding in that, because I think we have around an 85% chance of all being dead by 2050. That 15% is what I am most motivated to work to increase, or at least keep from decreasing, because other futures do not have me or my human friends or family or the rest of humanity in them.
Most of my models for how we might go extinct in next decade from loss of control scenarios require the kinds of technological advancement which make "industrial dehumanization" redundant,
Mine too, when you restrict to the extinction occurring (finishing) in the next decade. But the post also covers extinction events that don't finish (with all humans dead) until 2050, even if they are initiated (become inevitable) well before then. From the post:
First, I think there's around a 35% chance that humanity will lose control of one of the first few AGI systems we develop, in a manner that leads to our extinction. Most (80%) of this probability (i.e., 28%) lies between now and 2030. In other words, I think there's around a 28% chance that between now and 2030, certain AI developments will "seal our fate" in the sense of guaranteeing our extinction over a relatively short period of time thereafter, with all humans dead before 2040.
[...]
Aside from the ~35% chance of extinction we face from the initial development of AGI, I believe we face an additional 50% chance that humanity will gradually cede control of the Earth to AGI after it's developed, in a manner that leads to our extinction through any number of effects including pollution, resource depletion, armed conflict, or all three. I think most (80%) of this probability (i.e., 40%) lies between 2030 and 2040, with the death of the last surviving humans occurring sometime between 2040 and 2050. This process would most likely involve a gradual automation of industries that are together sufficient to fully sustain a non-human economy, which in turn leads to the death of humanity.
If I intersect this immediately preceding narrative with the condition "all humans dead by 2035", I think that most likely occurs via (a)-type scenarios (loss of control), including (c) (loss of control leading to industrial dehumanization), rather than (b) (successionism leading to industrial dehumanization).
What does your company do, specifically? I found the brief description at HealthcareAgents.com vague and unclear. Can you walk me through an example case of what you do for a patient, or something?
A patient can hire us to collect their medical records into one place, to research a health question for them, and to help them prep for a doctor's appointment with good questions about the research. Then we do that, building and using our AI tool chain as we go, without training AI on sensitive patient data. Then the patient can delete their data from our systems if they want, or re-engage us for further research or other advocacy on their behalf.
A good comparison is the company Picnic Health, except instead of specifically matching patients with clinical trials, we do more general research and advocacy for them.
I wonder if work on AI for epistemics could be great for mitigating the "gradually cede control of the Earth to AGI" threat model. A large majority of economic and political power is held by people who would strongly oppose human extinction, so I expect that "lack of political support for stopping human extinction" would be less of a bottleneck than "consensus that we're heading towards human extinction" and "consensus on what policy proposals will solve the problem". Both of these could be significantly accelerated by AI. Normally, one of my biggest concerns about "AI for epistemics" is that we might not have much time to get good use of the epistemic assistance before the end — but if the idea is that we'll have AGI for many years (as we're gradually heading towards extinction) then there will be plenty of time.
Health itself is an inspiring concept at a technical level, because it is meaningful at many scales of organization at once: healthy cells, healthy organs, healthy people, healthy families, healthy communities, healthy businesses, healthy countries, and (dare I say) healthy civilizations all have certain features in common, to do with self-sustenance, harmony with others, and flexible but functional boundaries.
Healthcare in this general sense is highly relevant to machines. Conversely, sufficient tech to upload/backup/instantiate humans makes biology-specific healthcare (including life extension) mostly superfluous.
The key property of machines is an initial advantage in scalability, which quickly makes anything human-specific tiny and easily ignorable in comparison, however you taxonomize the distinction. Humans persevere only if scalable machine sources of power (care to) lend us the benefits of their scale. Intent alignment, for example, would need to be able to harness a significant fraction of machine intent (rather than being centrally about human intent).
I'd strongly bet that when you break this down in more concrete detail, a flaw in your plan will emerge.
The balance of industries serving humans vs. AIs is a suspiciously high level of abstraction.
This post starts out pretty gloomy but ends up with some points that I feel pretty positive about. Day to day, I'm more focussed on the positive points, but awareness of the negative has been crucial to forming my priorities, so I'm going to start with those. I'm mostly addressing the EA community here, but hopefully this post will be of some interest to LessWrong and the Alignment Forum as well.
Part one — My main concerns
I think AGI is going to be developed soon, and quickly. Possibly (20%) that's next year, and most likely (80%) before the end of 2029. These are not things you need to believe for yourself in order to understand my view, so no worries if you're not personally convinced of this.
(For what it's worth, I don't expect to change my mind about the above AGI forecast in response to debate. That's because I feel sufficiently clear in my understanding of the various ways AGI could be developed from here, such that the disjunction of those possibilities adds up to a pretty high level of confidence in AGI coming soon, which is not much affected by who agrees with me about it. Also, I'm not really deferring to others about it, so I'm pretty confident the above forecast is not the result of any "echo chamber" or "pure hype" effects. My views here came through years of study and research in AI, combined with over a decade of private forecasting practice starting in 2010 — including a lot of hype-detection and bullshit detection practice — which I don't think can be succinctly conveyed in a blog post.)
I also currently think there's around a 15% chance that humanity will survive through the development of artificial intelligence. In other words, I think there's around an 85% chance that we will not survive the transition. Many factors affect this probability, so please take this as a conditional forecast that I'd like you to change if you can, rather than taking it as some unavoidable fate that humanity has no power to decide upon. With that said, I do have reasons for the number 85% being so high.
First, I think there's around a 35% chance that humanity will lose control of one of the first few AGI systems we develop, in a manner that leads to our extinction. Most (80%) of this probability (i.e., 28%) lies between now and 2030. In other words, I think there's around a 28% chance that between now and 2030, certain AI developments will "seal our fate" in the sense of guaranteeing our extinction over a relatively short period of time thereafter, with all humans dead before 2040.
The main factor that I think could reduce this loss-of-control risk is government regulation that is flexible in allowing a broad range of AI applications while rigidly prohibiting uncontrolled intelligence explosions in the form of fully automated AI research and development.
This category of extinction event, involving a concrete loss-of-control event, is something I believe is no longer neglected within the EA community compared to when I first began focussing on it in 2010, and so it's not something I'm going to spend much time elaborating on.
What I think is neglected within EA is what happens to human industries after AGI is first developed, assuming we survive that transition.
Aside from the ~35% chance of extinction we face from the initial development of AGI, I believe we face an additional 50% chance that humanity will gradually cede control of the Earth to AGI after it's developed, in a manner that leads to our extinction through any number of effects including pollution, resource depletion, armed conflict, or all three. I think most (80%) of this probability (i.e., 40%) lies between 2030 and 2040, with the death of the last surviving humans occurring sometime between 2040 and 2050. This process would most likely involve a gradual automation of industries that are together sufficient to fully sustain a non-human economy, which in turn leads to the death of humanity.
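For concreteness, the arithmetic behind these forecasts can be checked in a few lines. This is just a sketch restating the numbers from the text; the decomposition into disjoint events is the post's, and the variable names are mine:

```python
# Probability accounting from the post. The two extinction pathways are
# treated as disjoint events, so their probabilities add.
p_loss_of_control = 0.35  # acute loss-of-control extinction
p_successionism = 0.50    # additional, disjoint: gradual industrial dehumanization

p_extinction = p_loss_of_control + p_successionism  # disjoint events add
p_survival = 1 - p_extinction

# 80% of each risk falls within its stated time window:
p_fate_sealed_by_2030 = 0.80 * p_loss_of_control  # fate sealed before 2030
p_ceded_2030_to_2040 = 0.80 * p_successionism     # control ceded 2030-2040

print(round(p_extinction, 2))           # 0.85
print(round(p_survival, 2))             # 0.15
print(round(p_fate_sealed_by_2030, 2))  # 0.28
print(round(p_ceded_2030_to_2040, 2))   # 0.4
```

Note that a loss-of-control event which proceeds via industrial dehumanization is counted inside the 35%, not the 50%, which is what keeps the two events disjoint and the addition valid.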
Extinction by industrial dehumanization
This category of extinction process — which is multipolar, gradual, and effectively consensual for at least a small fraction of humans — is not something I believe the EA community is taking seriously enough. So I'm going to elaborate on it here. In broader generality, it's something I've written about previously with Stuart Russell in TASRA. I've also written about it on LessWrong, in "What Multipolar Failure Looks Like", with the following diagram depicting the minimal set of industries needed to fully eliminate humans from the economy, both as producers and as consumers:
The main factor that I think could avoid this kind of industrial dehumanization is if humanity coordinates on a global scale to permanently prioritize the existence of industries that specifically serve humans and not machines — industries like healthcare, agriculture, education, and entertainment — and to prevent the hyper-competitive economic trends that AGI would otherwise unlock. Essentially, I'm aiming to achieve and sustain regulatory capture on the part of humanity as a special interest group relative to machines. Preserving industries that specifically care for humans means (a) maintaining vested commercial interests in policies that keep humans alive and well, and (b) ensuring that these industries extract adequate gains from the AI revolution over the next 5 years or so, thus radically increasing the collective capacity of the human species, enough to keep pace with machines so that we don't go "out with a whimper".
(Later in this post I'll elaborate on how I'm hoping we humans can better prioritize human-specific industries, and why I'm especially excited to work in healthtech.)
The reason I expect human extinction to result from industrial dehumanization in a post-AGI economy is that I expect a small but increasingly powerful fraction of humans to be okay with that. Like, I expect 1-10% of humans will gradually and willfully tolerate the dehumanization of the global economy, in a way that empowers that fraction of humanity throughout the dehumanization process until they themselves are also dead and replaced by AI systems.
Successionism as a driver of industrial dehumanization
For lack of a better term, I'll call the attitude underlying this process successionism, referring to the acceptance of machines as a successor species replacing humanity. I don't just mean accepting that AI will constitute one or more new species; I mean foreseeing that those species will lead to human extinction during our lifetimes, and accepting that.
There are a variety of different attitudes that can lead to successionism. For instance:
Taken together, these various sources of successionism have a substantial potential to steer economic activities, both overtly and covertly. And, they can reinforce and/or cover for each other, in the formation of temporary alliances that advance or use AI in ways that risk or cause harm to humanity. Successionist AI developers don't even have to say which kind of successionist they are in order to work together toward a successionist future.
Also, while the AI systems involved in an industrial dehumanization process may not be "aligned with humanity" in the sense of keeping us all around and happily in control of our destinies, the AI very well may be "aligned" in the sense of obeying successionist creators or users, who do not particularly care about humanity as a whole, and perhaps do not even prioritize their own survival very much.
One reason I'm currently anticipating this trend in the future is that I have met a surprising number of people who seem to feel okay with causing human extinction in the service of other goals. In particular I think more than 1% of AI developers feel this way, and I think maybe as high as 10% based on my personal experience from talking to hundreds of colleagues in the field, many of whom have graciously conveyed to me that they think humanity probably doesn't deserve to survive and should be replaced by AI.
The succession process would involve a major rebalancing of global industries, with a flourishing of what I call the machine economy, and a languishing of what I call the human economy. My cofounder Jaan Tallinn recently spoke about this at a United Nations gathering in New York.
Economic rebalancing away from the human economy is not addressed by technical solutions to AI obedience, because of successionist humans who are roughly indifferent or even opposed to human survival.
So, while I'm glad to see people working hard on solving the obedience problem for AI systems — which helps to address much of the first category of risk involving acute loss-of-control during the initial advent of AGI over the next few years — I remain dismayed at humanity's sustained lack of attention on how we humans can or should manage the global economy with AGI systems after they're sufficiently obedient to perform all aspects of human labor upon request.
Part Two — My theory of change
Numerous approaches make sense to me for avoiding successionism, and arguably these are all necessary or at least helpful in avoiding successionist extinction pathways:
These approaches can support each other. For example, successful businesses in (3) will have a natural motivation to advocate for regulations supporting (2) and social events fostering (1). Because I think it's more neglected and — as I will argue — potentially more powerful, I'm going to focus on (3).
Confronting successionism with human-specific industries
Currently, I think the EA movement is heavily fixated on government and technical efforts, to the point of neglecting pro-social and pro-business interventions that might even be necessary for resourceful engagement with government and tech development. In other words, EA is neglecting industrial solutions to the industrial problem of successionism.
As an example, consider the impact that AI policy efforts were having prior to ChatGPT-4, versus after. The impact of ChatGPT-4 being shipped as a product that anyone could use and benefit from *vastly outstripped* the combined efforts of everyone writing arguments and reports to increase awareness of AGI development in AI policy. That's because direct personal experience with something is so much more convincing than a logical or empirical argument, for most people, and it also creates logical common knowledge, which is important for coordination.
Partly due to the EA community's (relative) disinterest in developing prosocial products and businesses in comparison to charities and government policies, I've not engaged much with the EA community over the past 6 years or so, even though I share certain values with many people in the community, including philanthropy.
However, I've recently been seeing more appreciation for "softer" (non-technical, non-governmental) considerations in AI risk coming from EA-adjacent readers, including some positive responses to a post I wrote called "Safety isn't safety without a social model". So, I thought it might make sense to try sharing more about how I wish the EA movement had a more diverse portfolio of approaches to AI risk, including industrial and social approaches.
For instance, amongst the many young people who have been inspired by EA to improve the world, I would love to see more people
Note: This does not include for-profits that grow by hurting people, such as by turning people against each other and extracting profits from the conflict. Illegal arms dealers and social media companies do this. It's much better to make the good kind of for-profits that grow by helping people. I want more of those!
Note: I've been pleased that certain EA-adjacent events I've attended over the past couple of years seem to have more of a positive vibe in this way, compared to my sense of the 2018-2022 era, which is another reason I feel more optimistic sharing this wish-list for cultural shifts that I would like to see from EA.
I suspect there can be massive flow-through effects from positive trends like these, which could help develop a healthy attitude for humanity choosing to continue its own existence and avoiding full-on successionism.
Also, the more we humans can make the world better right now, the more we can alleviate what might otherwise be a desperate dependency upon superintelligence to solve all of our problems. I think a huge amount of good can be done with the current generation of AI models, and the more we achieve that, the less compelling it will be to take unnecessary risks with rapidly advancing superintelligence. There's a flinch reaction people sometimes have against this idea, because it "feeds" the AI industry by instantiating or acknowledging more of its benefits. But I think that's too harsh of a boundary to draw between humanity and AI, and I think we (humans) will do better by taking a measured and opportunistic approach to the benefits of AI.
How I identified healthcare as the industry most relevant to caring for humans
For one thing, it's right there in the name 🙂
More systematically:
Healthcare, agriculture, food science, education, entertainment, and environmental restoration are all important industries that serve humans but not machines. These are industries I want to sustain and advance, in order to keep the economy caring for humans, and to avoid successionism and industrial dehumanization. Also, good business ideas that grow by helping people can often pay for themselves, and thus help diversify funding sources for doing more good.
So, first and foremost, if you see ideas for businesses that meaningfully contribute to any of those industries, please build them! At the Survival and Flourishing Fund we now make non-dilutive grants to for-profits (in exchange for zero equity), and I would love for us to find more good business ideas to support.
With that said, healthcare is my favorite human-specific industry to advance, for several reasons:
It's okay with me if only some of the above bets pay out, as long as my colleagues and I can make a real contribution to healthcare with AI technology, and help contribute to positive attitudes and business trends that avoid successionism and industrial dehumanization in the era of AGI.
But why not just do safety work with big AI labs or governments?
You might be wondering why I'm not working full-time with big AI labs and governments to address AI risk, given that I think loss-of-control risk is around 35% likely to get us all killed, and that it's closer in time than industrial dehumanization.
First of all, this question arguably ignores most of the human economy aside from governments and AGI labs, which should be a bit of a red flag I think, even if it's a reasonable question for addressing near-term loss-of-control risk specifically.
Second, I do still spend around 1 or 1.5 workdays per week addressing the control problem, through spurts of writing, advocacy and philanthropic support for the cause, in my work for UC Berkeley and volunteering for the Survival and Flourishing Fund. That said, it's true that I am not focusing the majority of my time on addressing the nearest term sources of AI risk.
Third, a major reason for my focus on longer-term risks on the scale of 5+ years — after I'm pretty confident that AGI will already be developed — is that I feel I've been relatively successful at anticipating tech development over the past 10 years or so, and the challenges those developments would bring. So, I feel I should continue looking 5 years ahead and addressing what I'm fairly sure is coming on that timescale.
For context, I first started working to address the AI control problem in 2012, by attempting to build and finance a community of awareness about it, and later through research at MIRI in 2015 and 2016. Around that time, I concluded that multipolar AI risks would be even more neglected than unipolar risks because they are harder to operationalize. I began looking for ways to address multipolar risks, first through research in open-source game theory, then within video game environments tailored to include caretaking relationships, and now in the real-world economy with healthcare as a focus area. And sadly it took me most of the period from 2012 to 2021 to realize that I should be working on for-profit feedback loops for effecting industrial change at a global scale, through the development of helpful products and services that can keep a growing business oriented on doing good work that helps people.
Now, in 2024, the loss-of-control problem is much more imminent but also much less neglected than when I started worrying about it, so I'm even more concerned with positioning myself and my business to address problems that might not become obvious for another 5-10 years. The potential elimination of the healthcare industry in the 2030s is one of those problems, and I want to be part of the solution to it.
Fourth, even if we (humans) fail to save the whole world, I will still find it intrinsically rewarding to help a bunch of people with their health problems between now and then. In other words, I also care about healthcare in and of itself, even if humanity might somehow destroy itself soon. This caring allows me to focus myself and my team on something positive that's enjoyable to scale up and that grows by helping people, which I consider a healthy attribute for a growing business.
Fifth and finally, overall I would like to see more ambitious young people who want to improve the world with helpful feedback loops that scale into successful businesses, because industry is a lot of what drives the world, and I want morally driven people to be driving industry.
Conclusion
In summary,
Thanks for reading about why I'm working in healthtech :)