By Tom Everitt, Lewis Hammond, Rhys Ward, Ryan Carey, James Fox, Sebastian Benthall, Matt MacDermott and Shreshth Malik representing the Causal Incentives Working Group. Thanks also to Toby Shevlane, MH Tessler, Aliya Ahmad, Zac Kenton, Maria Loks-Thompson, and Alexis Bellot.
Over the next few years, society, organisations, and individuals will face a number of fundamental questions stemming from the rise of advanced AI systems:
A causal perspective on agency provides conceptual tools for navigating the above questions, as we’ll explain in this sequence of blog posts. We will try to minimise and explain jargon, to make the sequence accessible to researchers from a range of backgrounds.
First, by agent we mean a goal-directed system that acts as if it is trying to steer the world in some particular direction(s). Examples include animals, humans, and organisations (more on agents in a subsequent post). Understanding agents is key to the above questions. Artificial agents are widely considered the primary existential threat from AGI-level technology, whether they emerge spontaneously or through deliberate design. Among the myriad risks to our existence, highly capable agents pose a distinct danger, because many goals can be achieved more effectively by accumulating influence over the world. Whereas an asteroid moving towards Earth isn’t intending to harm humans and won’t resist redirection, misaligned agents might be distinctly adversarial and active threats.
Second, the preservation of human agency is critical in the approaching technological transition, for both individuals and collectives. Concerns have already been raised that manipulative social media algorithms and content recommenders undermine users’ ability to focus on their long-term goals. More powerful assistants could exacerbate this. And as more decision-making is delegated to AI systems, the ability of society to set its own trajectory comes into question.
Human agency can also be nurtured and protected. Helping people to help themselves is less paternalistic than directly fulfilling their desires, and fostering empowerment may be less contingent on complete alignment than direct satisfaction of individual preferences. Indeed, self-determination theory provides evidence that humans intrinsically value agency, and some human rights can be interpreted as “protections of our normative agency”.
Third, artificial agents might themselves eventually constitute moral patients. A clearer understanding of agency could help us refine our moral intuitions and avoid unethical actions. Some ethical dilemmas might be possible to avoid altogether by only designing artificial systems that lack moral patienthood.
One hope for our research is that it will build up a theory of agency. Such a theory would ideally answer questions such as:
Causality is helpful for understanding agents. Philosophers have been interested in causality for a long time, not just because the exact relationship between a cause and an effect is intellectually intriguing, but because it underpins so many other concepts, many of which are relevant to understanding agents and designing safe AGI.
For example, both influence and response are causal concepts. We want agents that influence the world in positive ways, and respond appropriately to instructions. A range of other relevant concepts also build on causality:
The tree of causality
The rest of this sequence will explain in more detail how these concepts are grounded in causality, and the research this has led to. We hope this will enable and motivate other researchers to join our effort of building a formal theory of safe A(G)I based on causal foundations. Much of our recent work fits into this vision. For example, in discovering agents and reasoning about causality in games, we developed a better understanding of how to represent various aspects of reality with causal models. With the agent incentives paper, we showed how such models can be analysed to reveal safety-relevant properties. And with path-specific objectives, we illustrated how this kind of analysis can inspire improved designs.
We hope this will complement other research directions crucial to safe AGI, like scalable alignment, dangerous capability evaluations, robustness, interpretability, ethics, policy and governance, forecasting, agent foundations, and risk mapping.
We hope that a causality-based understanding of agency and related aspects will help designers of AI systems by clarifying the space of possibilities for agents, and how to avoid especially risky configurations. It may give regulators a better picture of what to look out for, and what should count as sufficient evidence of safety. It may help us all decide what behaviour is acceptable towards what kinds of systems. And last but not least, it may help individuals understand what it is that they seek to preserve and enhance in their interactions with artificially intelligent systems.
In the next post, we explain causality and causal models in more detail, covering Pearl’s different causal models, and how they can be generalised to account for the presence of one or more agents.