TL;DR: transformative AI (TAI) plausibly requires causal models of the world. Thus, a component of AI safety is ensuring secure paths to generating these causal models. We think the lens of causal models might be undervalued within the current alignment research landscape and suggest possible research directions.
This post was written by Marius Hobbhahn and David Seiler. MH would like to thank Richard Ngo for encouragement and feedback.
There are already a small number of people working on causality within the EA community. They include Victor Veitch, Zhijing Jin and PabloAMC; check out their work for further insights. There are also other alignment researchers working on causal influence diagrams (Tom Everitt, Ryan Carey, Lewis Hammond, James Fox, Eric Langlois, and Shane Legg) whose work is very much related.
Just to get this out of the way: we follow a broad definition of causality, i.e. we assume it can be learned from (some) data and doesn’t have to be put into the model by humans. Furthermore, we don’t think the representation has to be explicit, e.g. in a probabilistic graphical model; it could also be implicit, e.g. in the weights of a neural network.
But what is it? In a loose sense, you already know: things make other things happen. When you touch a light switch and a light comes on, that’s causality. There is a more technical sense in which no one understands causality, not even Judea Pearl (where does causal information ultimately come from if you have to make causal assumptions to get it? For that matter, how do we get variables out of undifferentiated sense data?). But it's possible to get useful results without understanding causality precisely, and for our purposes, it's enough to approach the question at the level of causal models.
Concretely: you can draw circles around phenomena in the world (like "a switch" and "a lightbulb") to make them into nodes in a graph, and draw arrows between those nodes to represent their causal relationships (from the switch to the lightbulb if you think the switch causes the lightbulb to turn on, or from the lightbulb to the switch if you think it's the other way around).
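As a minimal sketch of what that looks like in code (purely illustrative; we picked the networkx library, and all node names are arbitrary), here are the switch and the lightbulb as nodes, the causal arrows as directed edges, plus a "blackout" node as a second cause that we will come back to below:

```python
import networkx as nx

# Purely illustrative: the switch -> lightbulb example as a directed graph.
causal_graph = nx.DiGraph()
causal_graph.add_edge("switch", "lightbulb")    # the switch causes the light
causal_graph.add_edge("blackout", "lightbulb")  # a power cut also affects it

# Each node comes with a mechanism computing its value from its parents.
def lightbulb_on(switch_on: bool, blackout: bool) -> bool:
    return switch_on and not blackout

print(sorted(causal_graph.predecessors("lightbulb")))  # ['blackout', 'switch']
print(lightbulb_on(switch_on=True, blackout=False))    # True
```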
There’s an old Sequences post that covers the background in more detail. For practical purposes, the key point is that causal information has two main advantages over purely correlational information. (For the following section, I got help from a fellow Ph.D. student.)
Markov factorization: mathematically speaking, the Markov condition ensures conditional independence between some nodes given other nodes. In practice, this means that if we assume causality, the joint probability distribution factorizes along a sparse graph in which only some nodes are connected. In other words, it introduces sparsity.
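In symbols, this is the usual causal factorization: the joint distribution over variables X_1, ..., X_n breaks up into one conditional per node, where PA_i denotes the parents of X_i in the graph:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{PA}_i)$$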
“Namely, if we have a joint with n binary random variables, it would have 2^n - 1 independent parameters (the last one is determined to make the sum equal to 1). If we have k factors with n/k variables each, then we would have k(2^(n/k) - 1) independent parameters. For n=20 and k=4, the numbers are 1048575 vs. 124.” - Patrik Reizinger
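As a quick sanity check of these numbers (a throwaway script, nothing more):

```python
# Sanity check of the parameter counts quoted above.

def full_joint_params(n: int) -> int:
    # 2^n outcomes for n binary variables; probabilities sum to 1,
    # leaving 2^n - 1 free parameters.
    return 2**n - 1

def factored_params(n: int, k: int) -> int:
    # k independent factors over n/k binary variables each
    assert n % k == 0
    return k * (2**(n // k) - 1)

print(full_joint_params(20))   # 1048575
print(factored_params(20, 4))  # 124
```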
Independent mechanisms: the independent mechanisms principle states that the mechanisms generating the individual factors do not influence each other. Therefore, if we observe a shift in our data distribution, we only need to retrain a few parts of the model. If we observe global warming, for example, the vast majority of physics stays the same; we only need to recalibrate the parts of our model that relate to temperature and climate. Another example is the light switch from above: if you know there is a blackout, you don't need to flip the switch to know that the light won't turn on.
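As a toy sketch of this modularity (all probabilities made up for illustration): the chance that the light is on factorizes into independent mechanisms, and a distribution shift only requires swapping out one of them:

```python
# Toy sketch, all numbers made up: a distribution shift only requires
# replacing one mechanism while the others are reused unchanged.

def p_blackout() -> float:
    return 0.01  # mechanism 1: how often the power grid fails

def p_switch_on() -> float:
    return 0.5   # mechanism 2: how often someone flips the switch

def p_light_on(blackout_mechanism=p_blackout) -> float:
    # mechanism 3: the bulb needs power AND a flipped switch
    return (1 - blackout_mechanism()) * p_switch_on()

print(p_light_on())  # 0.495

def p_blackout_storm() -> float:
    return 0.6  # a storm shifts mechanism 1; nothing else changes

print(p_light_on(p_blackout_storm))  # 0.2
```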
The upshot of these two properties is that correlational models assume many more relations between variables than causal models do, and the entire correlational model needs to be retrained every time the data distribution changes. In causal models, however, we usually only need to retrain a small number of mechanisms. Therefore, causal models are much more sample-efficient than correlational ones.
Causal models introduce a very strong assumption: variables are not just related, they are related in a directed way. Thus, a causal model implies a testable hypothesis. If our causal model is that taking a specific drug reduces the severity of a disease, then we can test this with a randomized controlled trial (RCT). So our model, drug -> disease, is a falsifiable hypothesis.
The same is not possible for correlational models. If we say that drug intake correlates with disease severity, we are claiming that either the drug helps with the disease, or that people with less severe disease take more drugs, or that both depend on a third variable. As soon as we intervene by fixing one variable and observing the other, we have already made a causal assumption.
Correlational knowledge can still be used for action: you can take the drug and hope the causal arrow points in the right direction. But the effect could differ from the one you want, since you don't know which variable is the cause and which is the effect.
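A small simulation makes the contrast vivid (our own construction; the effect size and the confounding structure are invented). Sicker patients self-select into treatment, so the naive correlation makes a genuinely helpful drug look harmful, while randomization recovers the true effect:

```python
import random
from statistics import mean

random.seed(0)

def outcome(severity: float, took_drug: bool) -> float:
    # invented ground truth: the drug lowers severity by 2 (plus noise)
    return severity - (2.0 if took_drug else 0.0) + random.gauss(0, 0.5)

def treatment_effect(assign) -> float:
    data = []
    for _ in range(10_000):
        severity = random.uniform(0, 10)
        took = assign(severity)
        data.append((took, outcome(severity, took)))
    treated = mean(y for t, y in data if t)
    untreated = mean(y for t, y in data if not t)
    return treated - untreated

# Observational data: sicker patients are more likely to take the drug.
print(treatment_effect(lambda s: random.random() < s / 10))  # ~ +1.3, drug "looks" harmful

# RCT: a coin flip assigns treatment, severing the severity -> drug arrow.
print(treatment_effect(lambda s: random.random() < 0.5))     # ~ -2.0, the true effect
```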
Causal models greatly improve a model's ability to make decisions and interact with its environment. Therefore, we think it is highly plausible that transformative AI will have some causal model of the world. Given the rise of data-driven learning, we expect this model to be learned from data, though we could also imagine some human input or built-in inductive biases.
Overall, we think the thesis that causality matters for TAI is not very controversial, but there are a lot of implications for AI safety that are not yet fully explored.
If the causal models in ML algorithms have a large effect on their actions and predictions, we should really understand how they work.
We ask a lot of questions but don’t have many answers. Thus, we think the highest priority is to get a clearer picture. Reasonable first steps could be to refine the questions, translate them into testable hypotheses, and read more work from other scientists working on causality.
If you think these are interesting questions and want to work on them, reach out. We will probably start to play around with GPT-3 soon. There is certainly research we missed. Feel free to send us references if you think they are relevant.
We don’t want this to be another piece along the lines of “AI truly needs X to be intelligent”, where X is something vague like understanding or creativity. We have a hunch that causality might play a role in transformative AI and feel that it is currently underrepresented in the AI safety landscape. Not more, not less.
Furthermore, we don’t need a causal model of everything. Correlations are often sufficient. For example, if you hear an alarm, you don’t need to know exactly what caused the alarm to be cautious. But knowing whether the alarm was caused by fire or by an earthquake will determine what the optimal course of action is.
So we don’t think humans need a causal model of everything, and neither do AIs, but at least for safety-relevant applications we should look into it more deeply.
Causality might be one interesting angle for AI safety but certainly not the only one. However, there are a ton of people in classic ML who think that causality is the missing piece to AGI. They could be completely wrong but we think it’s at least worth exploring from an AI safety lens.
In this post, we outlined why causality might be relevant for TAI, which kinds of questions seem most important, and how we could start answering them.
Some people will see our definition as naive and simplistic. Maybe there is no such thing as causality and it’s all just different shades of correlation. Maybe all causal models are wrong and humans see something that isn’t there. Maybe, maybe, maybe.
Just as there is no hard evidence for consciousness, and philosophical zombies that act exactly as if they were conscious but truly aren't could exist, all causal claims could also be explained by a large number of correlations and luck. But as argued, e.g. by Eliezer, Occam's razor makes the existence of some sort of consciousness much more likely than its absence, and by the same logic it makes causality more likely than its absence.