The problem is indeed that the observed distribution is insufficient to compute a unique counterfactual: additional causal information is needed. Pearl's approach is to specify each observable variable as a deterministic function of its parents in the causal graph. Any uncertainty must be represented by a set of "exogenous" variables U, which can feature in the functions for the observables. (See chapter 7 of Causality, or An Axiomatic Characterization of Causal Counterfactuals.)
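In symbols (a sketch of the standard form, with my own notation):

```latex
% Each observable X_i is a deterministic function of its parents PA_i in the graph
% and an exogenous variable U_i; all randomness is pushed into the prior P(u).
X_i := f_i(\mathrm{PA}_i, U_i), \qquad i = 1,\dots,n, \qquad U = (U_1,\dots,U_n) \sim P(u).
```

Counterfactuals are then computed in three steps: abduction (update P(u) on the observed evidence), action (replace the equation of the intervened variable by the assignment), and prediction (evaluate the query in the modified model).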
For example, your first process could be represented by the following causal model:
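(For concreteness, here is one possible form such a model might take, assuming A and B are binary and agree with probability 0.75; ⊕ is XOR.)

```latex
\begin{aligned}
% Hypothetical model: U_B records whether B copies (U_B = 0) or flips (U_B = 1) the value of A.
A &:= U_A, & U_A &\sim \mathrm{Bernoulli}(p_A),\\
B &:= A \oplus U_B, & U_B &\sim \mathrm{Bernoulli}(0.25).
\end{aligned}
```

Here conditioning on the exogenous U_B (abduction) is what pins the counterfactual down; a different choice of exogenous variable can reproduce the same joint P(A, B) while giving different counterfactual answers.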
The other processes might have different structures, equations, and distributions; it's not possible in general to distinguish these purely from the observed distribution.
Thank you! That sentence is what I was looking for: "Any uncertainty must be represented by a set of “exogenous” variables U".
I'd been doing that, but without any theoretical justification for it.
Here's a problem that's come up with my definitions of stratification.
Consider a very simple causal graph: A → B.
In this setting, A and B are both booleans, and A=B with 75% probability (independently of whether A=0 or A=1).
Suppose I now want to compute a counterfactual: assume that B=0 when A=0. What would have happened if A=1 instead?
The problem is that P(B|A) seems insufficient to solve this. Let's model the process that outputs B as a probabilistic mix of functions, each taking the value of A and outputting that of B. There are four natural functions here: f0, which always outputs B=0; f1, which always outputs B=1; f2, the identity B=A; and f3, the negation B=1−A.
Then one way of modelling the causal graph is as a mix 0.75f2+0.25f3. In that case, knowing that B=0 when A=0 implies that P(f2)=1, so if A=1, we know that B=1.
But we could instead model the causal graph as 0.5f2+0.25f0+0.25f1. In that case, knowing that B=0 when A=0 implies that P(f2)=2/3 and P(f0)=1/3. So if A=1, B=1 with probability 2/3 and B=0 with probability 1/3.
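Spelling out that update (it's just Bayes' rule on the mixture weights):

```latex
P(f_2 \mid A{=}0, B{=}0) = \frac{0.5}{0.5 + 0.25} = \tfrac{2}{3},
\qquad
P(f_0 \mid A{=}0, B{=}0) = \frac{0.25}{0.5 + 0.25} = \tfrac{1}{3}.
```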
And we can physically design the node B to implement either of these two distributions over functions, or anything in between (the general formula is (0.5+x)f2+xf3+(0.25−x)f1+(0.25−x)f0 for 0≤x≤0.25). But it seems that the causal graph does not capture this choice.
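Here is a small sketch making that explicit (the names FUNCS, model and counterfactual are mine, purely for illustration): every model in this one-parameter family reproduces the same P(B|A), but gives a different answer to the counterfactual.

```python
from fractions import Fraction as F

# The four natural functions from A to B (both boolean): constant 0, constant 1,
# identity, and negation.
FUNCS = {
    "f0": lambda a: 0,
    "f1": lambda a: 1,
    "f2": lambda a: a,
    "f3": lambda a: 1 - a,
}

def model(x):
    """Mixture weights (0.5+x)f2 + x f3 + (0.25-x)f1 + (0.25-x)f0, for 0 <= x <= 1/4."""
    return {"f2": F(1, 2) + x, "f3": x, "f1": F(1, 4) - x, "f0": F(1, 4) - x}

def p_b_given_a(weights, a, b):
    """Observational P(B=b | A=a): total weight of the functions mapping a to b."""
    return sum(w for name, w in weights.items() if FUNCS[name](a) == b)

def counterfactual(weights):
    """P(B=1 had A been 1), given the observation that B=0 when A=0.
    Abduction: keep only functions consistent with f(0)=0 and renormalise;
    prediction: apply the surviving functions to A=1."""
    posterior = {name: w for name, w in weights.items() if FUNCS[name](0) == 0}
    z = sum(posterior.values())
    return sum(w for name, w in posterior.items() if FUNCS[name](1) == 1) / z

for x in [F(0), F(1, 8), F(1, 4)]:
    w = model(x)
    print(f"x = {x}:  P(B=0|A=0) = {p_b_given_a(w, 0, 0)},  "
          f"P(B=1|A=1) = {p_b_given_a(w, 1, 1)},  "
          f"counterfactual P(B=1) = {counterfactual(w)}")

# P(B=0|A=0) and P(B=1|A=1) come out as 3/4 for every x, while the counterfactual
# probability ranges from 2/3 (x = 0) up to 1 (x = 1/4).
```

The design choice lives entirely in the mixture over functions (equivalently, in the distribution of the exogenous variable feeding B), which is exactly the information that the bare graph plus P(A,B) leaves unspecified.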
Owain Evans has said that Pearl has papers covering these kinds of situations, but I haven't been able to find them. Does anyone know of any publications on the subject?