I don't think one should see Pearl-type theories, which fall under the general heading of interventionist accounts, as reductive theories, i.e., as theories that reduce causal relations to something non-causal (even though Pearl might claim that his account is indeed reductive). I think such theories make an irreducible appeal to causal notions in explicating causal relations.
One reason this isn't problematic is that these theories explicate causal relations among a set of variables in terms of causal relations between interventions and those variables, together with correlational information among the variables. So such theories are not employing causal information about the relations among the variables themselves in order to explain causal relations among those very variables -- which would indeed be viciously circular. This point is explained clearly here.
If you want a reductive account of causation, I think that's a much harder problem, and indeed there might not even be one. See here for more details on attempts to provide reductive accounts of causation.
You can read Halpern's stuff if you want an axiomatization of something like the responses to the do-operator.
Or you can try to understand the relationship between do() and counterfactual random variables, and try to formulate causality as a missing data problem (whereby a full data distribution on counterfactuals and an observed data distribution on factuals are related via a coarsening process).
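To make the missing-data framing concrete, here is one standard way to write the coarsening step (potential-outcomes notation; my gloss, not a quotation from any particular source): the full data would be the treatment X together with every counterfactual outcome Y(x), while the observed data keep only the one counterfactual that consistency picks out.

```latex
% Consistency / coarsening: the observed outcome is the counterfactual
% corresponding to the treatment actually received.
Y \;=\; \sum_{x} \mathbf{1}\{X = x\}\, Y(x)

% Full-data law:     P\bigl(X,\, Y(x_1), \dots, Y(x_k)\bigr)
% Observed-data law: P(X, Y), a coarsening of the above, since Y(x) is
% "missing" whenever x \neq X.
```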
Well, first off, Pearl would remind you that reduction doesn't have to mean reduction to probability distributions. If Markov models are simple explanations of our observations, then what's the problem with using them?
The surface-level answer to your question would be to talk about how to interconvert between causal graphs and probabilities, thereby identifying any function on causal graphs (like setting the value of a node without updating its parents) with an operator on probability distributions (given the graphical model). Note that in the common syntax, "conditioning" on do()-ing something really means applying such an operator to the probability distribution, not ordinary conditioning. But you can google this or find it in Pearl's book Causality.
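To make that concrete, here is the standard truncated factorization (this identity appears in Pearl's Causality, though the notation here is mine): the observational joint factorizes over the graph, and do(X_j = x*) simply deletes X_j's own factor while clamping its value, leaving every other factor alone.

```latex
% Observational factorization for a Markov model on a DAG:
P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{pa}_i)

% "Conditioning" on do(X_j = x^*): drop X_j's own factor, clamp its value.
P(x_1, \dots, x_n \mid do(X_j = x^*)) =
  \begin{cases}
    \prod_{i \neq j} P(x_i \mid \mathrm{pa}_i) & \text{if } x_j = x^* \\
    0 & \text{otherwise}
  \end{cases}
```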
I'd just like you to think more about what you want from an "explanation." What is it you want to know that would make things feel explained?
> If Markov models are simple explanations of our observations, then what's the problem with using them?
To be clear, by total probability distribution I mean a distribution over all possible conjunctions of events. A Markov model also determines a total probability distribution, but there are multiple Markov models with the same probability distribution. Believing in a Markov model is therefore more specific, and so if we could do the same work with just probability distributions, then Occam would seem to demand that we do.
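A minimal sketch of that point (numbers invented purely for illustration): the two Markov models below, A → B and B → A, induce exactly the same joint distribution, so no total probability distribution can distinguish them; only their interventional predictions differ.

```python
# Two Markov models over binary variables A and B that induce the very same
# joint distribution P(A, B), yet disagree about interventions.
# All numbers are made up purely for illustration.

def joint_a_causes_b(a, b):
    """Model 1: A -> B. P(A=1) = 0.5, and B copies A with probability 0.8."""
    p_a = 0.5
    p_b_given_a = 0.8 if b == a else 0.2
    return p_a * p_b_given_a

def joint_b_causes_a(a, b):
    """Model 2: B -> A, chosen symmetrically so the joint comes out identical."""
    p_b = 0.5
    p_a_given_b = 0.8 if a == b else 0.2
    return p_b * p_a_given_b

# Same observational (total) probability distribution:
for a in (0, 1):
    for b in (0, 1):
        assert abs(joint_a_causes_b(a, b) - joint_b_causes_a(a, b)) < 1e-12

# But do(A=1) means: cut the edges *into* A and clamp A=1, leaving every
# other conditional untouched.
# Model 1: B still depends on A, so P(B=1 | do(A=1)) = P(B=1 | A=1).
p1 = joint_a_causes_b(1, 1) / (joint_a_causes_b(1, 0) + joint_a_causes_b(1, 1))
# Model 2: A has no effect on B, so P(B=1 | do(A=1)) = P(B=1).
p2 = sum(joint_b_causes_a(a, 1) for a in (0, 1))
print(p1, p2)  # 0.8 vs 0.5 -- same joint, different causal conclusions
```

So whatever extra content a Markov model carries beyond its total probability distribution is precisely its answers to do()-questions.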
> The surface-level answer to your question would...
To reductively explain causality, it has to be explained in non-causal terms, most likely in terms of total probability distributions. Pearl explains causality in terms of causal graphs, where one conditionalizes the probability distribution not on an event, but on do(event). What does this mean? It's easy enough to explain in causal terms: you make it so the event occurs without changing any of its causal antecedents. But of course that fails to reductively explain causality. How could it be explained without that?