A critical agential account of free will, causation, and physics

AI ALIGNMENT FORUM

This is an account of free choice in a physical universe. It is very much relevant to decision theory and philosophy of science. It is largely metaphysical, in terms of taking certain things to be basically real and examining what can be defined in terms of these things.

The starting point of this account is critical and agential. By agential, I mean that the ontology I am using is from the point of view of an agent: a perspective that can, at the very least, receive observations, have cognitions, and take actions. By critical, I mean that this ontology involves uncertain conjectures subject to criticism, such as criticism of being logically incoherent or incompatible with observations. This is very much in a similar spirit to critical rationalism.

Close attention will be paid to falsifiability and refutation, principally for ontological purposes, and secondarily for epistemic purposes. Falsification conditions specify the meanings of laws and entities relative to the perspective of some potentially falsifying agent. While the agent may believe in unfalsifiable entities, falsification conditions will serve to precisely pin down that which can be precisely pinned down.

I have only seen "agential" used in the philosophical literature in the context of agential realism, a view I do not understand well enough to comment on. I was tempted to use "subjective"; however, while subjects have observations, they do not necessarily have the ability to take actions. Thus I believe "agential" has a more concordant denotation.

You'll note that my notion of "agent" already assumes one can take actions. Thus, a kind of free will is taken as metaphysically basic. This presupposition may cause problems later. However, I will try to show that, if careful attention is paid, the obvious problems (such as contradiction with determinism) can be avoided.

The perspective in this post can be seen as starting from agency, defining consequences in terms of agency, and defining physics in terms of consequences. In contrast, the most salient competing decision theory views (including framings of CDT, EDT, and FDT) define agency in terms of consequences ("expected utility maximization"), and consequences in terms of physics ("counterfactuals"). So I am rebasing the ontological stack, turning it upside-down. This is less absurd than it first appears, as will become clear.

(For simplicity, assume observations and actions are both symbols taken from some finite alphabet.)

Naive determinism

Let's first, within a critical agential ontology, disprove some very basic forms of determinism.

Let A be some action. Consider the statement: "I will take action A". An agent believing this statement may falsify it by taking any action B not equal to A. Therefore, this statement does not hold as a law. It may be falsified at will.

Let f() be some computable function returning an action. Consider the statement: "I will take action f()". An agent believing this statement may falsify it by taking an action B not equal to f(). Note that, since the agent is assumed to be able to compute things, f() may be determined. So, indeed, this statement does not hold as a law, either.

This contradicts a certain strong formulation of naive determinism: the idea that one's action is necessarily determined by some known, computable function.
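The refutation above can be sketched in code. This is a minimal illustration, not a formal model: the three-symbol alphabet `ACTIONS`, the helper `falsifying_action`, and the stand-in function `predicted_action` are all hypothetical names introduced here. The only thing taken from the argument is that the agent can compute f() and then take any other action.

```python
from typing import Callable, Sequence

# Hypothetical finite action alphabet (the post only assumes it is finite).
ACTIONS: Sequence[str] = ("A", "B", "C")


def falsifying_action(f: Callable[[], str], actions: Sequence[str] = ACTIONS) -> str:
    """Compute f() and return some action that differs from it."""
    predicted = f()  # possible because f is known and computable
    for action in actions:
        if action != predicted:
            return action
    raise ValueError("the alphabet must contain at least two actions")


def predicted_action() -> str:
    # Hypothetical stand-in for the f in "I will take action f()".
    return "A"


taken = falsifying_action(predicted_action)
print(predicted_action(), taken)  # A B
```

The sketch only works because the function halts and is known to the agent; both conditions matter later, when the agent considers a computer C that need not halt.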

Action-consequences

But wait, what about physics? To evaluate what physical determinism even means, we need to translate physics into a critical agential ontology. However, before we turn to physics, we will first consider action-consequences, which are easier to reason about.

Consider the statement: "If I take action A, I will immediately thereafter observe O." This statement is falsifiable, which means that if it is false, there is some policy the agent can adopt that will falsify it. Specifically, the agent may adopt the policy of taking action A. If the agent will, in fact, not observe O after taking this action, then the agent will learn this, falsifying the statement. So the statement is falsifiable.

Finite conjunctions of falsifiable statements are themselves falsifiable. Therefore, the conjunction "If I take action A, I will immediately thereafter observe O; if I take action B, I will immediately thereafter observe P" is, likewise, falsifiable.
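As a rough sketch of what falsifying such a conjunction looks like, the code below tests each conjunct in turn against a toy environment. The names `try_to_falsify`, `toy_env`, and `hypothesis` are hypothetical, and the environment is assumed to be stateless so that each conjunct can be tested on its own trial; the post itself does not specify how trials are arranged.

```python
from typing import Callable, Dict, Optional, Tuple


def try_to_falsify(
    hypothesis: Dict[str, str],
    env: Callable[[str], str],
) -> Optional[Tuple[str, str, str]]:
    """Take each action named in the hypothesis and compare the observation.

    Returns (action, predicted, observed) for the first refuted conjunct,
    or None if no conjunct was refuted.
    """
    for action, predicted in hypothesis.items():
        observed = env(action)
        if observed != predicted:
            return (action, predicted, observed)
    return None


def toy_env(action: str) -> str:
    # Hypothetical environment: the true action-consequences.
    return {"A": "O", "B": "Q"}[action]


# "If I take A, I observe O; if I take B, I observe P."
hypothesis = {"A": "O", "B": "P"}
print(try_to_falsify(hypothesis, toy_env))  # ('B', 'P', 'Q')
```

A result of None would not verify the hypothesis; it would only mean that no tested conjunct was refuted.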

Thus, the agent may have falsifiable beliefs about observable consequences of actions. This is a possible starting point for decision theory: actions having consequences is already assumed in the ontology of VNM utility theory.

Falsification and causation

Now, the next step is to account for physics. Luckily, the falsificationist paradigm was designed around demarcating scientific hypotheses, such that it naturally describes physics.

Interestingly, falsificationism takes agency (in terms of observations, computation, and action) as more basic than physics. For a thing to be falsifiable, it must be able to be falsified by some agent, seeing some observation. And the word able implies freedom.

Let's start with some basic Popperian logic. Let f be some testable function (say, connected to a computer terminal) taking in a natural number and returning a Boolean. Consider the hypothesis: "For all x, f(x) is true". This statement is falsifiable: if it's false, then there exists some action-sequence an agent can take (typing x into the terminal, one digit at a time) that will prove it to be false.

The given hypothesis is a kind of scientific law. It specifies a regularity in the environment.
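The asymmetry between refuting and confirming this law can be illustrated with a short sketch. The function `terminal_f` is a hypothetical stand-in for whatever is connected to the terminal, and the search budget is an assumption added so that the example halts.

```python
from typing import Callable, Optional


def search_for_counterexample(f: Callable[[int], bool], budget: int) -> Optional[int]:
    """Query f on 0, 1, ..., budget - 1 and return the first x with f(x) false."""
    for x in range(budget):
        if not f(x):
            return x  # the law "for all x, f(x) is true" is refuted
    return None  # not refuted within the budget, which is not a proof of the law


def terminal_f(x: int) -> bool:
    # Hypothetical function behind the terminal: true everywhere except 1729.
    return x != 1729


print(search_for_counterexample(terminal_f, 10_000))  # 1729
print(search_for_counterexample(terminal_f, 100))     # None
```

If the law is false, some finite budget suffices to refute it; if it is true, no budget ever confirms it.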

Note that there is a "bridge condition" at play here. That bridge condition is that the function f is, indeed, connected to the terminal, such that the agent's observations of f are trustworthy. In a sense, the bridge condition specifies what f is, from the agent's perspective; it allows the agent to locate f as opposed to some other function.

Let us now consider causal hypotheses. We already considered action-consequences. Now let us extend this analysis to reasoning about causation between external entities.

Consider the hypothesis: "If the match is struck, then it will alight immediately". This hypothesis is falsifiable by an agent who is able to strike the match. If the hypothesis is false, then the agent may refute it by choosing to strike the match and then seeing the result. However, an agent who is unable to strike the match cannot falsify it. (Of course, this assumes the agent can see whether the match is alight after striking it.)

Thus, we are defining causality in terms of agency. The falsification conditions for a causal hypothesis refer to the agent's abilities. This seems somewhat wonky at first, but it is quite similar to Pearlian causality, which defines causation in terms of metaphysically-real interventions. This order of definition radically reframes the apparent paradox of determinism vs. free will, by defining the conditions of determinism (causality) in terms of potential action.

External physics

Let us now continue, proceeding to more universal physics. Consider the law of gravity, according to which a dropped object will accelerate downward at a near-constant rate. How might we port this law into an agential ontology?

Here is the assumption about how the agent interacts with gravity. The agent will choose some natural number as the height of an object. Thereafter, the object will fall, while a camera will record the height of the object at each natural-number time expressed in milliseconds, to the nearest natural-number millimeter from the ground. The agent may observe a printout of the camera data afterwards.

Logically, constant gravity implies, and is implied by, a particular quadratic formula for the height of the object as a function of the object's starting height and the amount of time that has passed. This formula implies the content of the printout, as a function of the chosen height. So, the agent may falsify constant gravity (in the observable domain) by choosing an object-height, placing an object at that height, letting it fall, and checking the printout, which will show the law of constant gravity to be false, if the law in fact does not hold for objects dropped at that height (to the observed level of precision).
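To make the quadratic formula concrete, here is a sketch of the check under explicit assumptions that are not in the setup above: the object is released from rest, air resistance is ignored, the ground is at height 0, and the acceleration is standard gravity (9.80665 m/s^2). Under those assumptions the predicted height is h(t) = h0 - (1/2) g t^2. The function names are hypothetical.

```python
import math
from typing import List, Optional

# Standard gravity, converted from m/s^2 to mm/ms^2 (an assumed value for g).
G_MM_PER_MS2 = 9.80665e-3


def predicted_printout(h0_mm: int, n_samples: int) -> List[int]:
    """Predicted camera readings: height in whole mm at t = 0, 1, 2, ... ms."""
    readings = []
    for t_ms in range(n_samples):
        h = h0_mm - 0.5 * G_MM_PER_MS2 * t_ms ** 2
        # Round to the nearest millimetre; the object stops at the ground.
        readings.append(max(0, math.floor(h + 0.5)))
    return readings


def first_disagreement(h0_mm: int, printout: List[int]) -> Optional[int]:
    """Return the first time (ms) at which the printout refutes the law, if any."""
    predicted = predicted_printout(h0_mm, len(printout))
    for t_ms, (p, observed) in enumerate(zip(predicted, printout)):
        if p != observed:
            return t_ms
    return None


printout = predicted_printout(1000, 200)  # stand-in for real camera data
print(printout[100])                      # 951
print(first_disagreement(1000, printout)) # None
```

The agent's free choice enters through `h0_mm`: the domain-limited law is refuted if some choice of starting height yields a printout with a disagreement.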

Universal constant gravity is not similarly falsifiable by this agent, because this agent may only observe this given experimental setup. However, a domain-limited law, stating that the law of constant gravity holds for all possible object-heights in this setup, up to the camera's precision, is falsifiable.

It may seem that I am being incredibly pedantic about what a physical law is and what the falsification conditions are; however, I believe this level of pedantry is necessary for critically examining the notion of physical determinism to a high-enough level of rigor to check interaction with free will.

Internal physics

We have, so far, considered the case of an agent falsifying a physical law that applies to an external object. To check interaction with free will, we must interpret physical law applied to the agent's internals, on which the agent's cognition is, perhaps, running in a manner similar to software.

Let's consider the notion that the agent itself is "running on" some Turing machine. We will need to specify precisely what such "running on" means.

Let C be the computer that the agent is considering whether it is running on. C has, at each time, a tape-state, a Turing machine state, an input, and an output. The input is attached to a sensor (such as a camera), and the output is attached to an actuator (such as a motor).

For simplicity, let us say that the history of tapes, states, inputs, and outputs is saved, such that it can be queried at a later time.

We may consider the hypothesis that C, indeed, implements the correct dynamics for a given Turing machine specification. These dynamics imply a relation between future states and past states. An agent may falsify these dynamics by checking the history and seeing if the dynamics hold.

Note that, because some states or tapes may be unreachable, it is not possible to falsify the hypothesis that C implements correct dynamics starting from unreachable states. Rather, only behavior following from reachable states may be checked.

Now, let us consider an agent asking whether it "runs on" this computer C. The agent may be assumed to be able to query the history of C, such that it may itself falsify the hypothesis that C implements Turing machine specification M, and other C-related hypotheses as well.

Now, we can already name some ways that "I run on C" may be falsified:

Perhaps there is a policy I may adopt, and a time t, such that if I implement this policy, I will observe O at time t, but C will observe something other than O at time t.
Perhaps there is a policy I may adopt, and a time t, such that if I implement this policy, I will take action A at time t, but C will take an action other than A at time t.

The agent may prove these falsification conditions by adopting a given policy until some time t, and then observing C's observation/action at time t, compared to their own observation/action.
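The comparison described above can be sketched as a check of two logs against each other. The representation is an assumption made for illustration: `Step` and `first_divergence` are hypothetical names, and both the agent's own record and C's saved history are taken to be lists indexed by time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    observation: str
    action: str


def first_divergence(own: List[Step], c_history: List[Step]) -> Optional[Tuple[int, str]]:
    """Return (time, kind) at which the agent and C first come apart, if ever."""
    for t, (mine, cs) in enumerate(zip(own, c_history)):
        if mine.observation != cs.observation:
            return (t, "observation")
        if mine.action != cs.action:
            return (t, "action")
    return None  # "I effectively run on C" is not refuted by this history


own_log = [Step("O", "A"), Step("P", "B"), Step("O", "A")]
c_log = [Step("O", "A"), Step("P", "A"), Step("O", "A")]
print(first_divergence(own_log, c_log))  # (1, 'action')
```

A single history only tests the policy the agent actually followed; the hypothesis quantifies over every policy the agent could adopt.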

I do not argue that the converse of these conditions exhausts what it means that "I run on C". However, they at least restrict the possibility space by a very large amount. For the falsification conditions given to not hold, the observations and behavior of C must be identical with the agent's own observations and behavior, for all possible policies the agent may adopt.

I will name the hypothesis with the above falsification conditions: "I effectively run on C". This conveys that these conditions may not be exhaustive, while still being quite specific, and relating to effects between the agent and the environment (observations and actions).

Note that the agent can hypothesize itself to effectively run on multiple computers! The conditions for effectively running on one computer do not contradict the conditions for effectively running on another computer. This naturally handles cases of identical physical instantiations of a single agent.

At this point, we have an account of an agent who:

Believes they have observations and take free actions
May falsifiably hypothesize physical law
May falsifiably hypothesize that some computer implements a Turing machine specification
May falsifiably hypothesize that they themselves effectively run on some computer

I have not yet shown that this account is consistent. There may be paradoxes. However, this at least represents the subject matter covered in a unified critical agential ontology.

Paradoxes sought and evaluated

Let us now seek out paradox. We showed before that the hypothesis "I take action f()" may be refuted at will, and therefore does not hold as a necessary law. We may suspect that "I effectively run on C" runs into similar problems.

Self-contradiction

Remember that, for the "I effectively run on C" hypothesis to be falsified, it must be falsified at some time, at which the agent's observation/action comes apart from C's. In the "I take action f()" case, we had the agent simulate f() in order to take an opposite action. However, C need not halt, so the agent cannot simulate C until halting. Instead, the agent may select some time t, and run C for t steps. But, by the time the agent has simulated C for t steps, the time is already past t, and so the agent may not contradict C's behavior at time t, by taking an opposite action. Rather, the agent only knows what C does at time t at some time later than t, and only their behavior after this time may depend on this knowledge.

So, this paradox is avoided by the fact that the agent cannot contradict its own action before knowing it, but cannot know it before taking it.

We may also try to create a paradox by assuming an external super-fast computer runs a copy of C in parallel, and feeds this copy's action on subjective time-step t into the original C's observation before time t; this way, the agent may observe its action before it takes it. However, now the agent's action is dependent on its observation, and so the external super-fast computer must decide which observation to feed into the parallel C. The external computer cannot know what C will do before producing this observation, and so this attempt at a paradox cannot stand without further elaboration.

We see, now, that if free will and determinism are compatible, it is due to limitations on the agent's knowledge. The agent, knowing it runs on C, cannot thereby determine what action it takes at time t, until a later time. And the initial attempt to provide this knowledge externally fails.

Downward causation

Let us now consider a general criticism of functionalist views, which is that of downward causation: if a mental entity (such as observation or action) causes a physical entity, doesn't that either mean that the mental entity is physical, or that physics is not causally closed?

Recall that we have defined causation in terms of the agent's action possibilities. It is straightforwardly the case, then, that the agent's action at time t causes changes in the environment.

But, what of the physical cause? Perhaps it is also the case that C's action at time t causes changes in the environment. If so, there is a redundancy, in that the change in the environment is caused both by the agent's action and by C's action. We will examine this possible redundancy to find potential conflicts.

To consider ways that C's action may change the environment, we must consider how the agent may intervene on C's action. Let us say we are concerned with C's action at time t. Then we may consider the agent at some time u < t taking an action that will cause C's action at time t to be over-written. For example, the agent may consider programming an external circuit that will interact with C's circuit ("its circuit").

However, if the agent performs this intervention, then the agent's action at time t has no influence on C's action at time t. This is because C's action is, necessarily, equal to the value chosen at time u. (Note that this lack of influence means that the agent does not effectively run on C, for the notion of "effectively run on" considered! However, the agent may be said to effectively run on C with one exception.)

So, there is no apparent way to set up a contradiction between these interventions. If the agent decides early (at time u) to determine C's action at time t, then that decision causes C's action at time t; if the agent does not do so, then the agent's decision at time t causes C's action at time t; and these are mutually exclusive. Hence, there is not an apparent problem with redundant causality.
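The mutual exclusivity can be shown with a toy interventionist test. This is only a sketch: `c_action_at_t` and `depends_on_agent_choice` are hypothetical names, and the `override` parameter stands in for whatever value the external circuit was programmed with at time u.

```python
from typing import Optional


def c_action_at_t(agent_choice: str, override: Optional[str] = None) -> str:
    """Action C emits at time t.

    override models a value written at the earlier time u by the external
    circuit; None means no such intervention was made.
    """
    return agent_choice if override is None else override


def depends_on_agent_choice(override: Optional[str]) -> bool:
    # Vary the agent's choice at time t while holding everything else fixed.
    return c_action_at_t("A", override) != c_action_at_t("B", override)


print(depends_on_agent_choice(None))  # True: the decision at t causes C's action
print(depends_on_agent_choice("A"))   # False: the decision at u causes C's action
```

In neither case do both decisions make a difference to C's action at time t, which is the sense in which the two causes do not conflict.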

Epiphenomenalism

It may be suspected that the agent I take to be real is epiphenomenal. Perhaps all may be explained in a physicalist ontology, with no need to posit that there exists an agent that has observations and takes actions. (This is a criticism levied at some views on consciousness; my notion of metaphysically-real observations is similar enough to consciousness that these criticisms are potentially applicable.)

The question in regards to explanatory power is: what is being explained, in terms of what? My answer is: observations are being explained, in terms of hypotheses that may be falsified by action/observations.

An eliminativist perspective denies the agent's observations, and thus fails to explain what ought to be explained, in my view. However, eliminativists will typically believe that "scientific observation" is possible, and seek to explain scientific observations.

A relevant point to make here is that the notion of scientific observation assumes there is some scientific process happening that has observations. Indeed, the scientific method includes actions, such as testing, which rely on the scientific process taking actions. Thus, scientific processes may be considered as agents in the sense I am using the term.

My view is that erasing the agency of both individual scientists, and of scientific processes, puts the ontological and epistemic status of physics on shaky ground. It is hard to say why one should believe in physics, except in terms of it explaining observations, including experimental observations that require taking actions. And it is hard to say what it means for a physical hypothesis to be true, with no reference to how the hypothesis connects with observation and action.

In any case, the specter of epiphenomenalism presents no immediate paradox, and I believe that it does not succeed as a criticism.

Comparison to Gary Drescher's view

I will now compare my account to Gary Drescher's view. I have found Drescher's view to be both particularly systematic and compelling, and quite similar to the views of other relevant philosophers such as Daniel Dennett and Eliezer Yudkowsky, which makes it a useful point of comparison. This will dispel the illusion that I am not saying anything new.

Notably, Drescher makes a similar observation to mine on Pearl: "Pearl's formalism models free will rather than mechanical choice."

Quoting section 5.3 of Good and Real:

Why did it take that action? In pursuit of what goal was the action selected? Was that goal achieved? Would the goal have been achieved if the machine had taken this other action instead? The system includes the assertion that if the agent were to do X, then Y would (probably) occur; is that assertion true? The system does not include the assertion that if it were to do P, Q would probably occur; is that omitted assertion true? Would the system have taken some other action just now if it had included that assertion? Would it then have better achieved its goals?

Insofar as such questions are meaningful and answerable, the agent makes choices in at least the sense that the correctness of its actions with respect to its designated goals is analyzable. That is to say, there can be means-end connections between its actions and its goals: its taking an action for the sake of a goal can make sense. And this is so despite the fact that everything that will happen - including every action taken and every goal achieved or not - is inalterably determined once the system starts up. Accordingly, I propose to call such an agent a choice machine.

Drescher is defining conditions of choice and agency in terms of whether the decisions "make sense" with respect to some goal, in terms of means-end connections. This is an "outside" view of agency in contrast with my "inside" view. That is, it says some thing is an agent when its actions connect with some goal, and when the internal logic of that thing takes into account this connection.

This is in contrast to my view, which takes agency to be metaphysically basic, and defines physical outside views (and indeed, physics itself) in terms of agency.

My view would disagree with Drescher's on the "inalterably determined" assertion. In an earlier chapter, Drescher describes a deterministic block-universe view. This view-from-nowhere implies that future states are determinable from past states. In contrast, the view I present here rejects views-from-nowhere, instead taking the view of some agent in the universe, from whose perspective the future course is not already determined (as already argued in examinations of paradox).

Note that these disagreements are principally about metaphysics and ontology, rather than scientific predictions. I am unlikely to predict the results of scientific experiments differently from Drescher on account of this view, but am likely to account for the scientific process, causation, choice, and so on in different language, and using a different base model.

Conclusion and further research

I believe the view I have presented to be superior to competing views on multiple fronts, most especially logical/philosophical systematic coherence. I do not make the full case for this in this post, but take the first step, of explicating the basic ontology and how it accounts for phenomena that are critically necessary to account for.

An obvious next step is to tackle decision theory. Both Bayesianism and VNM decision theory are quite concordant with critical agential ontology, in that they propose coherence conditions on agents, which can be taken as criticisms. Naturalistic decision theory involves reconciling choice with physics, and so a view that already includes both is a promising starting point.

Multi-agent systems are quite important as well. The view presented so far is near-solipsistic, in that there is a single agent who conceptualizes the world. It will need to be defined what it means for there to be "other" agents. Additionally, "aggregative" agents, such as organizations, are important to study, including in terms of what it means for a singular agent to participate in an aggregative agent. "Standardized" agents, such as hypothetical skeptical mathematicians or philosophers, are also worthy subjects of study; these standardized agents are relevant in reasoning about argumentation and common knowledge. Also, while the discussion so far has been in terms of closed individualism, alternative identity views such as empty individualism and open individualism are worth considering from a critical agential perspective.

Other areas of study include naturalized epistemology and philosophy of mathematics. The view so far is primarily ontological, secondarily epistemological. With the ontology in place, epistemology can be more readily explored.

I hope to explore the consequences of this metaphysics further, in multiple directions. Even if I ultimately abandon it, it will have been useful to develop a coherent view leading to an illuminating refutation.
