Nate Showell — AI Alignment Forum

It seems like fixed points could be used to replace the concept of utility, or at least to ground it as an inferred property of more fundamental features of the agent-environment system. The concept of utility is motivated by the observation that agents have preference orderings over different states. Those preference orderings are statements about the relative stability of different states, in terms of the direction in which an agent tends to transition between them. It seems duplicative to have both utilities and fixed points as two separate descriptions of state transition processes in the agent-environment system; utilities look like they could be defined in terms of fixed points.

As one preliminary idea for how to do this, you could construct a fully connected graph $G$ whose vertices are the probability distributions $p$ that satisfy $b(p) = p$ and whose edges $E$ are beliefs that represent hypothetical transitions between those fixed points. $G$ would take the place of a preference ordering by describing the tendency of the agent to move between the fixed points if given the option. (You could also model incomplete preferences by not making the graph fully connected.) Performing power iteration with the transition matrix of $G$ would act as a counterpart to moving through the preference ordering.
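To make the power-iteration step concrete, here is a minimal sketch in plain Python. It rests on assumptions that go beyond the proposal itself: each edge of $G$ is collapsed to a single transition probability (rather than a full belief), the three fixed points and the numbers in `transition` are made up for illustration, and reading the long-run distribution as a ranking is only one possible interpretation.

```python
def power_iteration(transition, max_steps=1000, tol=1e-12):
    """Repeatedly apply a row-stochastic matrix to a uniform start distribution.

    transition[i][j] is the (hypothetical) probability that the agent moves
    from fixed point i to fixed point j when given the option.
    """
    n = len(transition)
    dist = [1.0 / n] * n
    for _ in range(max_steps):
        # One step of dist <- dist @ transition.
        new = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


# Illustrative transition matrix over three fixed points (assumed values).
transition = [
    [0.2, 0.3, 0.5],
    [0.1, 0.4, 0.5],
    [0.1, 0.2, 0.7],
]

stationary = power_iteration(transition)

# Fixed points ordered from most to least long-run probability mass.
ranking = sorted(range(len(stationary)), key=lambda i: stationary[i], reverse=True)
```

Under these assumed numbers the iteration settles on a distribution that puts the most mass on fixed point 2 and the least on fixed point 0, so `ranking` plays the role that a preference ordering would otherwise play. Removing edges (setting entries to zero and renormalizing rows) would be one way to experiment with the incomplete-preferences case, though convergence is then no longer guaranteed in general.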

Further exploration of this unification of utilities and fixed points could involve connecting $G$ to the beliefs that are actually, rather than just counterfactually, present in the agent-environment system, in order to describe which parts of the system the agent can control. Having a way to represent that connection could let us rewrite the instrumental constraint so that it no longer relies on $U$.
