There is a paper which I believe is trying to do something similar to what you are attempting here:
Are you aware of it? How do you think their ideas relate to yours?
Very interesting, thank you for the link!
The main difference between what they're doing and what I'm doing: they're using explicit utility & maximization nodes; I'm not. It may be that this doesn't actually matter. The representation I'm using certainly allows for utility maximization - a node downstream of a cloud can just be a maximizer for some utility on the nodes of the cloud-model. The converse question is less obvious: can any node downstream of a cloud be represented by a utility maximizer (with a very artificial "utility")? I'll probably play around with that a bit; if it works, I'd be able to re-use the equivalence results in that paper. If it doesn't work, then that would demonstrate a clear qualitative difference between "goal-directed" behavior and arbitrary behavior in these sorts of systems, which would in turn be useful for alignment - it would show a broad class of problems where utility functions do constrain behavior.
Glad you liked it.
Another thing you might find useful is Dennett's discussion of what an agent is (see the first few chapters of From Bacteria to Bach and Back). Basically, he argues that an agent is something we ascribe beliefs and goals to. If he's right, then an agent should basically always have a utility function.
Your post focuses on the belief part, which is perhaps the more interesting aspect when thinking about strange loops and similar.
Why aren't you notationally distinguishing between "actual model" versus "what the agent believes the model to be"? Or are you, and I missed it?
On reflection, there's a better answer to this than I originally gave, so I'm trying again.
"What the agent believes the model to be" is whatever's inside the cloud in the high-level model. That's precisely what the clouds mean. But the clouds (and their contents) only exist in the high-level model; the low-level model contains no clouds. The "actual model" is the low-level model.
So, when we talk about the extent to which the high-level and low-level models match - i.e. what queries on the low-level model can be answered by queries on the high-level model - we're implicitly talking about the extent to which the agent's model matches the low-level model.
The high-level model (at least the part of it within the cloud) is "what the agent believes the model to be".
EDIT: This answer isn't very good; see my other one.
Good question. We could easily draw a diagram in which the two are separate - we'd have the "agent" node reading from one cloud and then influencing things outside of that cloud. But that case isn't very interesting - most of what we call "agenty" behavior, and especially the diagonalization issues, are about the case where the actual model and the agent's beliefs coincide. In particular, if we're talking about ideal game-theoretic agents, we usually assume that both the rules of the game and each agent's strategy are common knowledge - including off-equilibrium behavior.
So, for idealized game-theoretic agents, there is no separation between the actual model and the agent's model - interventions on the actual model are reflected in the agent's model.
That said, in the low-level model, the map and the territory will presumably always be separate. "When do they coincide?" is implicitly wrapped up in the question "when do non-agenty models abstract into agenty models?". I view the potential mismatch between the two models as an abstraction failure - if they don't match, then the agency-abstraction is broken.
Agenty things have the type signature (A -> B) -> A. In English: agenty things have some model (A -> B) which predicts the results (B) of their own actions (A). They use that model to decide what actions to perform: (A -> B) -> A.
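The type signature can be written down almost literally. The sketch below is one hypothetical way to build something of type (A -> B) -> A: `make_agent` and its finite action set are assumptions, and "pick the action whose predicted result is largest" is just one illustrative decision rule (it assumes B is orderable, e.g. a number).

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")
B = TypeVar("B")  # assumed orderable, e.g. a number


def make_agent(actions: Iterable[A]) -> Callable[[Callable[[A], B]], A]:
    """Return an agent of type (A -> B) -> A over a finite (hypothetical) action set."""
    options = list(actions)

    def agent(model: Callable[[A], B]) -> A:
        # Use the model (A -> B) to predict the result of each action,
        # then choose the action with the largest predicted result.
        return max(options, key=model)

    return agent


agent = make_agent(range(6))
print(agent(lambda a: -(a - 3) ** 2))  # 3
print(agent(lambda a: a))              # 5
```

Note that the agent never sees B itself, only the model of how B depends on A; handing it a different model changes the action it picks.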
In the context of causal DAGs, the model (A -> B) would itself be a causal DAG model M - i.e. some Python code defining the DAG. Logically, we can represent it as:
\[ M = \text{“}\big(P[A \mid M] = f_A(A)\big) \;\&\; \big(P[B \mid A, M] = f_B(B, A)\big)\text{”} \]
… for some given distribution functions \(f_A\) and \(f_B\).
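As a minimal sketch of "some Python code defining the DAG", M can be plain data holding the two distribution functions. The binary values and the particular \(f_A\) and \(f_B\) below (a fair coin for A, and B copying A with probability 0.9) are hypothetical; the text leaves them unspecified.

```python
# Hypothetical distribution functions over binary A and B.
def f_A(a):
    # P[A = a | M]: a fair coin.
    return 0.5


def f_B(b, a):
    # P[B = b | A = a, M]: B copies A with probability 0.9.
    return 0.9 if b == a else 0.1


# The model M as plain data: one distribution function per node.
M = {"A": f_A, "B": f_B}


def joint(a, b, model):
    """P[A=a, B=b | M] = P[A=a | M] * P[B=b | A=a, M]."""
    return model["A"](a) * model["B"](b, a)


print(joint(1, 1, M))  # 0.45
total = sum(joint(a, b, M) for a in (0, 1) for b in (0, 1))  # sums to 1 (up to rounding)
```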
From an outside view, the model (A -> B) causes the choice of action A. Diagrammatically, that looks something like this:
The “cloud” in this diagram has a precise meaning: it’s the model M for the DAG inside the cloud.
Note that this model does not contain any true loops - there is no loop of arrows. There’s just the Hofstadterian “strange loop”, in which node A depends on the model of later nodes, rather than on the later nodes themselves.
How would we explicitly write this model as a Bayes net?
The usual way of writing a Bayes net is something like:
\[ P[X] = \prod_i P[X_i \mid X_{pa(i)}] \]
… but as discussed in the previous post, there’s really an implicit model M in there. Writing everything out in full, a typical Bayes net would be:
\[ P[X \mid M] = \prod_i P[X_i \mid X_{pa(i)}, M] \]
… with \(M = \text{“}\forall i: P[X_i \mid X_{pa(i)}, M] = f_i(X_i, X_{pa(i)})\text{”}\).
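The factorization \(P[X \mid M] = \prod_i P[X_i \mid X_{pa(i)}, M]\) translates directly into code. In this hedged sketch, M is a hypothetical dict mapping each node to its parents and its function \(f_i\); the three-node chain and its probabilities are made up for illustration.

```python
from itertools import product
from math import prod

# Hypothetical model M for a chain X1 -> X2 -> X3 over {0, 1}.
# Each entry: node -> (parent names, f_i(x_i, parent_values)).
M = {
    "X1": ((), lambda x, pa: 0.3 if x == 1 else 0.7),
    "X2": (("X1",), lambda x, pa: 0.8 if x == pa[0] else 0.2),
    "X3": (("X2",), lambda x, pa: 0.6 if x == pa[0] else 0.4),
}


def prob(assignment, model):
    """P[X | M] = prod_i f_i(X_i, X_pa(i)), reading every f_i from the model M."""
    return prod(
        f(assignment[node], tuple(assignment[p] for p in parents))
        for node, (parents, f) in model.items()
    )


p = prob({"X1": 1, "X2": 1, "X3": 0}, M)  # 0.3 * 0.8 * 0.4
# Sanity check: the joint distribution sums to 1 over all 2**3 assignments.
total = sum(prob(dict(zip(M, xs)), M) for xs in product((0, 1), repeat=len(M)))
```

Every node's computation is looked up in M, which is the sense in which M is an implicit input to all nodes.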
Now for the interesting part: what happens if one of the nodes is agenty, i.e. it performs some computation directly on the model? Well, calling the agenty node A, that would just be a term P[A|M]... which looks exactly like a plain old root node. The model M is implicitly an input to all nodes anyway, since it determines what computation each node performs. But surely our strange loop is not the same as the simple model A -> B? What are we missing? How does the agenty node use M differently from other nodes?
What predictions would (A -> B) -> A make which differ from A -> B?
Answer: interventions/counterfactuals.
Modifying M
If A is determined by a computation on the model M, then M is causally upstream of A. That means that, if we change M - e.g. by an intervention \(M \leftarrow do(B=2, M)\) - then A should change accordingly.
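Here is a minimal sketch of an intervention on M propagating to A, under stated assumptions: the dict-of-mechanisms representation, the helper names `do`, `agenty_A` and `run`, the action set {0, 1, 2}, the mechanism B = 2A, and the agent's objective B - A are all hypothetical, chosen only to make the effect visible.

```python
# Hypothetical representation: a model maps each non-agent node to a mechanism
# computing that node's value from the values of its parents.
def do(node, value, model):
    """Return a copy of `model` in which `node` is fixed to `value`."""
    new_model = dict(model)
    new_model[node] = lambda values: value
    return new_model


def agenty_A(model, actions=(0, 1, 2)):
    """A is computed from the model itself: predict B for each action, maximize B - a."""
    return max(actions, key=lambda a: model["B"]({"A": a}) - a)


def run(model):
    a = agenty_A(model)          # A reads M, the model of the downstream node
    b = model["B"]({"A": a})     # B reads A, as in any causal DAG
    return a, b


M = {"B": lambda values: 2 * values["A"]}
print(run(M))                  # (2, 4)
print(run(do("B", 2, M)))      # (0, 2)
```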
Let’s look at a concrete example.
We’ll stick with our (A -> B) -> A system. Let’s say that A is an investment - our agent can invest anywhere from $0 to $1. B is the payout of the investment (which of course depends on the investment amount). The “inner” model \(M = \text{“}P[B \mid A, M] = f_B(B, A)\text{”}\) describes how B depends on A.
We want to compare two different models within this setup:
What predictions would the two make differently?
Well, the main difference is what happens if we change the model M, e.g. by intervening on B. If we intervene on B - i.e. fix the payout at some particular value - then the “plain old root node” model predicts that investment A will stay the same. But the strange loop model predicts that A will change - after all, the payout no longer depends on the investment, so our agent can just choose not to invest at all and still get the same payout.
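The two predictions can be compared side by side in a short sketch. Everything numeric here is assumed for illustration: the payout \(f_B(a) = 1.5a\), the grid of investments in 1-cent steps, the agent's objective (net gain B - A), and the fixed noise draw `u` for the root-node model.

```python
ACTIONS = [i / 100 for i in range(101)]  # investments from $0 to $1 in 1-cent steps


def payout(a):
    # Hypothetical f_B: the investment pays back 1.5x.
    return 1.5 * a


def intervened_payout(a):
    # do(B = 2): the payout is fixed and no longer depends on the investment.
    return 2.0


def root_node_A(f_B, u_A):
    # "Plain old root node": A = U_A, with no dependence on the model of B.
    return u_A


def strange_loop_A(f_B, u_A):
    # Agenty A: consults the model of B and maximizes net gain B - A (assumed objective).
    return max(ACTIONS, key=lambda a: f_B(a) - a)


u = 0.37  # one hypothetical draw of the noise U_A
for name, f_B in [("original", payout), ("do(B=2)", intervened_payout)]:
    print(name, root_node_A(f_B, u), strange_loop_A(f_B, u))
# original 0.37 1.0
# do(B=2) 0.37 0.0
```

Both models agree that nothing about A's own mechanism was touched; only the strange-loop model lets the change in M reach A.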
In game-theoretic terms: agenty models and non-agenty models differ only in predictions about off-equilibrium (a.k.a. interventional/counterfactual) behavior.
Practically speaking, the cleanest way to represent this is not as a Bayes net, but as a set of structural equations. Then we’d have:
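\[
M = \text{“}\begin{aligned}
&P[U_i = u \mid M] = \mathbb{I}[0 \le u < 1]\,du \\
&A = f_A(M, U_A) \\
&B = f_B(A, U_B)
\end{aligned}\text{”}
\]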
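The structural equations map onto code in which M is literally an argument of \(f_A\). This is a hedged sketch: the noisy payout mechanism, the grid used to average over \(U_B\), the investment grid, and the agent's objective (expected B - A) are all hypothetical choices, not taken from the text.

```python
import random

ACTIONS = [i / 100 for i in range(101)]          # candidate investments in [0, 1]
U_GRID = [(k + 0.5) / 100 for k in range(100)]   # midpoints used to average over U_B


def f_B(a, u_B):
    # Hypothetical payout mechanism: 1.5x the investment, plus noise with mean 0.
    return 1.5 * a + 0.2 * (u_B - 0.5)


def f_A(M, u_A):
    # The agenty node: M is an explicit argument, not just background information.
    # It estimates E[B - a] under the payout mechanism stored in M and picks the best a.
    # (u_A is unused in this deterministic sketch.)
    def expected_net(a):
        return sum(M["f_B"](a, u) for u in U_GRID) / len(U_GRID) - a

    return max(ACTIONS, key=expected_net)


def sample(M, rng):
    u_A, u_B = rng.random(), rng.random()  # U_i ~ Uniform[0, 1)
    A = f_A(M, u_A)                         # A = f_A(M, U_A)
    B = M["f_B"](A, u_B)                    # B = f_B(A, U_B)
    return A, B


M = {"f_B": f_B}
A, B = sample(M, random.Random(0))
print(A)  # 1.0

# Intervening on B changes M, and so changes A via f_A's explicit M argument.
M_do = {"f_B": lambda a, u_B: 2.0}
print(sample(M_do, random.Random(0))[0])  # 0.0
```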
However, this makes the key point a bit tougher to see: the main feature which makes the system “agenty” is that M appears explicitly as an argument to a function, not just as prior information in probability expressions.