Towards a Mechanistic Understanding of Goal-Directedness

Nice post! Surprisingly, I'm interested in the topic. ^^

Funny too that you focus on an idea I am writing a post about (albeit from a different angle). I think I broadly agree with your conjectures, for sufficient competence and generalization at least.

Most discussion about goal-directed behavior has focused on a behavioral understanding, which can roughly be described as using the intentional stance to predict behavior.

I'm not sure I agree with that. Our lit review shows that there are both behavioral and mechanistic approaches (Richard's goal-directed agency is an example of the latter)

A machine “is an NFA (mechanistically)” if the internal mechanism has non-deterministic transitions.

The analogy is great, but if I nitpick a little, I'm not sure a non-determinism mechanism makes sense. You have either deterministm or probabilities, but I don't see how to implement determinism. That's by the way a reason why the non-deterministic Turing Machines aren't really used anymore when talking about complexity classes like NP.

Adam Shimi’s Literature Review on Goal-Directedness identifies five properties behaviorally goal-directed systems have

Two corrections here: the post was written with Michele Campolo and Joe Collman, so they should also be given credit; and we identify five properties that the literature on the subject focuses and agrees on. We don't necessarily say that both are necessary or as important.

We restructure these properties hierarchically:

I would like more explanations here, because I'm not sure that I follow. Specifically, I can't make sense of "what is the distribution over goals?". Are you talking about the prior over goals in some sort of bayesian goal-inference?

Roughly speaking, an agent is mechanistically goal-directed if we can separate it into a goal that is being pursued and an optimization process doing that pursuit.

I like this. My current position (that will be written down in my next post on the subject) is that these mechanical goal directed systems are actually behavioral goal-directed systems at a certain level of competence. They also represent a point where "simple models" become more predictive than the intentional stance, because the optimization itself can be explained a simple model.

Efficient: The more mechanistically goal-directed a system is, the more efficiently it pursues its goal.

Shouldn't that be the other way around?

We omit “far-sighted” because this is not a property intrinsically related to goal-directedness. We view far-sighted goal-directed agents as more dangerous than near-sighted ones, but not less goal-directed. While there might be a large difference between far-sighted and near-sighted agents, the mechanistic difference is as small as a single discount parameter.

It's funny, because I actually see far-sightedness as a property of the internal structure more than the behavior. So I would assume that a mechanically goal-directed system shows some far-sightedness.

However, many possible internal mechanisms can result in the same behavior, so this connection is lossy. For example, a maze-solver can either be employing a good set of heuristics or implementing depth-first search.

But those two maze-solver won't actually have the same behavior. I think the lossy connection doesn't come from the fact that multiple internal mechanisms can result in the same behavior "over all situations" (because in that case the internal differences are irrelevant) but in the fact that they can result in the same behavior for the training/testing environments considered.

Algorithms can be behaviorally linear time-complexity if they tend to take time that scales linearly with the input length and mechanistically linear time-complexity if they’re provably in O(n).

I disagree with that example. What you call behavioral time complexity is more something like averaged time complexity (or smooth analysis maybe). And in complexity theory, the only thing that exists is behavioral.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

23

Towards a Mechanistic Understanding of Goal-Directedness

23

Introduction

Behavioral Goal-Directedness

Mechanistic Goal-Directedness

Correspondence Conjectures

Conclusion