Gradations of Agency

Daniel Kokotajlo

[Epistemic status: This post written hastily, for Blog Post Day.]

I’ve de-prioritised this sequence due to other recent posts which cover a lot of the same ground. Moreover I begin to suspect that the theory of agency I’m building is merely the glowing brain and that Yudkowsky-Veedrac's is the cosmic galaxy brain. Perhaps mine can be a useful stepping-stone.

Consider the messy and large real world, with creatures reproducing, competing, and evolving. The behavior of each entity is controlled by some sort of cognition/computation—some sort of algorithm.

Inspired by something Dan Dennett wrote (h/t Ramana Kumar), I propose the following loose hierarchy of algorithm families. As we move up the hierarchy, things get more complicated and computationally expensive, but also more powerful and general.

Level 1: Do what worked in the past for your ancestors. You have some “input channel” or “senses” and then you respond to it somehow, doing different things in response to different inputs.

Example: You swim towards warmth and away from cold and hot. You run from red things and towards green things.

This is more complex and computationally expensive, but (if done right) more powerful and general than algorithms which aren’t environment-responsive: There are many situations, many niches in the environment, where being able to sense and adapt quickly helps you survive and thrive.

Level 2: Do what worked in the past for you. How you respond to inputs is itself responsive to inputs; you learn over the course of your life. This typically involves some sort of memory and a special input channel for positive and negative reward.

Example: You start off reacting randomly to inputs, but you learn to run from red things and towards green things because when you ran towards red things you got negative reward and when you ran towards green things you got positive reward.

This is more complex and computationally expensive, but (if done right) more powerful and general:

For a particular ecological niche, you & your relatives will reproduce and fill the niche more quickly than rivals from Level 1. This is because your rivals in some sense keep doing the same thing until they die, and will thus only “learn” to exploit the niche by gradual natural selection.
Also, there are some niches that Level 1 creatures simply cannot survive in, that you can. Example: In Pond X, the color of food and the color of predators constantly changes on timescales of about a lifetime, too fast for a population of Level 1 creatures to adapt.

Level 3: Do what worked in the past for things similar to you in similar situations. You have some ability to judge similarity, and in particular to notice and remember when your current situation is similar to the situation of something else you saw, something you classify as similar to you. Armed with this ability you can learn not just from your own experience, but from the experience of others—you can identify successful others and imitate them.

This is more complex and computationally expensive, but (if done right) more powerful and general:

For a particular ecological niche, you & your relatives will reproduce and fill the niche more quickly than rivals from level 2. This is because you can learn from the experiences of others while level 2 creatures have to experience things themselves.
Also, there are some niches that level 2 creatures simply cannot survive in, that you can. For example, niches in which certain behaviors in certain situations are deadly.

Level 4: Do what worked in your head when you imagined various plans. Whereas with Level 3 you processed your memories into the sort of world-model that allowed you to ask “Have I seen anyone similar to me in a similar situation before? What did they do—did it work for them?” now you are processing your memories into the sort of gearsy world-model that allows you to imagine the consequences of different hypothetical actions and pick the action has the best imagined consequences. This includes level 3 thinking as a special case.

This is more complex and computationally expensive than level 3, but (if done right) more powerful and general. (To save space, spelling out why is left as an exercise for the reader.)

Level 5: Cognitive algorithms learned from experience. Remember, the levels we’ve been describing so far are algorithm families. There are lots of choices to be made within each family about exactly how to do it—e.g. How many possible actions do you consider, and how do you choose which ones? How long do you spend imagining the outcomes of each possible action? How should the answers to the previous questions depend on your situation? Since I didn’t say otherwise, assume that in previous levels the answers to these questions are fixed and hard-coded; now, at Level 5, we imagine that they are variable parameters you can adjust depending on the situation and which you can learn over the course of your life.

Remember that realistically you won’t be just doing one level of cognition all the time; that’s massively computationally wasteful. Instead you’ll probably be doing some sort of meta-algorithm that switches between levels as needed.

(For example, there’s no need to imagine the consequences of all my available actions and pick the best plan if I’m doing something extremely similar to what I did in the past, like brush my teeth; I can handle those situations “on autopilot.” So maybe the first few times I brush my teeth I do it the computationally expensive way, but I quickly learn simple algorithms to “automate” the process and then thenceforth when I notice I’m in the teeth-brushing situation I switch on those algorithms and let the imitation and planning parts of my brain relax or think about other things.)

Given that this is what you’ll be doing, whether to delegate to some cheaper/faster process, and which one to delegate to, is itself a difficult question that is best answered with a learned, rather than hard-coded, algorithm. Hence the importance of Level 5.

Level 6: Cognitive algorithms learned from imitation and imagined experience. Whereas in Level 5 the meta-cognitive process that chose details of your algorithm was simple: “think in ways that seem to have worked in the past;” now you are able to apply your imitation and planning algorithms on this meta level: “think in ways that worked for others” and “think in ways that you predict will work better than the alternatives.”

…I’ll stop there. I’m not claiming that there are no levels above 6; nor am I claiming that this way of carving up the space of algorithms is the best way. Here are my conjectures:

We can imagine a graph with algorithmic complexity + computational cost on the x-axis, and power level (I haven’t defined this, but note that each of the levels both learns more quickly and can succeed in a wider variety of situations compared to the previous levels) on the y-axis. The different algorithm families can be depicted on the graph like so. (Each instance of a family lies somewhere below that family’s curve.) A process that optimizes for power+generality while searching deeper and deeper into the y-axis (such as a series of neural net training runs with larger and larger neural nets, or a single training run perhaps) will tend to progress up some sort of hierarchy similar to the six-level hierarchy I’ve described.
Intuitively, each level seems “more agenty” than the previous levels. Perhaps in the infinite limit, you get something like a Bayesian expected utility maximizer for which utility = power-and-generality-as-defined-by-the-graph. Could it instead be that if we keep exploring the diagram farther to the right, the trend towards agency will reverse? Maybe, but I would be quite surprised.
Each level seems to be capable of creating and sustaining more powerful, more robust P₂B loops than the previous levels. (Arguably levels 1, 2, 3 aren’t really P₂B loops because they don’t involve planning… but I don’t want to say that exactly because e.g. level 3 involves imitating others and so that probably ends up accumulating resources and knowledge and other convergent instrumental resources, and generally behaving very similarly to a planner at least in some environments. And likewise level 2 can sorta crappily imitate level 3… perhaps I should say that the key thing is “convergent instrumental resource feedback loops” and that thinking about P₂B is a way to see why such things happen and why the sufficiently big ones tend to have planners sitting in the middle directing them, and the really really big ones tend to have level 6+ algorithms sitting in the middle directing them…

The next and probably last post in this sequence ties it all together & answers the big questions about agency.

[-]Victoria Krakovna3y30

This is an interesting hierarchy! I'm wondering how to classify humans and various current ML systems along this spectrum. My quick take is that most humans are at Levels 4-5, AlphaZero is at level 5, and GPT-3 is at level 4 with the right prompting. Curious if you have specific ML examples in mind for these levels.

[-]Daniel Kokotajlo3y20

Thanks! Hmm, I would have thought humans were at Level 6, though of course most of their cognition most of the time is at lower levels.

I think some humans are at level 6 some of the time (see Humans Who Are Not Concentrating Are Not General Intelligences). I would expect that learning cognitive algorithms from imagined experience is pretty hard for many humans (e.g. examples in the Astral Codex post about conditional hypotheticals). But maybe I have a different interpretation of Level 6 than what you had in mind?

[-]Daniel Kokotajlo3y30

Good point re learning cognitive algorithms from imagined experience, that does seems pretty hard. From imitation though? We do it all the time. Here's an example of me doing both:

I read books about decision theory & ethics, and learn about expected utility maximization & the bounded variants that humans can actually do in practice (back of envelope calculations, etc.) I immediately start implementing this algorithm myself on a few occasions. (Imitation)

Then I read more books and learn about "pascal's mugging" and the like. People are arguing about whether or not it's a problem for expected utility maximization. I think through the arguments myself and come up with some new arguments of my own. This involves imagining how the expected utility maximization algorithm would behave in various hypothetical scenarios, and also just reasoning analytically about the properties of the algorithm. I end up concluding that I should continue using the algorithm but with some modifications. (Learning from imagined experience.)

Would you agree with this example, or are you thinking about the hierarchy somewhat differently than me? I'm keen to hear more if the latter.

Ah, I think you intended level 6 as an OR of learning from imitation / imagined experience, while I interpreted it as an AND. I agree that humans learn from imitation on a regular basis (e.g. at school). In my version of the hierarchy, learning from imitation and imagined experience would be different levels (e.g. level 6 and 7) because the latter seems a lot harder. In your decision theory example, I think a lot more people would be able to do the imitation part than the imagined experience part.

Well said; I agree it should be split up like that.

AI ALIGNMENT FORUM
AF

23

Gradations of Agency

23