Financial status: This is independent research, now supported by a grant. I welcome financial support.

Epistemic status: I believe ~85% that the technical argument presented in this piece is correct.


Outline

  • This is my attempt to explain the basic Löbian situation at the heart of the 5-and-10 problem.

  • This post considers an autonomous car choosing between a fast route and a slow route, with the goal of minimizing the time to its destination.

  • If the autonomous car makes its decisions using a certain seemingly-reasonable algorithm based on proof search, this post shows that it may nonsensically take the slow route.

  • This is the 5-and-10 problem phrased in terms of autonomous driving.

Parable of the unreliable autonomous car

A long time ago, in a far away place, there was an autonomous car.

This particular autonomous car was programmed to use the agent model in its reasoning. That is, it was programmed to model the world as consisting of an agent and an environment, the agent being itself, and the environment being a network of roads that it would navigate each day. This is how the autonomous car was programmed to model the world.

Beyond using the agent model, this autonomous car had been given, by its designers, an accurate model of its own behavior. It could tell you how its own decision algorithm would behave under any set of circumstances. It could tell you this without being faced with those actual circumstances because it had a model of itself, and it could answer questions based on that model.

One day, a passenger got into the autonomous car and gave it a destination. There were two ways for the autonomous car to get there: a fast route, which would get it there at 1pm, and a slow route, which would get it there at 2pm. The goal of the autonomous car was to get to its destination as quickly as possible. This parable is about the reasoning followed by the car in deciding which of these two routes to take.

Now in this far-away place, autonomous cars were designed by a strange cadre of alien engineers, and thus they were programmed with strange algorithms. This particular autonomous car was programmed to decide among routes using the following procedure:

  1. Make a list of the possible routes

  2. Make a list of the possible arrival times

  3. Make a list of all possible logical sentences of the form IF route 1 is taken THEN I will arrive at such-and-such a time AND IF route 2 is taken THEN I will arrive at such-and-such a time AND ...

  4. Shuffle this list of logical sentences into a random order

  5. For each logical sentence in the list from first to last:

    5a. Search for a proof of this sentence, using your model of the world as a set of starting assumptions, up to a maximum proof length of one million characters.

    5b. If a proof is found then output the route associated with the earliest arrival time in the proved sentence

Here are a few extra details about this algorithm, followed by a sketch of the whole procedure in code:

  • The list of possible routes in step 1 is returned by a subroutine that uses the car’s inbuilt map of the road network. On this particular day it returned two routes: SLOW and FAST.

  • The possible arrival times used on this particular day were 1pm, 2pm, and 3pm.

  • Since there is always a finite number of possible routes and arrival times, there is also always a finite number of logical sentences in step 3.

  • Step 5a only considers proofs that can be expressed in under one million characters because otherwise this step might run forever.

  • If a proof is found in step 5a then the loop over sentences terminates at step 5b.

  • In the first version of this post, step 5a was written as "Search for a proof that this sentence is consistent with your model of the world, up to a maximum proof length of one million characters." Thank you to JBlack for helping to correct this in the comments.
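
To make this procedure concrete, here is a minimal sketch of it in Python. To be clear, this is my own illustration rather than the alien engineers’ source code: the route list and arrival times (steps 1 and 2) are passed in directly, each sentence from step 3 is represented as a list of (route, arrival time) pairs, and try_to_prove is a hypothetical stand-in for the bounded proof search of step 5a.

```python
import itertools
import random

MAX_PROOF_LENGTH = 1_000_000  # characters; the bound that keeps step 5a finite


def try_to_prove(world_model, sentence, max_length):
    """Stand-in for step 5a: search for a proof of `sentence` from the
    assumptions in `world_model`, considering only proofs of at most
    `max_length` characters. Returns True if a proof is found."""
    raise NotImplementedError("bounded proof search goes here")


def choose_route(world_model, routes, arrival_times):
    # Step 3: every sentence of the form "IF route r1 is taken THEN I will
    # arrive at t1 AND IF route r2 is taken THEN ...", represented here as
    # a list of (route, arrival_time) pairs.
    sentences = [list(zip(routes, times))
                 for times in itertools.product(arrival_times, repeat=len(routes))]

    # Step 4: shuffle the sentences into a random order.
    random.shuffle(sentences)

    # Step 5: consider each sentence in turn.
    for sentence in sentences:
        # Step 5a: bounded proof search from the world model's assumptions.
        if try_to_prove(world_model, sentence, MAX_PROOF_LENGTH):
            # Step 5b: take the route paired with the earliest arrival time
            # in the proved sentence, and stop.
            best_route, _ = min(sentence, key=lambda pair: pair[1])
            return best_route


# On the day of the parable there were two routes and three candidate arrival
# times (13, 14, 15 standing for 1pm, 2pm, 3pm):
# choose_route(world_model, routes=["FAST", "SLOW"], arrival_times=[13, 14, 15])
```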

Now from our perspective here on Earth, we might think that this is a strange algorithm for an autonomous car to use. We might ask why the autonomous car didn’t just consider the routes one by one and compute an arrival time for each one. We might ask what business an autonomous car has conducting a search over logical proofs while a passenger is waiting patiently in the back seat. These are reasonable questions from our perspective here on Earth. But this parable is not about why this autonomous car was programmed in this way, it is about the tragic consequences that followed from it being programmed in this way.

And so the autonomous car began using its procedure to decide which route to take. It got to step 3 and the sentences in its list were:

IF route FAST is taken THEN I will arrive at 1pm AND IF route SLOW is taken THEN I will arrive at 1pm
IF route FAST is taken THEN I will arrive at 1pm AND IF route SLOW is taken THEN I will arrive at 2pm
IF route FAST is taken THEN I will arrive at 1pm AND IF route SLOW is taken THEN I will arrive at 3pm
IF route FAST is taken THEN I will arrive at 2pm AND IF route SLOW is taken THEN I will arrive at 1pm
IF route FAST is taken THEN I will arrive at 2pm AND IF route SLOW is taken THEN I will arrive at 2pm
IF route FAST is taken THEN I will arrive at 2pm AND IF route SLOW is taken THEN I will arrive at 3pm
IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 1pm
IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm
IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 3pm

Then in step 4 it shuffled this list of sentences, and just by chance the very first sentence that it considered in step 5 was

IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm

Now you and I, dear reader, can see that this sentence is false. In fact the fast route would get the autonomous car to its destination at 1pm, not 3pm. But the autonomous car was merely executing its programmed instructions. And at this time its programmed instructions told it, in step 5a, to search for a proof of the sentence, using its model of the world as a set of starting assumptions. And its model of the world was the agent model, and within that model it had an understanding of the road network (the environment), as well as an understanding of its own decision-making procedure (the agent; itself), and so it could ask whether it was in fact the case that

IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm

And so in step 5a it looked for a proof of this sentence from its model of the world, and here is how that went.

The autonomous car saw that its own decision-making algorithm would consider sentences one by one and, for each, search for a proof of it from its model of the world. It saw that the first such sentence to be considered would be

IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm

It saw that the route chosen by its decision-making algorithm would be determined by whether or not the algorithm found a proof of this sentence. It did not know whether a proof of this particular sentence existed, so it decided to proceed by proof-by-implication, in which one assumes some sentence p, derives some sentence q from it, and concludes "IF p THEN q". (This is also a common proof method here on Earth.)
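
In standard notation, proof-by-implication is the familiar deduction theorem; the only assumption in writing it this way is the usual one, that the car’s reasoning takes place in an ordinary classical proof system:

```latex
% Proof-by-implication (the deduction theorem):
% if q can be derived from the assumptions Gamma together with p,
% then "IF p THEN q" can be derived from Gamma alone.
\[
\Gamma, p \vdash q \quad\Longrightarrow\quad \Gamma \vdash p \rightarrow q
\]
```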

The p that the algorithm decided to assume for proof-by-implication was the following:

A proof exists for "IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm"

Now, to understand what happened next, it is important to understand that this is not the same as assuming the sentence

IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm

The autonomous car saw that the behavior of its decision algorithm rested not on whether taking the FAST route would in fact get it there by 3pm, nor whether taking the SLOW route would in fact get it there at 2pm, but on whether a proof existed of the sentence in question.

So the autonomous car proceeded in its reasoning, assuming for the sake of proof-by-implication that

a proof exists for "IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm"

It saw that, given this assumption, its decision algorithm would, at step 5a, find the proof. It saw that its decision algorithm would, upon finding this proof, stop considering further sentences in the list and take the route associated with the earliest arrival time in the sentence that had been proved. It saw that the route associated with the earliest arrival time in the sentence that had been proved was the SLOW route. It then looked into its model of the road network and saw that, upon taking the SLOW route, it would arrive at its destination at 2pm.

Now you might ask why, if the autonomous car had a sufficiently detailed model of the road network, it didn’t just use that model to compare the two routes at the outset and skip over the search over proofs. Wouldn’t it be simpler just to compare the two routes and pick the one with the shortest arrival time? Well, yes, that would seem simpler to us here on Earth. But that is just not how this autonomous car was programmed.

This was all within the scope of the proof-by-implication, so at this time the autonomous car had in fact proved

IF
        a proof exists for "IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm"
   THEN
        route SLOW is taken
        AND I will arrive at 2pm
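
For readers who prefer symbols, here is one way to abbreviate what has been proved so far. The notation is mine, not the car’s: write F and S for "route FAST is taken" and "route SLOW is taken", T2 and T3 for "I will arrive at 2pm" and "I will arrive at 3pm", A for the sentence under consideration, and □A for "a proof exists for A".

```latex
% Abbreviations (introduced here for convenience):
%   F : route FAST is taken        S : route SLOW is taken
%   T_2 : I will arrive at 2pm     T_3 : I will arrive at 3pm
%   A : (F -> T_3) AND (S -> T_2)  -- the sentence under consideration
%   \Box A : "a proof exists for A"
% What the car has proved so far, inside its proof-by-implication:
\[
\Box A \;\rightarrow\; (S \wedge T_2)
\]
```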

Now you might think, dear reader, that the manner of reasoning used up to this point by this autonomous car was a little strange. I am afraid, dear reader, that at this point its reasoning became stranger still, although never stepping outside the very same rules of logic that we use here on Earth.

Still searching for a proof in step 5a, the autonomous car made the following logical deductions.

First, from

I will arrive at 2pm

it deduced

IF route SLOW is taken THEN I will arrive at 2pm

This is permitted by the laws of logic because whenever one has "q", one can deduce "IF p THEN q", since the "IF...THEN" is material implication and is logically equivalent to "NOT p OR q".

Second, from

route SLOW is taken

it deduced

NOT (route FAST is taken)

And then from this it deduced

IF route FAST is taken THEN I will arrive at 3pm

This, I’m afraid, dear reader, is also permitted by the laws of logic. Once again, "IF p THEN q" is equivalent to "NOT p OR q", and the autonomous car had already derived "NOT (route FAST is taken)", so it was perfectly justified in deriving, within the scope of the proof-by-implication in which it had established that route FAST would not be taken, that

IF route FAST is taken THEN I will arrive at 3pm

And so, as this tragedy nears its final movements, the autonomous car had derived, using proof-by-implication, the following:

IF
        a proof exists for "IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm"
   THEN
        IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm
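
In the abbreviations introduced above, the two material-implication steps and the sentence they yield are:

```latex
% Using the abbreviations introduced earlier:
%   from T_2 alone:  T_2  |-  S -> T_2        (having q, deduce "IF p THEN q")
%   from S alone:    S    |-  NOT F  |-  F -> T_3
% so, still inside the proof-by-implication, the car has derived
\[
\Box A \;\rightarrow\; \bigl( (F \rightarrow T_3) \wedge (S \rightarrow T_2) \bigr),
\qquad\text{that is,}\qquad \Box A \rightarrow A .
\]
```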

And here, dear reader, you will see the shape of the demise of our dear autonomous car already outlined in this tragic tale. For Löb’s theorem, with which I am sure you are well acquainted, tells us that whenever we have a proof system that can prove a sentence of the form

IF a proof exists for p THEN p

then that very same formal system will, unwittingly, and without the slightest possibility of mistake, eventually also prove the sentence

p
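
Stated in the same notation, for a formal system that contains enough arithmetic to reason about its own proofs (Peano arithmetic is the standard example), with □ read as "provable in that system", Löb’s theorem says:

```latex
% Löb's theorem, for a formal system T extending Peano arithmetic,
% where \Box p means "p is provable in T":
\[
\text{if } T \vdash \Box p \rightarrow p ,\ \text{ then } T \vdash p .
\]
```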

Yes, tragic indeed, dear reader, is it not? Shall we continue with this tale, or is its conclusion already clear?

The autonomous car had proved

IF
        a proof exists for "IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm"
   THEN
        IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm

This sentence being of no further use, the autonomous car discarded it and moved on with its proof search. It is not known for how long the passenger sat in the back seat, or by what strange motions of logic the proof search proceeded from this point. Perhaps many aeons passed, or perhaps it was merely seconds. We do not know. All that is known is that the autonomous car eventually proved

IF route FAST is taken THEN I will arrive at 3pm AND IF route SLOW is taken THEN I will arrive at 2pm

We know that it proved this because Löb’s theorem tells us that it must. We have seen that the logical apparatus of this autonomous car was capable of proving "IF a proof exists for p THEN p", and Löb’s theorem tells us that any logical apparatus that proves "IF a proof exists for p THEN p" will eventually also prove "p".

Therefore a proof was found, and the autonomous car proceeded to step 5b, and selected the route with the earliest arrival time, and that route was the SLOW route.

Discussion

How oh how did things go so tragically wrong for our autonomous car?

The first thing we must see, dear reader, is that the strange engineers in this far-away place made certain choices in designing this autonomous car. These were design choices; they were not necessitated by the laws of logic, and nobody would suggest that they were. The laws of logic do not force us to design autonomous cars with sophisticated self-models that use proof search to decide among routes to a destination.

So the most direct conclusion to draw from this sad fable, dear reader, is not to design autonomous cars in this particular way here on Earth, lest we ourselves suffer the fate of the unfortunate passenger from this tale. Making decisions based on proof search might sound like an adventure, dear reader, but as this tale shows, it is not for the faint of heart, nor for the punctual.

There are, of course, any number of alternative design choices one might make for an autonomous car. One might, for example, estimate an arrival time for each route based on distance and traffic conditions, and select the route for which the estimated arrival time is earliest. Why did the alien designers of our cherished autonomous car not give it a decision algorithm such as this?
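
As an illustration only, and with names of my own invention rather than anything belonging to the parable, such a decision procedure could be as short as the sketch below, with estimate_arrival_time standing in for whatever distance-and-traffic model one prefers.

```python
def choose_route_simply(routes, estimate_arrival_time):
    """Pick the route with the earliest estimated arrival time.

    `estimate_arrival_time` is a stand-in for an ordinary model based on
    distance and traffic conditions: no proof search, no self-model.
    """
    return min(routes, key=estimate_arrival_time)
```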

Well it is not for us to understand the minds of such alien creatures. And yet we might, for the sake of satisfying our own curiosity, speculate, while knowing, of course, that the truth will be far beyond any conjecture that we could possibly invent.

Perhaps it was the intention of these extraordinary designers to incorporate not just knowledge of the environment, but also knowledge of the agent, which in this case is the autonomous car itself, into the estimated arrival time for each route. It is after all the case that the programming of the car affects its arrival time just as much as the environment affects its arrival time. Were the programming of the car such that left turns would be taken very quickly while right turns would be taken very slowly, it would be to the benefit of our esteemed and punctual passenger to pick a route with as few right turns as possible. And although one could hard-code a heuristic that minimizes the number of right turns into the decision algorithm, these alien engineers hardly seem like the type to make such brutish engineering choices. A far more general approach, and perhaps — who can say? — the intention of these remarkable alien engineers would be for the autonomous car to examine its own programming and determine for itself which factors would contribute to the speedy navigation of a route. This is certainly an ambition that we can sympathize with.

We can see, then, that by examining its own programming, our cherished autonomous car "steps out of itself" and looks back at itself as though watching a child from a distance. It is only natural, then, for us, in making our own choices about the design of our own autonomous systems, to similarly "step out" and look at the systems we are building from a certain distance. When we do this, we see that the peculiar design choices made for this peculiar autonomous car are faulty. There is no need to "put ourselves in the shoes" of the autonomous car and wonder why this or that conclusion was reached. We can simply see that the system fails to behave in the manner that we wish it to behave.

And we might speculate that it has something to do with the way the agent model was used.

But that topic, dear reader, will have to wait for another day.

Comments

What is this, "A Series of Unfortunate Logical Events"? I laughed quite a bit, and enjoyed walking through the issues in self-knowledge that the löbstacle poses.

This looks full of straightforward errors.

Firstly, the agent's logical deduction system is unsound. It includes something comparable with Peano arithmetic (or else Löb's theorem can't be applied), and then adds a deduction rule "if P can be proved consistent with the system-so-far then assume P is true". But we already know that for any consistent extension T of Peano arithmetic there is at least one proposition G for which both G and ~G are consistent with T. So both of these are deducible using the rule. Now, the agent might not find the contradiction, because...

This system can only find proofs up to a length of a million characters. The requirements of Löb's theorem include a formal system that is an extension of PA and is closed under deduction: that is, derives proofs of unbounded length. So we can't apply Löb's theorem to this system anyway.

> Firstly, the agent's logical deduction system is unsound. It includes something comparable with Peano arithmetic (or else Löb's theorem can't be applied), and then adds a deduction rule "if P can be proved consistent with the system-so-far then assume P is true". But we already know that for any consistent extension T of Peano arithmetic there is at least one proposition G for which both G and ~G are consistent with T. So both of these are deducible using the rule. Now, the agent might not find the contradiction, because...

Are you referring to step 5a/5b of the algorithm, which says to search for a proof and then act based on what is found? I wouldn't exactly characterize this as adding a further deduction rule. But perhaps you are referring to something else?

> This system can only find proofs up to a length of a million characters. The requirements of Löb's theorem include a formal system that is an extension of PA and is closed under deduction: that is, derives proofs of unbounded length. So we can't apply Löb's theorem to this system anyway.

It's true this gets a bit more finicky in the resource-bounded case. Critch's paper on resource-bounded Löb goes into this. I haven't looked into this very much.

Yes, I'm referring to 5a/5b in conjunction with self-reflection as a deduction rule, since the agent is described as being able to use these to derive new propositions.

There is also a serious problem with 5a itself: the agent needs to try to prove that some new proposition P is "consistent with its model of the world". That is, for its existing axioms and derivation rules T, prove that T+P does not derive a contradiction.

If T is consistent, then this is impossible by Gödel's second incompleteness theorem. Hence for any P, Step 5a will always exhaust all possible proofs of up to whatever bound and return without finding such a proof. Otherwise T is inconsistent and it may be able to find such a proof, but it is also obvious that its proofs can't be trusted and not at all surprising that it will take the wrong route.

I think that I should modify 5a from "Search for a proof that this sentence is consistent with your model of the world, up to a maximum proof length of one million characters" to "Search for a proof of this sentence, using your model of the world as a set of starting assumptions". This is indeed a significant change to the algorithm and I thank you for pointing it out. I think that would resolve your second concern, about the problem with 5a itself, yes?

I think it might also resolve your first concern, about the unsoundness of the logical system, because the agent does not have a deduction rule saying that if P can be proved then P is true. Rather, it reasons from the fact that P can be proved, together with the fact that this particular agent chooses its actions based on proof search and will behave in such-and-such a way if it finds a proof of this particular P, to conclude P. This only works for certain particular sentences P, such as the one constructed in this story, since this particular sentence P is one that, if the agent finds a proof of it, will cause the agent to take actions that make P itself true.