I believe that it is possible for two agents to have the exact same source code while (in some sense) optimising two different utility functions.

I take a utility function to be a function from possible world states to real numbers. When I say that an agent is optimising a utility function, I mean something like: the agent is "pushing" its environment towards states with higher values according to said utility function. This notion is somewhat ambiguous, but I don't think it's necessary to make it more precise here. By source code I mean the same thing as everyone else means by source code.

Now, consider an agent that has a goal like "gain resources" (in some intuitive sense). Say that two copies of this agent are placed in a shared environment. Each copy will now push the environment towards states in which it, rather than the other copy, controls the resources. The two agents are therefore pushing towards different states, and are (under the definition I gave above) optimising different utility functions.
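
To make this concrete, here is a minimal sketch in Python (the names `Agent`, `World`, and `resources_controlled_by` are hypothetical illustrations, not anything from an actual system). Both agents run identical source code, yet they rank the same two world states oppositely, because "the resources I control" binds to the particular instance:

```python
# A minimal sketch; all names here are hypothetical illustrations.

class Agent:
    """Identical source code for every copy."""

    def utility(self, world):
        # "Maximise the resources that *I* control."
        # `self` is the only thing distinguishing the two copies.
        return world.resources_controlled_by(self)

class World:
    def __init__(self, holdings):
        self.holdings = holdings  # maps agent -> resource count

    def resources_controlled_by(self, agent):
        return self.holdings.get(agent, 0)

a, b = Agent(), Agent()
state_1 = World({a: 10, b: 0})
state_2 = World({a: 0, b: 10})

print(a.utility(state_1), a.utility(state_2))  # 10 0: a prefers state_1
print(b.utility(state_1), b.utility(state_2))  # 0 10: b prefers state_2
```

Same source code, opposite rankings over the same pair of world states.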

Comments:

Makes sense. It seems to flow from the fact that the source code is in some sense allowed to use concepts like 'Me' or 'I', which refer to the agent itself. So both agents have source code which says "Maximise the resources that I have control over", but in Agent 1 this translates to the utility function "Maximise the resources that Agent 1 has control over", and in Agent 2 this translates to the different utility function "Maximise the resources that Agent 2 has control over".

So this source-code-level thing that we're tempted to call a 'utility function' isn't actually a well-defined mapping from world states to real numbers until the agent is specified, because the 'Me'/'I' terms are left undefined.
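
One way to picture this (a sketch, with hypothetical names): the source-level "utility function" has an extra, implicit parameter for the agent, and only becomes a genuine map from world states to real numbers once that parameter is bound:

```python
from functools import partial

def source_level_utility(me, world):
    # The "I" in "maximise the resources I control" is a free variable:
    # this is a function of (agent, world), not of world alone.
    return world.get(me, 0)

# A world state here is just a mapping from agent names to resources.
state = {"agent_1": 10, "agent_2": 0}

u1 = partial(source_level_utility, "agent_1")  # now a map: world state -> real
u2 = partial(source_level_utility, "agent_2")  # a different such map
print(u1(state), u2(state))  # 10 0
```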

I agree.

An even simpler example: if the agents are reward learners, each of them will optimize its own reward signal, and those are two different things in the physical world.
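
A minimal sketch of that point (the sensor names are hypothetical): two reward learners run the same update rule, but each one's reward comes from its own physical sensor, so the quantities being maximised are distinct:

```python
# Identical learning code for both copies; only the sensor differs.
def reward_learner_step(value_estimate, own_reward_sensor, lr=0.1):
    reward = own_reward_sensor()  # each copy reads ITS OWN sensor
    return value_estimate + lr * (reward - value_estimate)

# Two physically distinct reward channels in the shared environment.
sensor_1 = lambda: 1.0  # agent 1's reward signal
sensor_2 = lambda: 0.0  # agent 2's reward signal

v1 = reward_learner_step(0.0, sensor_1)  # driven toward sensor 1's signal
v2 = reward_learner_step(0.0, sensor_2)  # driven toward sensor 2's signal
print(v1, v2)  # 0.1 0.0: same code, different optimisation targets
```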