Daniel Herrmann is a decision theorist, formal epistemologist, and philosopher of AI. He is currently a postdoc at the University of Groningen. Daniel was a PIBBSS fellow.
He runs the not-quite-active philosophy blog The Crow's Nest, where he writes for both a philosophical and a broad audience.
Sapere Aude
Knowledge enormous makes a God of me.
Names, deeds, grey legends, dire events, rebellions,
Majesties, sovran voices, agonies,
Creations and destroyings, all at once
Pour into the wide hollows of my brain,
And deify me, as if some blithe wine
Or bright elixir peerless I had drunk,
And so become immortal. . .
Keats, Hyperion
Thanks for this. We agree it’s natural to think that a stronger optimizer means less information from seeing the end state, but the question shows up again here. The general tension is this: one way of thinking about optimization is that the optimizer has a high probability of hitting a narrow target. But the narrowness notion is often what does the work in making this seem intuitive, and under seemingly relevant notions of narrowness (how likely is this set of outcomes to be realized?), the set of outcomes we wanted to call narrow is, in fact, not narrow at all. The lesson we take is that many of the ways we want to measure the underlying space rely on choices we make in describing the (size of the) space. If those choices reflect our uncertainty, then we get the puzzle we describe.

I don't see how moving to thinking in terms of entropy would address this. Given that we are working in continuous spaces, one way to see that we often make choices like this, even with entropy, is to look at continuous generalizations of entropy. When we move to the continuous case, things become more subtle. Differential entropy (the most natural generalization) lacks some of the important properties that make entropy a useful measure of uncertainty: it can be negative, and it is not invariant under continuous coordinate transformations. You can move to relative entropy to try to fix these problems, but this depends on a choice of an underlying measure m. What we see in both cases is that the generalizations of entropy, both differential and relative, rely on some choice of a way to describe the underlying space (for differential entropy, the choice of coordinate system; for relative entropy, the underlying measure m).
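Both failures of differential entropy mentioned above have simple closed-form illustrations with uniform distributions, where the differential entropy of Uniform[0, a] is just log(a). The sketch below (an illustration I am adding, not part of the original discussion) shows negativity and non-invariance under a coordinate rescaling:

```python
import math

def uniform_diff_entropy(a: float) -> float:
    """Differential entropy of a Uniform[0, a] distribution: log(a)."""
    return math.log(a)

# 1) Differential entropy can be negative: a uniform distribution on a
#    narrow interval has log(width) < 0, unlike discrete Shannon entropy,
#    which is always non-negative.
h_narrow = uniform_diff_entropy(0.5)  # log(0.5) < 0

# 2) It is not invariant under coordinate transformations: if X is
#    Uniform[0, 1] and Y = 2X, then Y is Uniform[0, 2]. The map is a
#    bijection (no information is gained or lost), yet the entropy
#    shifts by log(2).
h_x = uniform_diff_entropy(1.0)  # 0.0
h_y = uniform_diff_entropy(2.0)  # log(2)

print(h_narrow, h_x, h_y)
```

So the value of the differential entropy depends on how we parameterize the space, which is exactly the kind of descriptive choice at issue; relative entropy trades this dependence on coordinates for a dependence on the reference measure m.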
Thank you for this. Yes, the problem is that we think it can sometimes be difficult to specify what the probability distribution would be without the agent. One strategy would be to define some kind of counterfactual distribution that would obtain if there were no agent, but then we need some principled way to get this counterfactual (which might be possible). I think this is easier in situations in which the presence of an agent/optimizer is only one possibility, in which case we have a well-defined probability distribution conditional on there not being an agent. Perhaps that is all that matters (I am somewhat partial to this view), but then I don't think of this as giving us a definition of an optimizing system (since, conditional on there being an optimizing system, there would cease to be an optimizing system; for a similar idea, see Vingean Agency).
I like your suggestions for connecting (1) and (3).
And thanks for the correction!