(Crossposted from my blog)

In 2015 MIRI proposed quantilizers as a decision theory criterion that an AI may use that allows it strive towards a goal while not optimizing too hard at the goal, so that if a superintelligent were a quantilizer there's a smaller chance it will do something catastrophic like kill all humans and turn all matter on Earth into paperclips because it's trying really hard to maximize the productivity of a paperclip factory. For a real number , a -quantilizer is an agent which, given a goal and a space of possible strategies, picks randomly out of the top proportion , i.e., randomly among any strategy that is better than a proportion  of all strategies. In fact, a quantilizer is for all practical purposes equivalent to a a perfect optimizer (an agent that always picks the best of all its possible strategies) that is further restricted to have its strategy specifies by a string of bits whose length must be less than . Since the people at MIRI have already considered the possibility of controlling an AI by restricting its ability to perform input and output and have generally decided that they are not satisfied with this, I consider the idea of quantilizers to be a failure.

Roughly speaking, my claim is that for a natural number , up to a constant additive factor, a perfect optimizer that is limited to strategies taking at most  bits performs as well in any goal as a -quantilizer. More precisely, to explain what I mean by "up to a constant additive factor", there is a reasonably-sized constant  such that an optimizer with  bits of outputs performs as well as a -quantilizer, and a -quantilizer performs as well as an optimizer with  bits of output. I also need to assume that the space of strategies is given a rich enough encoding that we can do things like specify a strategy by specifying a computer program whose output is that strategy, or that the environment is rich enough that this can be done implicitly with strategies like "Walk to a computer and type this program, and attach to the computer the accessories necessary to run the strategy specified by the program." To really make the result true up to a constant factor and not a logarithmic factor or anything like that you need to careful about encoding using stuff like prefix-free encodings but I don't think that's really important and will gloss over it.

Quantilizers are at least as good as bounded-output optimizers: If a particular strategy is optimal among all strategies of length  bits, then since there are at most  such strategies this one optimizing strategy already makes up a proportion  of all strategies with  bits, so a -quantilizer over strategies with  bits must use exactly this strategy or an equally-performing strategy. If we assume the strategies have a measure biased towards short strategies or use a prefix-free encoding or a subset of the strategies are implicitly something like that, a proportion  of all strategies will have length , so a proportion  will be identical to this -bit optimal strategy, so a -quantilizer will perform at least as well as this strategy.

Bounded-output optimizers are at least as good as quantilizers: Using a deterministic pseudo-random number generator, it is possible to precisely specify a sequence  of strategies that are fairly sampled from the space of all strategies. A proportion  of these strategies are also in the top  of performance among all strategies. So the smallest  such that  is among the top  in performance is statistically certain to have  for a small constant . It takes  bits to specify the , plus an additional  bits to specify the pseudo-random sequence and the command to perform strategy  given . So an optimizer with  bits of output can specify the strategy  which is -quantilizing, so the actual optimizing strategy is at least as good.

New Comment