All of Joe Benton's Comments + Replies

Here's an argument that I think works in the limit where . For very large , each subset  of size  will occur many times as an . Moreover, for each set of coodinate values , there will be many  such that  and . Therefore, using the law of large numbers for any subset  of size  and any set of coordinate values , Bob can infer , i.e. the number of inputs  that ag... (read more)

It seems to me like there are a couple of different notions of “being able to distinguish between mechanisms” we might want to use:

  1. There exists an efficient algorithm which when run on a model input  will output which mechanism model  uses when run on .
  2. There exists an efficient algorithm which when run on a model will output another programme  such that when we run  on a model input  it will output (in reasonable time) which mechanism  uses when run on .

In general, being ab... (read more)

2Paul Christiano
Your overall picture sounds pretty similar to mine. A few differences. * I don't think the literal version of (2) is plausible. For example, consider an obfuscated circuit. * The reason that's OK is that finding the de-obfuscation is just as easy as finding the obfuscated circuit, so if gradient descent can do one it can do the other. So I'm really interested in some modified version of (2), call it (2'). This is roughly like adding an advice string as input to P, with the requirement that the advice string is no harder to learn than M itself (though this isn't exactly right).  * I mostly care about (1) because the difficulty of (1) seems like the main reason to think that (2') is impossible. Conversely, if we understood why (1) was always possible it would likely give some hints for how to do (2). And generally working on an easier warm-up is often good. * I think that if (1) is false in general, we should be able to find some example of it. So that's a fairly high priority for me, given that this is a crucial question for the feasibility or character of the worst-case approach. * That said, I'm also still worried about the leap from (1) to (2'), and as mentioned in my other comment I'm very interested in finding a way to solve the harder problem in the case of primality testing.