User Comment Replies — AI Alignment Forum

$500 Bounty/Prize Problem: Channel Capacity Using "Insensitive" Functions

Here's an argument that I think works in the limit where $m \to \infty$ . For very large $m$ , each subset $S \subseteq [n]$ of size $s$ will occur many times as an $S_{i}$ . Moreover, for each set of coodinate values $x_{S} := (x_{j})_{j \in S}$ , there will be many $x^{i}$ such that $S_{i} = S$ and $x_{S}^{i} = x_{S}$ . Therefore, using the law of large numbers for any subset $S \subseteq [n]$ of size $s$ and any set of coordinate values $x_{S}$ , Bob can infer $N (x_{S}, f) := # {x^{'} \in {0, 1}^{n} : x_{S}^{'} = x_{S} and f (x^{'}) = 1}$ , i.e. the number of inputs $x^{'}$ that ag... (read more)

Can we efficiently distinguish different mechanisms?

Joe Benton2y20

It seems to me like there are a couple of different notions of “being able to distinguish between mechanisms” we might want to use:

There exists an efficient algorithm which when run on a model input $x$ will output which mechanism model $M$ uses when run on $x$ .
There exists an efficient algorithm which when run on a model $M$ will output another programme $P$ such that when we run $P$ on a model input $x$ it will output (in reasonable time) which mechanism $M$ uses when run on $x$ .

In general, being ab... (read more)

2Paul Christiano2y

Your overall picture sounds pretty similar to mine. A few differences. * I don't think the literal version of (2) is plausible. For example, consider an obfuscated circuit. * The reason that's OK is that finding the de-obfuscation is just as easy as finding the obfuscated circuit, so if gradient descent can do one it can do the other. So I'm really interested in some modified version of (2), call it (2'). This is roughly like adding an advice string as input to P, with the requirement that the advice string is no harder to learn than M itself (though this isn't exactly right). * I mostly care about (1) because the difficulty of (1) seems like the main reason to think that (2') is impossible. Conversely, if we understood why (1) was always possible it would likely give some hints for how to do (2). And generally working on an easier warm-up is often good. * I think that if (1) is false in general, we should be able to find some example of it. So that's a fairly high priority for me, given that this is a crucial question for the feasibility or character of the worst-case approach. * That said, I'm also still worried about the leap from (1) to (2'), and as mentioned in my other comment I'm very interested in finding a way to solve the harder problem in the case of primality testing.

AI ALIGNMENT FORUM
AF

All of Joe Benton's Comments + Replies