User Comment Replies — AI Alignment Forum

Finite Factored Sets: Introduction and Factorizations

The part about Chimera functions was surprising, and I look forward to seeing where that will go, and to more of this in general.

In section 2.1, Proposition 2 should presumably say that $\geq_{S}$ is a partial order on $Part(S)$ rather than on $S$.

2Scott Garrabrant4y

Fixed, Thanks.

Finite Factored Sets

drocta4y20

You said that you thought that this could be done in a categorical way. I attempted something which appears to describe the same thing when applied to the category FinSet, but I'm not sure it's the sort of thing you meant when you suggested that the combinatorial part could potentially be done in a categorical way instead, and I'm not sure that it is fully categorical.

Let S be an object.
For i from 1 to k, let $A_{i}$ be an object (which is not isomorphic to the product of itself with itself, or at least is not the terminal object).
Let ... (read more)

5Scott Garrabrant4y

I have not thought much about applying this to things other than finite sets. (I looked at infinite sets enough to know there is nontrivial work to be done.) I do think it is good that you are thinking about it, but I can't promise that it will work out. What I meant when I said that this can be done in a categorical way is that I think I can define a nice symmetric monoidal category of finite factored sets such that things like orthogonality can be given nice categorical definitions. (I see why this was a confusing thing to say.)

Parsing Chris Mingard on Neural Networks

drocta4y20

This comment I'm writing is mostly because this prompted me to attempt to see how feasible it would be to computationally enumerate the conditions for the weights of small networks like the 2 input 2 hidden layer 1 output in order to implement each of the possible functions. So, I looked at the second smallest case by hand, and enumerated conditions on the weights for a 2 input 1 output no hidden layer perceptron to implement each of the 2 input gates, and wanted to talk about it. This did not result in any insights, so if that doesn't sound interesting, m... (read more)

2Alex Flint4y

Very very cool. Thank you for this drocta. What would it take to map out the sizes of the volumes corresponding to each of these mappings? Also, could you perhaps compute the exact Kolmogorov complexity of these mappings in some particular description language, since they are so small? It would be super interesting to me to assemble a table of volumes and Kolmogorov complexities for each of these small mappings. It may then be possible to write some code that does the same for 3-input and 4-input mappings.
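
One rough way to approach the volume question for the 2-input, no-hidden-layer case is Monte Carlo sampling. The sketch below is not drocta's enumeration (which is truncated above). It assumes a hard-threshold unit that outputs 1 when `w1*x1 + w2*x2 + b > 0`, and it assumes weights and bias drawn uniformly from [-1, 1]; both choices, and the names `truth_table` and `estimate_volumes`, are illustrative. Each gate is written as a 4-character truth table over the inputs (0,0), (0,1), (1,0), (1,1).

```python
import itertools
import random
from collections import Counter

# Input order used for every truth table: (0,0), (0,1), (1,0), (1,1).
INPUTS = list(itertools.product([0, 1], repeat=2))


def truth_table(w1, w2, b):
    """Truth table of a threshold unit (assumed convention: fire iff sum > 0)."""
    return "".join(
        "1" if w1 * x1 + w2 * x2 + b > 0 else "0" for x1, x2 in INPUTS
    )


def estimate_volumes(n_samples=100_000, seed=0):
    """Estimate the fraction of weight space mapped to each 2-input gate.

    Assumption: w1, w2 and b are sampled independently and uniformly
    from [-1, 1]; a different prior would give different fractions.
    """
    rng = random.Random(seed)
    counts = Counter(
        truth_table(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        for _ in range(n_samples)
    )
    return {table: count / n_samples for table, count in counts.items()}


volumes = estimate_volumes()
for table, fraction in sorted(volumes.items(), key=lambda kv: -kv[1]):
    print(table, round(fraction, 4))
```

XOR (`0110`) and XNOR (`1001`) should never show up in the output, since a single threshold unit can only implement linearly separable functions; the remaining 14 gates share the sampled volume. The same loop would extend to 3 or 4 inputs by changing `repeat` and adding weights.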

Subagents of Cartesian Frames

drocta4y10

I am trying to check that I am understanding this correctly by applying it, though probably not in a very meaningful way:

Am I right in reasoning that, for $S \subseteq W$, $1_{S} ⊲ C$ iff ((C can ensure S), and (every element of S is a result of a combination of a possible configuration of the environment of C with a possible configuration of the agent for C, such that the agent configuration is one that ensures S regardless of the environment configuration))?

So, if $S = \{a, b, c, d\}$, then
$C = \begin{pmatrix} a & b & a \\ c & d & d \\ e & f & a \end{pmatrix}$

would have $1_{S} ⊲ C$, but, say

$D = ⎡ ⎢ ⎢ ⎢$ ... (read more)

3Scott Garrabrant4y

Yep. There is a single morphism from $1_{S}$ to ⊥ for every world in S, so $1_{S} ⊲ C$ means all of these morphisms factor through C. A morphism from C to ⊥ is basically a column of C, and a morphism from $1_{S}$ to C is basically a row in C, all of whose entries are in S, and these compose to the morphism corresponding to the entry where this column meets this row. Thus $1_{S} ⊲ C$ if and only if, when you delete all rows not entirely in S, the resulting matrix has image S. I think this is equivalent to what you said. I just wrote it out myself because that was the easiest way for me to verify what you said.
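
The row-deletion criterion in this reply is easy to check mechanically. The following is a minimal sketch, assuming a frame is given as a list of rows (agent options) whose entries are worlds; the function name `satisfies_row_criterion` is made up for illustration. The matrix `C` is the 3×3 example from the parent comment with S = {a, b, c, d}. The parent comment's `D` is truncated, so `D_hyp` is a hypothetical counterexample, not the original.

```python
def satisfies_row_criterion(matrix, s):
    """Check the criterion stated above for 1_S ⊲ C.

    Delete every row that is not entirely inside S, then test whether
    the entries of the remaining rows are exactly the set S.
    """
    s = set(s)
    kept_rows = [row for row in matrix if all(entry in s for entry in row)]
    image = {entry for row in kept_rows for entry in row}
    return image == s


S = {"a", "b", "c", "d"}

# Example C from the parent comment: rows 1 and 2 lie entirely in S.
C = [
    ["a", "b", "a"],
    ["c", "d", "d"],
    ["e", "f", "a"],
]

# Hypothetical counterexample (NOT the truncated D from the comment):
# only the first row lies in S, so c and d are never reached.
D_hyp = [
    ["a", "b", "a"],
    ["c", "e", "d"],
    ["e", "f", "a"],
]

print(satisfies_row_criterion(C, S))
print(satisfies_row_criterion(D_hyp, S))
```

For `C`, the kept rows are (a, b, a) and (c, d, d), whose entries cover all of S, so the check succeeds; for `D_hyp` only (a, b, a) survives and the check fails.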

The "best predictor is malicious optimiser" problem

drocta5y00

What came to mind for me before reading the spoiler-ed options, was a variation on #2, with the difference being that, instead of trying to extract P's hypothesis about B, we instead modify T to get a T' which has P replaced with a P' which is a paperclip minimizer instead of maximizer, and then run both, and only use the output when the two agree, or if they give probabilities, use the average, or whatever.

Perhaps this could have an advantage over #2 if it is easier to negate what P is optimizing for than to extract P's model of B. (ed... (read more)

1Donald Hobson5y

Thanks for a thoughtful comment. Assuming that P and P' are perfectly antialigned, they won't cooperate. However, they need to be really antialigned for this to work. If there is some obscure borderline case that P thinks is a paperclip, and P' thinks isn't, they can work together to tile the universe with it. I don't think it would be that easy to change evolution into a reproductive fitness minimiser, or to negate a human's values. If P and P' are antialigned, then in the scenario where you only listen to them if they agree, then for any particular prediction, at least one of them will consider disagreeing better than that. The game theory is a little complicated, but they aren't being incentivised to report their predictions. Actually, A has to be able to manage not only correct and competent adversaries, but deluded and half-mad ones too. I think P would find it hard to be inscrutable. It is impossible to obfuscate arbitrary code. I agree with your final point. Though for any particular string X, the fastest Turing machine to produce it is the one that is basically print(X). This is why we use short TMs, not just fast ones.


All of drocta's Comments + Replies