I wish I knew why.
Same.
I don't really have any coherent hypotheses for why this might be the case (not that I've tried for any fixed amount of time by the clock). I do, however, have a couple of vague suggestions for how one might gain slightly more information that could lead to a hypothesis, if you're interested.
The main one involves looking at the local nonlinearities of the few layers after the intervention layer at various inputs, by which I mean examining

diff(t) = f(input + t * top_right_vec) - f(input)

as a function of t (for small values of t).
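For concreteness, here's a minimal sketch of that probe in PyTorch, with a toy `f` standing in for the composition of the layers after the intervention layer and a random unit vector standing in for `top_right_vec`; all names, shapes, and layer choices here are placeholders, not anything from the actual model:

```python
import torch

# Toy stand-ins (hypothetical): `f` plays the role of the few layers after
# the intervention layer, `v` the steering vector top_right_vec.
torch.manual_seed(0)
d = 16
f = torch.nn.Sequential(
    torch.nn.Linear(d, d), torch.nn.GELU(),
    torch.nn.Linear(d, d), torch.nn.GELU(),
)
x = torch.randn(d)       # one input to probe around
v = torch.randn(d)
v = v / v.norm()         # unit-norm steering direction

def diff(t: float) -> torch.Tensor:
    """diff(t) = f(x + t*v) - f(x)."""
    with torch.no_grad():
        return f(x + t * v) - f(x)

# If f were locally linear around x, diff(t)/t would be constant in t.
# The relative deviation below therefore measures how nonlinear f is
# along the steering direction at this particular input.
ref = diff(1e-3) / 1e-3  # approximate directional derivative at x
for t in [1e-3, 1e-2, 1e-1, 0.5, 1.0]:
    dev = (diff(t) / t - ref).norm() / ref.norm()
    print(f"t={t:g}  relative deviation from linearity: {dev:.4f}")
```

Repeating this at many different inputs (with the real layers and vector swapped in for the toy ones) would give a picture of how the local nonlinearity varies across inputs, which is the extra information the suggestion is after.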
Thanks for the great post! I have a question, if it's not too much trouble:
Sorry for my confusion about something so silly, but shouldn't the following be "when "?
When there is no place where the derivative vanishes
I'm also a bit confused about why we can think of as representing "which moment of the interference distribution we care about."
Perhaps some of my confusion here stems from the fact that the optimal number of subspaces, , seems to me to be an increasing function of , which ...
Hmmm, I suspect that when most people say things like "the reward function should be a human-aligned objective," they're intending something more like "the reward function is one for which any reasonable learning process, given enough time/data, would converge to an agent that ends up with human-aligned objectives," or perhaps the far weaker claim that "the reward function is one for which there exists a reasonable learning process that, given enough time/data, will converge to an agent that ends up with human-aligned objectives."