The point I'm making is that the human example tells us that:
If we first realize that we can't code up our values and therefore conclude that alignment is hard, then, when we later realize that mesa-optimisation is a thing, we shouldn't update towards "alignment is even harder". We should update in the opposite direction.
Because the human example tells us that a mesa-optimiser can reliably point to a complex thing even if the outer optimiser only points to a few crude things.
But I only ever see these three points (the human example, our inability to code up values, mesa-optimisation) used separately to argue that "alignment is even harder than previously thought". Taken together, that is just not the picture they paint.
Humans don't explicitly pursue inclusive genetic fitness; outer optimization even on a very exact, very simple loss function doesn't produce inner optimization in that direction.
Humans haven't been optimized to pursue inclusive genetic fitness for very long, because humans haven't been around for very long. Instead, they inherited crude heuristics pointing towards inclusive genetic fitness from their cognitively much less sophisticated predecessors. And those still kinda work!
If we are still around in a couple of million years I wouldn't be surpris...
Much better now!
The date published vs. date trained distinction was on my mind because of Gopher. It seemed very relevant to me that DeepMind trained a significantly larger model within basically half a year of the publication of GPT-3.
That, in addition to Google Brain also being quite coy about their 100+B model, made me update a lot in the direction of "the big players will replicate any new breakthrough very quickly but not necessarily talk about it."
To be clear, I also think it probably doesn't make sense to include this information in the list, because it is too rarely relevant.
It's worth noting that aside from the ridiculous situation where Googlers aren't allowed to name LaMDA (despite at least 5 published papers so far), Google has been very coy about MUM & Pathways (to the point where I'm still not sure if 'Pathways' is an actual model that exists, or merely an aspirational goal/name of a research programme). You also have the situation where models like LG's new 300b Exaone is described in a research paper which makes no mention of Exaone (the Korean coverage briefly mentions the L-Verse arch, but none of the English cov...
Some ideas for improvements:
The ability to sort by model size etc. would be nice; currently sorting is alphabetical.
Also, the columns with long textual information should sit further to the right, and the tighter, more informative numerical columns further to the left (a column that reads "deep learning" in almost all rows is not very informative). Ideally the most relevant information would be visible on the initial page without scrolling.
"Date published" and "date trained" can be quite different. Maybe worth including the latter?
This is really cool work! Congratulations!
Besides the LLM-related work, it's also somewhat reminiscent of dynamic prompting in Stable Diffusion, where part of the prompt is changed after a number of steps to achieve a mixture of prompt1 and prompt2.
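For concreteness, here is a minimal sketch of that idea (the `encode` and `denoise_step` functions are hypothetical stand-ins, not the actual Stable Diffusion API): the conditioning tensor is simply swapped partway through the denoising loop, so the early steps follow prompt1 and the later steps follow prompt2.

```python
import torch

def encode(prompt: str) -> torch.Tensor:
    """Toy text encoder (stand-in): a deterministic embedding per prompt."""
    g = torch.Generator().manual_seed(sum(ord(c) for c in prompt))
    return torch.randn(77, 768, generator=g)

def denoise_step(latents: torch.Tensor, cond: torch.Tensor, t: int) -> torch.Tensor:
    """Stand-in for one UNet denoising step: nudges latents toward a
    statistic of the conditioning, just to make the sketch runnable."""
    return latents + 0.01 * (cond.mean() - latents)

def sample_with_prompt_switch(prompt1: str, prompt2: str,
                              steps: int = 50, switch_at: int = 25) -> torch.Tensor:
    cond1, cond2 = encode(prompt1), encode(prompt2)
    latents = torch.randn(4, 64, 64)
    for t in range(steps):
        # The "dynamic" part: swap the conditioning after switch_at steps.
        cond = cond1 if t < switch_at else cond2
        latents = denoise_step(latents, cond, t)
    return latents

final = sample_with_prompt_switch("a photo of a cat", "a photo of a dog")
```

In a real pipeline the same switch would happen on the actual prompt embeddings at the chosen step; the point is only that the mixture comes from when the conditioning changes, not from blending the embeddings themselves.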
What's the TL;DR for the Vicuna 13B experiments?