All of jsteinhardt's Comments + Replies

Hi Alex,

Let me first acknowledge that your write-up is significantly more thorough than pretty much all content on LessWrong, and that I found the particular examples interesting. I also appreciated that you included a related work section in your write-up. The reason I commented on this post and not others is that it's one of the few ML posts on LessWrong that seemed like it might teach me something, and I wish I had made that clearer before posting critical feedback (I was thinking of the feedback as directed at Oliver / Raemon's moderation norms, …

Thanks so much, I really appreciate this comment. I think it'll end up improving this post/the upcoming paper. 

(I might reply later to specific points)

I'll just note that I, like Dan H, find it pretty hard to engage with this post because I can't tell whether it's basically the same as the Ludwig Schmidt paper (my current assumption is that it is). The paragraph the authors added didn't really help in this regard.

I'm not sure what you mean about whether the post was "missing something important", but I do think you should be pretty worried about LessWrong's collective epistemics given that Dan H is the only one bringing this important point up, and that rather than being rewarded for doing so or engaged w…

> I, like Dan H, find it pretty hard to engage with this post because I can't tell whether it's basically the same as the Ludwig Schmidt paper (my current assumption is that it is). The paragraph the authors added didn't really help in this regard.

The answer is: No, our work is very different from that paper. Here's the paragraph in question:

> Editing Models with Task Arithmetic explored a "dual" version of our activation additions. That work took vectors between weights before and after finetuning on a new task, and then added or subtracted task-specific weig…

Yup, I agree with this, and think the argument generalizes to most alignment work (which is why I'm relatively optimistic about our chances compared to some other people, e.g. something like 85% p(success), mostly because most things one can think of doing will probably be done).

It's possibly an argument that work is most valuable in cases of unexpectedly short timelines, although I'm not sure how much weight I actually place on that.

Yup! That sounds great :)

1Ruben Bloom
Here it is! https://www.lesswrong.com/s/4aARF2ZoBpFZAhbbe You might want to edit the description and header image.

Thanks Ruby! Now that the other posts are out, would it be easy to forward-link them (by adding links to the italicized titles in the list at the end)?

1Ruben Bloom
We can also make a Sequence. I assume "More Is Different for AI" should be the title of the overall Sequence too?
1Ruben Bloom
Done!

Finding the min-max solution might be easier, but what we actually care about is an acceptable solution. My point is that the min-max solution, in most cases, will be unacceptably bad.

And in fact, since min_x f(theta,x) <= E_x[f(theta,x)], any solution that is acceptable in the worst case is also acceptable in the average case.
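To make that inequality concrete, here is a minimal numerical sketch; all values of f are made up for illustration:

```python
import random

random.seed(0)

# Hypothetical values of f(theta, x) for one fixed theta, sampled over x.
f_values = [random.uniform(0.6, 1.0) for _ in range(1000)]

worst_case = min(f_values)                     # min_x f(theta, x)
average_case = sum(f_values) / len(f_values)   # E_x[f(theta, x)]

# The inequality: the worst case lower-bounds the average case.
assert worst_case <= average_case

# So clearing an acceptability threshold in the worst case
# implies clearing it in the average case.
threshold = 0.5
if worst_case >= threshold:        # acceptable in the worst case...
    assert average_case >= threshold  # ...hence acceptable on average
```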

2davidad (David A. Dalrymple)
Agreed—although optimizing for the worst case is usually easier than optimizing for the average case, satisficing for the worst case is necessarily harder (and, in ML, typically impossible) than satisficing for the average case.

Thanks! I appreciated these distinctions. The worst-case argument for modularity came up in a past argument I had with Eliezer, where I argued that this was a reason for randomization (even though Bayesian decision theory implies you should never randomize). See section 2 here: The Power of Noise.

Re: 50% vs. 10% vs. 90%. I liked this illustration, although I don't think your argument actually implies 50% specifically. For instance, if it turns out that everyone else is working on the 50% worlds and no one is working on the 90% worlds, you should probably wo…

I think this probably depends on the field. In machine learning, solving problems under worst-case assumptions is usually impossible because of the no free lunch theorem. You might assume that a particular facet of the environment is worst-case, which is a totally fine thing to do, but I don't think it's correct to call it the "second-simplest solution", since there are many choices of what facet of the environment is worst-case.

One keyword for this is "partial specification", e.g. here is a paper I wrote that makes a minimal set of statistical assumptions…

2Paul Christiano
Even in ML it seems like it depends on how you formulated your problem/goal. Making good predictions in the worst case is impossible, but achieving low regret in the worst case is sensible. (Though still less useful than just "solve existing problems and then try the same thing tomorrow," and generally I'd agree "solve an existing problem for which you can verify success" is the easiest thing to do.) Hopefully having your robot not deliberately murder you is a similarly sensible goal in the worst case, though it remains to be seen if it's feasible.
2davidad (David A. Dalrymple)
My interpretation of the NFL theorems is that solving the relevant problems under worst-case assumptions is too easy, so easy it's trivial: a brute-force search satisfies the criterion of worst-case optimality. So, that being settled, in order to make progress, we have to step up to average-case evaluation, which is harder. (However, I agree that once we already need to do some averaging, making explicit and stripping down the statistical assumptions and trying to get closer to worst-case guarantees—without making the problem trivial again—is harder than just evaluating empirically against benchmarks.)

Cool paper! One brief comment: this seems closely related to performative prediction, and it seems worth discussing the relationship.

Edit: just realized this is a review, not a new paper, so my comment is a bit less relevant. Although it does still seem like a useful connection to make.

2David Scott Krueger
author here -- Yes, we got this comment from reviewers in the most recent round as well. ADS is a bit more general than performative prediction, since it applies outside of prediction contexts. Still, it is very closely related. On the other hand, the point of our work is something that people in the performative prediction community seem to only slowly be approaching, which is the incentive for ADS. Work on CIDs is much more related in that sense. As a historical note: we started working on this in March or April 2018; performative prediction was on arXiv in Feb 2020, while ours was at a safety workshop in mid-2019, but not on arXiv until Sept 2020.

My basic take is that there will be lots of empirical examples where increasing model size by a factor of 100 leads to nonlinear increases in capabilities (and perhaps to qualitative changes in behavior). On median, I'd guess we'll see at least 2 such examples in 2022 and at least 100 by 2030.

At the point where there's a "FOOM", such examples will be commonplace and happening all the time. Foom will look like one particularly large phase transition (maybe 99th percentile among examples so far) that chains into more and more. It seems possible (though not c…

Thanks. For time/brevity, I'll just say which things I agree / disagree with:

> sufficiently capable and general AI is likely to have property X as a strong default [...] 

I generally agree with this, although for certain important values of X (such as "fooling humans for instrumental reasons") I'm probably more optimistic than you that there will be a robust effort to get not-X, including by many traditional ML people. I'm also probably more optimistic (but not certain) that those efforts will succeed.

[inside view, modest epistemology]: I don't have…

I'm not (retroactively in imaginary prehindsight) excited by this problem because neither of the 2 possible answers (3 possible if you count "the same") had any clear-to-my-model relevance to alignment, or even AGI.  AGI will have better OOD generalization on capabilities than current tech, basically by the definition of AGI; and then we've got less-clear-to-OpenPhil forces which cause the alignment to generalize more poorly than the capabilities did, which is the Big Problem.  Bigger models generalizing better or worse doesn't say anything obvio…

Not sure if this helps, and haven't read the thread carefully, but my sense is your framing might be eliding distinctions that are actually there, in a way that makes it harder to get to the bottom of your disagreement with Adam. Some predictions I'd have are that:

 * For almost any experimental result, a typical MIRI person (and you, and Eliezer) would think it was less informative about AI alignment than I would.
 * For almost all experimental results you would think they were so much less informative as to not be worthwhile.
 * There's a sma…

I would agree with you that "MIRI hates all experimental work" / etc. is not a faithful representation of this state of affairs, but I think there is nevertheless an important disagreement MIRI has with typical ML people, and that the disagreement is primarily about what we can learn from experiments.

Ooh, that's really interesting. Thinking about it, I think my sense of what's going on is (and I'd be interested to hear how this differs from your sense):

  1. Compared to the average alignment researcher, MIRI tends to put more weight on reasoning like 'sufficient
…

Actually, another issue is that unsupervised translation isn't "that hard" relative to supervised translation--I think that you can get pretty far with simple heuristics, such that I'd guess making the model 10x bigger matters more than making the objective more aligned with getting the answer right (and that this will be true for at least a couple more 10x-ing of model size, although at some point the objective will matter more).

This might not matter as much if you're actually outputting explanations and not just translating from one language to another. Although it is probably true that for tasks that are far away from the ceiling, "naive objective + 10x larger model" will outperform "correct objective".

2Paul Christiano
I do expect "explanations of what's going on in this sentence" to be a lot weaker than translations. For that task, I expect that the model trained on coherence + similar tasks will outperform a 10x larger pre-trained model. If the larger pre-trained model gets context stuffing on similar tasks, but no coherence training, then it's less clear to me.

But I guess the point is that the differences between various degrees of successful-generalization will be relatively small compared to model size effects. It doesn't matter so much how good the transfer model is relative to the pre-trained baseline, it matters how large the differences between the possible worlds that we are hoping to distinguish are.

I guess my main hope there is to try to understand whether there is some setting where transfer works quite well, either getting very close to the model fine-tuned on distribution, or at least converging as the pre-trained model grows. Hopefully that will make it easier to notice the effects we are looking for, and it's OK if those effects are small relative to model doublings.

(Also worth noting that "as good as increasing model size by 10%" is potentially quite economically relevant. So I'm mostly just thinking about the extent to which it can make effects hard to measure.)

Thanks Paul, I generally like this idea.

Aside from the potential concerns you bring up, here is the most likely way I could see this experiment failing to be informative: rather than having checks and question marks in your tables above, really the model's ability to solve each task is a question of degree--each table entry will be a real number between 0 and 1. For, say, tone, GPT-3 probably doesn't have a perfect model of tone, and would get <100% performance on a sentiment classification task, especially if done few-shot.

The issue, then, is that the …
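The real-valued version of those tables could be pictured like this (a sketch with hypothetical task names and made-up scores):

```python
# Hypothetical per-task capability scores in [0, 1], replacing the
# binary checks / question marks. All task names and numbers are made up.
capability = {
    "tone": 0.72,
    "translation": 0.85,
    "explanation": 0.40,
}

# A check mark then corresponds to thresholding a score; the interesting
# question becomes how these degrees shift with model scale, not whether
# a cell flips from question mark to check.
threshold = 0.8
passes = {task: score >= threshold for task, score in capability.items()}
```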

2Paul Christiano
Part of my hope is that "coherence" can do quite a lot of the "telling you what humans mean about tone." For example, you can basically force the model to talk (in English) about what things contribute to tone, and why it thinks the tone is like such and such (or even what the tone of English sentences is)---anything that a human who doesn't know French can evaluate. And taken together those things seem like enough to mostly pin down what we are talking about.

I'd tentatively interpret that as a negative result, but I agree with your comments below that ultimately a lot of what we care about here is the scaling behavior and putting together a more holistic picture of what's going on, in particular:

* As we introduce stronger coherence checks, what happens to the accuracy? Is it approaching the quality of correctness, or is it going to asymptote much lower?
* Is the gap shrinking as model quality improves, or growing? Do we think that very large models would converge to a small gap or is it a constant?

I'm also quite interested in the qualitative behavior. Probably most interesting are the cases where the initial model is incoherent, the coherence-tuned model is coherent-but-wrong, and the correctness-tuned model is correct. (Of course every example is also fuzzy because of noise from sampling and training, but the degree of fuzziness is smaller as we remove randomness.) In these cases, what is happening with the coherence-tuned model? Are we able to see cases where it cleanly feels like the "wrong" generalization, or is it a plausible ambiguity about what we were looking for? And so on.

I'm interested in the related engineering question: in this setting, what can we do to improve the kind of generalization we get? Can we get some handle on the performance gap and possible approaches to closing it?

And finally I'm interested in understanding how the phenomenon depends on the task: is it basically similar in different domains / for different kinds of question o…

This doesn't seem so relevant to capybaralet's case, given that he was choosing whether to accept an academic offer that was already extended to him.