Linkpost for https://cims.nyu.edu/~sbowman/bowman2021hype.pdf. To appear on arXiv shortly.

I'm sharing a position paper I put together as an attempt to introduce general NLP researchers to AI risk concerns. From a few discussions at *ACL conferences, it seems like a pretty large majority of active researchers aren't aware of the arguments at all, or at least aren't aware that they have any connection to NLP and large language model work.

The paper makes a slightly odd multi-step argument to try to connect to active debates in the field:

  • It's become extremely common in NLP papers/talks to claim or imply that NNs are too brittle to use, that they aren't doing anything that could plausibly resemble language understanding, and that this is a pretty deep feature of NNs that we don't know how to fix. These claims sometimes come with evidence, but it's often bad evidence, like citations to failures in old systems that we've since improved upon significantly. Weirdly, this even happens in papers that themselves show positive results involving NNs.
  • This seems to be coming from concerns about real-world harms: Current systems are pretty biased, and we don't have great methods for dealing with that, so there's a pretty widely-shared feeling that we shouldn't be deploying big NNs nearly as often as we are. The reasoning seems to go: If we downplay the effectiveness of this technology, that'll discourage its deployment.
  • But is that actually the right way to minimize the risk of harms? We should expect the impacts of these technologies to grow dramatically as they get better—the basic AI risk arguments go here—and we'll need to be prepared for those impacts. Downplaying the progress that we're making, both to each other and to outside stakeholders, limits our ability to foresee potentially-impactful progress or prepare for it.

I'll be submitting this to ACL in a month. Comments/criticism welcome, here or privately (bowman@nyu.edu).


I agree with the critiques you make of specific papers (in section 2), but I'm less convinced by your diagnosis that these papers are attempting to manage/combat hype in a misguided way.

IMO, "underclaiming" is ubiquitous in academic papers across many fields -- including fields unrelated to NLP or ML, and fields where there's little to no hype to manage.  Why do academics underclaim?  Common reasons include:

  1. An incentive to make the existing SOTA seem as bad as possible, to maximize the gap between it and your own new, sparkly, putatively superior method.
    Anyone who's read papers in ML, numerical analysis, statistical inference, computer graphics, etc. is familiar with this phenomenon; there's a reason this tweet is funny.
  2. An incentive to frame one's own work as solving a real, practically relevant problem which is not adequately addressed by existing approaches.  This is related to #1, but tends to affect motivating discussion, whereas #1 affects the presentation of results.
  3. General sloppiness about citations.  Academics rarely do careful background work on the papers they cite, especially once it becomes "conventional" to cite a particular paper in a particular context.  Even retracted papers often go on being cited year after year, with no mention made of the retraction.

I suspect 1+2+3 above, rather than hype management, explains the specific mistakes you discuss.

For example, Zhang et al 2020 seems like a case of #2.  They cite Jia and Liang as evidence about a problem with earlier models, a problem they are trying to solve with their new method.  It would be strange to "manage hype" by saying NLP systems can't do X, and then in the same breath present a new system which you claim does X! 

Jang and Lukasiewicz (2021) is also a case of #2, describing a flaw primarily in order to motivate their own proposed fix.

Meanwhile, Xu et al 2020 seems like #3: it's a broad review paper on "adversarial attacks" which gives a brief description of Jia and Liang 2017 alongside brief descriptions of many other results, many of them outside NLP.  It's true that the authors should not have used the word "SOTA" here, but it seems more plausible that this is mere sloppiness (they copied other, years-old descriptions of the Jia and Liang result) rather than an attempt to push a specific perspective about NLP.

I think a more useful framing might go something like:

  • We know very little about the real capabilities/limits of existing NLP systems.  The literature does not discuss this topic with much care or seriousness; people often cite outdated results, or attach undue significance to correct-but-narrow philosophical points about limitations.
  • This leads to some waste of effort, as people work on solving problems that have already been solved (like trying to "fix" Jia and Liang issues as if it were still 2017).  Note that this is a point NLP researchers ought to care about, whether they are interested in AI safety or not.
  • This is also bad from an AI safety perspective.
  • We should study the capabilities of existing systems, and the likely future trajectory of those capabilities, with more care and precision.

An incentive to make the existing SOTA seem as bad as possible, to maximize the gap between it and your own new, sparkly, putatively superior method.

Here's an eyerolling example from yesterday or so: Delphi boasts about their new ethics dataset of n=millions & model which gets 91% vs GPT-3 at chance-level of 52%. Wow, how awful! But wait, we know GPT-3 does better than chance on other datasets like Hendrycks's ETHICS, how can it do so badly where a much smaller model can do so well?

Oh, it turns out that that's zero-shot with their idiosyncratic format. The abstract just doesn't mention that when they do some basic prompt engineering (no p-tuning or self-distillation or anything) and include a few examples (i.e. a lot fewer than 'millions'), it gets more like... 84%. Oh.

Yeah, this all sounds right, and it's fairly close to the narrative I was using for my previous draft, which had a section on some of these motives.

The best defense I can give of the switch to the hype-centric framing, FWIW:

  • The paper is inevitably going to have to do a lot of chastising of authors. Giving the most charitable possible framing of the motivations of the authors I'm chastising means that I'm less likely to lose the trust/readership of those authors and anyone who identifies with them.
  • An increasingly large fraction of NLP work—possibly even a majority now—is on the analysis/probing/datasets side rather than model development, and your incentives 1 and 2 don't apply as neatly there. There are still incentives to underclaim, but they work differently.
  • Practically, writing up that version clearly seemed to require a good deal more space, in an already long-by-ML-standards paper.

That said, I agree that this framing is a little bit too charitable, to the point of making implausible implications about some of these authors' motives in some cases, which isn't a good look. I also hadn't thought of the wasted effort point, which seems quite useful here. I'm giving a few talks about this over the next few weeks, and I'll workshop some tweaks to the framing with this in mind.


Some minor feedback points: Just from reading the abstract and intro, this could be read as a non-sequitur: "It limits our ability to mitigate short-term harms from NLP deployments". Also, calling something a "short-term" problem doesn't seem necessary and it may sound like you think the problem is not very important.

Thanks! Tentative rewrite for the next revision:

It harms our credibility in ways that can make it harder to mitigate present-day harms from NLP deployments. It also limits our ability to prepare for the potentially enormous impacts of more distant future advances.

I tried to stick to 'present-day' over 'short-term', but missed this old bit of draft text in the abstract. 

When I try to get the paper, I get a 404 error.

Thanks—fixed! (The sentence-final period got folded into the URL.)

The paper makes a slightly odd multi-step argument to try to connect to active debates in the field:

This comment is some quick feedback on those:

Weirdly, this even happens in papers that themselves to show positive results involving NNs.

 

citations to failures in old systems that we've since improved upon significantly.

Might not be a main point, but this could be padded out with an explanation of how something like that could be marginally better. Like adding:

"As opposed to explaining how that is relevant today, like:

[Old technique] had [problem]. As [that area] has matured, [problem has been fixed in this way]. However, [slower deployment]/[more humans in the loop]/[other fix] would have reduced [problems]. Using [these fixes]/not making them critical systems (which is risky because _) can help ensure that [this new area], which [has the same problem] and probably will for [time] until it matures, does not have the same problems [old area] did [for length of time]."

 

But is that actually the right way to minimize the risk of harms? We should expect [that]

  • Is there any empirical base which could be used to estimate this/provide information on improving things? Anything similar?

We should expect the impacts of these technologies to grow dramatically as they get better

  • What if the impact grows dramatically as...they get deployed widely? Even if it's a bad idea, it's widely done because it's popular/cool/a fad/etc.?
  • What approach would work best then?

Thanks! (Typo fixed.)

[Old technique] had [problem]...

For this point, I'm not sure how it fits into the argument. Could you say more?

Is there any empirical base...

Yeah, this is a missed opportunity that I haven't had the time/expertise to take on. There probably are comparable situations in the histories of other applied research fields, but I'm not aware of any good analogies. I suspect that a deep dive into some history-and-sociology-of-science literature would be valuable here.

What if the impact grows dramatically as...they get deployed widely? ...

I think this kind of discussion is already well underway within NLP and adjacent subfields like FAccT. I don't have as much to add there.

(Weird meta-note: Are you aware of something unusual about how this comment is posted? I saw a notification for it, but I didn't see it in the comments section for the post itself until initially submitting this reply. I'm newish to posting on Lightcone forums...)

(Weird meta-note: Are you aware of something unusual about how this comment is posted? I saw a notification for it, but I didn't see it in the comments section for the post itself until initially submitting this reply. I'm newish to posting on Lightcone forums...)

Ah. When you say lightcone forums, what site are you on? What does the URL look like?


For this point, I'm not sure how it fits into the argument. Could you say more?

It's probably a tangent. The idea was:

1) Criticism is great.

2) Explaining how that could be improved is marginally better. (I then explained for that case* how citing 'old evidence' or 'old stuff' could still apply to new stuff. It was kind of a niche application of evidence though. If someone had a good reason for using the old evidence, elaborating on that reason might help.)

*In abstract terms - I didn't have any examples in mind.

Forum  

I can see the comment at the comment-specific AF permalink here:

https://www.alignmentforum.org/posts/RLHkSBQ7zmTzAjsio/nlp-position-paper-when-combatting-hype-proceed-with-caution?commentId=pSkdAanZQwyT4Xyit#pSkdAanZQwyT4Xyit

...but I can't see it among the comments at the base post URL here:

https://www.alignmentforum.org/posts/RLHkSBQ7zmTzAjsio/nlp-position-paper-when-combatting-hype-proceed-with-caution 

From my experience with the previous comment, I expect it'll appear at the latter URL once I reply?

[Old technique] had [problem]...

Ah, got it. That makes sense! I'll plan to say a bit more about when/how it makes sense to cite older evidence in cases like this.