Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev
I agree with Rohin that there are approximately zero useful things that don't make anyone's workflow harder. The default state is "only just working means working, so I've moved on to the next thing" and if you want to change something there'd better be a benefit to balance the risk of breaking it.
Also 3% of compute is so much compute; probably more than the "20% to date over four years" that OpenAI promised and then yanked from superalignment. Take your preferred estimate of lab compute spending, multiply by 3%, and ask yourself whether a rushed unreasonable lab would grant that much money to people working on a topic it didn't care for, at the expense of those it did.
My impression is that few (one or two?) of the safety people who have quit a leading lab did so to protest poor safety policies, and of those few none saw staying as a viable option.
Relatedly, I think Buck far overestimates the influence and resources of safety-concerned staff in a 'rushed unreasonable developer'.
I think this is the most important statement on AI risk to date. Where ChatGPT brought "AI could be very capable" into the Overton window, the CAIS Statement brought in AI x-risk. When I give talks to NGOs, or business leaders, or government officials, I almost always include a slide with selected signatories and the full text:
Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.
I believe it's true, that it was important to say, and that it's had an ongoing, large, and positive impact. Thank you again to the organizers and to my many, many co-signatories.
I further suggest that if using these defined terms, instead of including a table of definitions somewhere you include the actual probability range or point estimate in parentheses after the term. This avoids any need to explain the conventions, and makes it clear at the point of use that the author had a precise quantitative definition in mind.
For example: it's likely (75%) that flipping a pair of fair coins will get fewer than two heads, and extremely unlikely (0-5%) that most readers of AI safety papers are familiar with the quantitative convention proposed above - although they may (>20%) be familiar with the general concept. Note that the inline convention allows for other descriptions if they make the sentence more natural!
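As a quick check of the coin example, here is a minimal Python sketch that enumerates the four equally likely outcomes and formats the result in the inline style suggested above. The helper name `annotate` is my own illustration, not part of any proposed convention.

```python
from fractions import Fraction
from itertools import product

# Enumerate every equally likely outcome of flipping two fair coins.
outcomes = list(product("HT", repeat=2))

# Count the outcomes with fewer than two heads (everything except HH).
favourable = [o for o in outcomes if o.count("H") < 2]
p_fewer_than_two_heads = Fraction(len(favourable), len(outcomes))


def annotate(term: str, probability: Fraction) -> str:
    """Hypothetical helper: put the point estimate in parentheses after the term."""
    return f"{term} ({float(probability):.0%})"


print(annotate("likely", p_fewer_than_two_heads))
```

Exact fractions keep the enumeration free of floating-point noise; the conversion to a percentage happens only when formatting.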
Without wishing to discourage these efforts, I disagree on a few points here:
Still, the biggest opportunities are often the ones with the lowest probability of success, and startups are the best structures to capitalize on them.
If I'm looking for the best expected value around, that's still monotonic in the probability of success! There are good reasons to think that most organizations are risk-averse (relative to the neutrality of linear $=utils) and startups can be a good way to get around this.
Nonetheless, I remain concerned about regressional Goodhart, and that many founders naively take on the risk appetite of funders who manage a portfolio, without the corresponding diversification (if all your eggs are in one basket, watch that basket very closely). See also Inadequate Equilibria and maybe Fooled by Randomness.
Meanwhile, strongly agreed that AI safety driven startups should be B corps, especially if they're raising money.
Technical quibble: "B Corp" is a voluntary private certification; PBC is a corporate form which imposes legal obligations on directors. I think many of the B Corp criteria are praiseworthy, but certification is neither necessary nor sufficient as an alternative to PBC status - and getting certified is probably a poor use of founders' time and attention, which are at a premium in a startup.
My personal opinion is that starting a company can be great, but I've also seen several fail due to the gaps between the founders' personal goals, a work-it-out-later business plan, and the duties that founders and their board owe to their investors.
IMO any purpose-driven company should be founded as a Public Benefit Corporation, to make it clear in advance and in law that you'll also consider the purpose and the interests of people materially affected by the company alongside investor returns. (cf § 365. Duties of directors)
The obvious targets are of course Anthropic's own frontier models, Claude Instant and Claude 2.
The section "Problem setup: what makes a good decomposition?" discusses what success might look like and enable - but note that decomposing models into components is just the beginning of the work of mechanistic interpretability! Even with perfect decomposition we'd have plenty left to do, unraveling circuits and building a larger-scale understanding of models.
One year is actually the typical term length for board-style positions, but because members can be re-elected their tenure is often much longer. In this specific case of course it's now up to the trustees!
I don't think Miles' or Richard's stated reasons for resigning included safety policies, for example.
But my broader point is that "fewer safety people should quit leading labs to protest poor safety policies" is basically a non sequitur from "people have quit leading labs because they think they'll be more effective elsewhere", whether because they want to do something different or independent, or because they no longer trust the lab to behave responsibly.