I don't think Miles' or Richard's stated reasons for resigning included safety policies, for example.
But my broader point is that "fewer safety people should quit leading labs to protest poor safety policies" is basically a non sequitur from "people have quit leading labs because they think they'll be more effective elsewhere", whether because they want to do something different or independent, or because they no longer trust the lab to behave responsibly.
I agree with Rohin that there are approximately zero useful things that don't make anyone's workflow harder. The default state is "only just working means working, so I've moved on to the next thing" and if you want to change something there'd better be a benefit to balance the risk of breaking it.
Also 3% of compute is so much compute; probably more than the "20% to date over four years" that OpenAI promised and then yanked from superalignment. Take your preferred estimate of lab compute spending, multiply by 3%, and ask yourself whether a rushed unreasonable lab would grant that much money to people working on a topic it didn't care for, at the expense of those it did.
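To make that arithmetic concrete, here's a quick back-of-envelope sketch in Python; the annual spend figure is purely an assumed placeholder, not a claim about any particular lab's budget:

```python
# Back-of-envelope only: the spend figure is an assumed placeholder,
# not a claim about any particular lab's budget.
assumed_annual_compute_spend_usd = 3e9   # e.g. $3B/year on compute
safety_share = 0.03                      # the 3% figure discussed above

print(f"${assumed_annual_compute_spend_usd * safety_share:,.0f} per year")
# -> $90,000,000 per year
```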
My impression is that few (one or two?) of the safety people who have quit a leading lab did so to protest poor safety policies, and of those few none saw staying as a viable option.
Relatedly, I think Buck far overestimates the influence and resources of safety-concerned staff in a 'rushed unreasonable developer'.
> My impression is that few (one or two?) of the safety people who have quit a leading lab did so to protest poor safety policies, and of those few none saw staying as a viable option.
While this isn't amazing evidence, my sense is there have been around 6 people who quit and, in parallel with announcing their departure, called out OpenAI's reckless attitude towards risk (at various levels of explicitness, but quite strongly in all cases by standard professional norms).
It's hard to say that people quit "to protest safety policies", but they definitely used...
I think this is the most important statement on AI risk to date. Where ChatGPT brought "AI could be very capable" into the Overton window, the CAIS Statement brought in AI x-risk. When I give talks to NGOs, business leaders, or government officials, I almost always include a slide with selected signatories and the full text:
Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.
I believe that it's true, that it was important to say, and that it's had an ongoing, large, and positive impact. Thank you again to the organizers and to my many, many co-signatories.
I further suggest that, when using these defined terms, you include the actual probability range or point estimate in parentheses after the term, instead of a table of definitions somewhere. This avoids any need to explain the conventions, and makes it clear at the point of use that the author had a precise quantitative definition in mind.
For example: it's likely (75%) that flipping a pair of fair coins will get less than two heads, and extremely unlikely (0-5%) that most readers of AI safety papers are familiar with the quantitative convention prop...
Without wishing to discourage these efforts, I disagree on a few points here:
Still, the biggest opportunities are often the ones with the lowest probability of success, and startups are the best structures to capitalize on them.
If I'm looking for the best expected value around, that's still monotonic in the probability of success (for a fixed payoff, expected value is just p × payoff, which only goes up with p)! There are good reasons to think that most organizations are risk-averse (relative to the neutrality of linear $=utils) and startups can be a good way to get around this.
Nonetheless, I remain concerned about regressional Good...
My personal opinion is that starting a company can be great, but I've also seen several fail due to the gaps between the founders' personal goals, a work-it-out-later business plan, and the duties that they and their board owe to their investors.
IMO any purpose-driven company should be founded as a Public Benefit Corporation, to make it clear in advance and in law that you'll also consider the purpose and the interests of people materially affected by the company alongside investor returns. (cf § 365. Duties of directors)
I agree that there's no substitute for thinking about this for yourself, but I think that morally or socially counting "spending thousands of dollars on yourself, an AI researcher" as a donation would be an appalling norm. There are already far too many unmanaged conflicts of interest and trust-me-it's-good funding arrangements in this space for me, and I think it leads to poor epistemic norms as well as social and organizational dysfunction. I think it's very easy for donating to people or organizations in your social circle to have substantial negative ...
I think people who give up large amounts of salary to work in jobs that other people are willing to pay for from an impact perspective should totally consider themselves to have done good comparable to donating the difference between their market salary and their actual salary. This applies to approximately all safety researchers.
The obvious targets are of course Anthropic's own frontier models, Claude Instant and Claude 2.
"Problem setup: what makes a good decomposition?" discusses what success might look like and enable - but note that decomposing models into components is just the beginning of the work of mechanistic interpretability! Even with perfect decomposition we'd have plenty left to do, unraveling circuits and building a larger-scale understanding of models.
Example projects you're not allowed to do, if they involve other model families:
Llama 2 is not open source.
(a few days after this comment, here's a concurring opinion from the Open Source Initiative - as close to authoritative as you can get)
(later again, here's Yann LeCun testifying under oath: "so first of all Llama system was not made open source ... we released it in a way that did not authorize commercial use, we kind of vetted the people who could download the model it was reserved to researchers and academics")
While their custom licence permits some commercial uses, it is not an OSI approved license, and because it violates the ...
(Zac's note: I'm posting this on behalf of Jack Clark, who is unfortunately unwell today. Everything below is his words.)
Hi there, I’m Jack and I lead our policy team. The primary reason it’s not discussed in the post is that the post was already quite long and we wanted to keep the focus on safety - I helped edit bits of the post and couldn’t figure out a way to shoehorn in stuff about policy without it feeling inelegant / orthogonal.
You do, however, raise a good point, in that we haven’t spent much time publicly explaining what we’re up t...
For Python basics, I have to anti-recommend Shaw's 'Learn Python the Hard Way'; it's generally outdated and in some places actively misleading. And why would you want to learn the hard way instead of the best way in any case?
Instead, my standard recommendation is Al Sweigart's Automate the Boring Stuff and then Beyond the Basic Stuff (both readable for free on inventwithpython.com, or purchasable in print); he's also written some books of exercises. If you prefer a more traditional textbook, Think Python 2e is excellent and also available freely online.
Unfortunately, the interpretable-composition hypothesis is simply wrong! Many bugs come from 'feature interactions', a term coined by Pamela Zave - and if it weren't for that, programming would be as easy as starting from e.g. Lisp's or Forth's tiny sets of primitives.
As to a weaker form, well, yes - using a 'cleanroom' approach and investing heavily in formal verification (and testing) can get you an error rate orders of magnitude lower than ordinary software... at orders-of-magnitude greater cost. At more ordinarily-reasonable levels of rigor, I'd em...
First, an apology: I didn't mean this to be read as an attack or a strawman, nor applicable to any use of theorem-proving, and I'm sorry I wasn't clearer. I agree that formal specification is a valuable tool and research direction, a substantial advancement over informal arguments, and only as good as the assumptions. I also think that hybrid formal/empirical analysis could be very valuable.
Trying to state a crux, I believe that any plan which involves proving corrigibility properties about MuZero (etc) is doomed, and that safety proofs about simpler app...
I was halfway through a PhD on software testing and verification before joining Anthropic (opinions my own, etc), and I'm less convinced than Eliezer about theorem-proving for AGI safety.
There are so many independently fatal objections that I don't know how to structure this or convince someone who thinks it would work. I am therefore offering a $1,000 prize for solving a far easier problem:
...Take an EfficientNet model with >= 99% accuracy on MNIST digit classification. What is the largest possible change in the probability assigned to some class betw
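To pin down the quantity being asked about, here's a minimal sketch of how one might measure it for a given pair of inputs; `model`, `x1`, and `x2` are stand-in assumptions for the trained classifier and the two images, and the hard part - searching over all admissible input pairs - is deliberately not attempted here:

```python
# Minimal sketch only: `model` is assumed to be any trained image classifier
# (e.g. an EfficientNet fine-tuned on MNIST); x1 and x2 are the two inputs
# being compared. This measures the quantity; it does not maximise it.
import torch
import torch.nn.functional as F

def max_class_probability_change(model: torch.nn.Module,
                                 x1: torch.Tensor,
                                 x2: torch.Tensor) -> float:
    """Largest change in the probability assigned to any single class."""
    model.eval()
    with torch.no_grad():
        p1 = F.softmax(model(x1.unsqueeze(0)), dim=-1)
        p2 = F.softmax(model(x2.unsqueeze(0)), dim=-1)
    return (p1 - p2).abs().max().item()
```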
You're attacking a strawman of what kind of theorems we want to prove. Obviously we are not going to prove theorems that contain specific datasets as part of the statement. What we're going to do is build a theory founded on certain assumptions about the real world (such as locality / decoupling of dynamics on different scales / some kind of chaos / certain bounds on computational complexity / existence of simple fundamental laws etc) and humans (e.g. that they are approximately rational agents, for some definition thereof). Such a theory can produce many ...
High Impact Careers in Formal Verification: Artificial Intelligence
My research focuses on advanced testing and fuzzing tools, which are so much easier to use that people actually use them - e.g. in PyTorch, and I understand in DeepMind. If people seem interested I could write up a post on their relevance to AI safety in a few weeks.
Core idea: even without proofs, writing out safety properties or other system invariants in code is valuable both (a) for deconfusion, and (b) because we can have a computer search for counterexamples using a variety of heuristics an...
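As a flavour of what "invariants in code" looks like in practice, here's a minimal property-based testing sketch (using Hypothesis; the `softmax` helper and the tolerance are illustrative assumptions, not anything from a real safety case):

```python
# State an invariant in code, then let the tool search for counterexamples.
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays

def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()      # subtract max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

@given(arrays(np.float64, shape=(10,),
              elements=st.floats(-100, 100, allow_nan=False)))
def test_softmax_is_a_probability_distribution(logits):
    probs = softmax(logits)
    assert np.all(probs >= 0)            # no negative probabilities
    assert abs(probs.sum() - 1.0) < 1e-9 # mass sums to one
```

Running this under pytest has Hypothesis generate and shrink inputs automatically; the same pattern scales up to much richer system invariants.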
I'm sorry that I don't have time to write up a detailed response to (critique of?) the response to critiques; hopefully this brief note is still useful.
I remain frustrated by GSAI advocacy. When discussing feasibility, it's suited only to well-understood closed domains (excluding e.g. natural language); but when arguing for importance, it's 'we need rigorous guarantees for current or near-future AI'. It's an extension to or complement of current practice; and yet current practice is irresponsible and inadequate. Often this is coming from different advocates, but th