Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev
Without wishing to discourage these efforts, I disagree on a few points here:
Still, the biggest opportunities are often the ones with the lowest probability of success, and startups are the best structures to capitalize on them.
If I'm looking for the best expected value around, that's still monotonic in the probability of success! There are good reasons to think that most organizations are risk-averse (relative to the neutrality of linear $=utils) and startups can be a good way to get around this.
Nonetheless, I remain concerned about regressional Goodhart; and that many founders naively take on the risk appetite of funders who manage a portfolio, without the corresponding diversification (if all your eggs are in one basket, watch that basket very closely). See also Inadequate Equilibria and maybe Fooled by Randomness.
Meanwhile, strongly agreed that AI safety driven startups should be B corps, especially if they're raising money.
Technical quibble; "B Corp" is a voluntary private certification; PBC is a corporate form which imposes legal obligations on directors. I think many of the B Corp criteria are praiseworthy, but this is neither necessary nor sufficient as an alternative to PBC status - and getting certified is probably a poor use of time and attention for a startup when the founders' time and attention are at such a premium.
My personal opinion is that starting a company can be great, but I've also seen several fail due to the gaps between their personal goals, a work-it-out-later business plan, and the duties that you/your board owes to your investors.
IMO any purpose-driven company should be founded as a Public Benefit Corporation, to make it clear in advance and in law that you'll also consider the purpose and the interests of people materially affected by the company alongside investor returns. (cf § 365. Duties of directors)
I agree that there's no substitute for thinking about this for yourself, but I think that morally or socially counting "spending thousands of dollars on yourself, an AI researcher" as a donation would be an apalling norm. There are already far too many unmanaged conflicts of interest and trust-me-it's-good funding arrangements in this space for me, and I think it leads to poor epistemic norms as well as social and organizational dysfunction. I think it's very easy for donating to people or organizations in your social circle to have substantial negative expected value.
I'm glad that funding for AI safety projects exists, but the >10% of my income I donate will continue going to GiveWell.
The obvious targets are of course Anthropic's own frontier models, Claude Instant and Claude 2.
Problem setup: what makes a good decomposition? discusses what success might look like and enable - but note that decomposing models into components is just the beginning of the work of mechanistic interpretability! Even with perfect decomposition we'd have plenty left to do, unraveling circuits and building a larger-scale understanding of models.
One year is actually the typical term length for board-style positions, but because members can be re-elected their tenure is often much longer. In this specific case of course it's now up to the trustees!
Example projects you're not allowed to do, if they involve other model families:
In many cases I expect that individuals will go ahead and do this anyway, much like the license of Llama 1 was flagrantly violated all over the place, but remember that it's differentially risky for any organisation which Meta might like to legally harass.
Llama 2 is not open source.
(a few days after this comment, here's a concurring opinion from the Open Source Initiative - as close to authoritative as you can get)
(later again, here's Yan LeCun testifying under oath: "so first of all Llama system was not made open source ... we released it in a way that did not authorize commercial use, we kind of vetted the people who could download the model it was reserved to researchers and academics")
While their custom licence permits some commercial uses, it is not an OSI approved license, and because it violates the open source definition it never will be. Specifically, the llama 2 licence violates:
So, why does this matter?
As an open-source maintainer and PSF Fellow, I have no objection to the existence of commercially licensed software. I use much of it, and have sold commercial licenses for software that I've written too. However, people - and especially megacorps - misrepresenting their closed-off projects as open source is an infuriating form of parasitism on a reputation painstakingly built over decades.
The restriction on model training makes Llama 2 much less useful for AI safety research, but it incurs just as much direct (roughly all via misuse IMO) and acceleration risk as an open-source release.
Using a custom license adds substantial legal risk for prospective commercial users, especially given the very broad restrictions imposed by the acceptable use policy. This reduces the economic upside enormously relative to standard open terms, and leaves Meta's competitors particularly at risk of lawsuits if they attempt to use Llama 2.
To summarize, Meta gets a better cost/benefit tradeoff by using a custom, non-open-source license especially if people incorrectly percieve it as open source; everyone else is worse off; and it seems to me like they're deliberately misrepresenting what they've done for their own gain. This really, really annoys me.
When someone describes Llama 2 as "open source", please correct them: Meta is offering a limited commercial license which discriminates against specific users and bans many valuable use-cases, including in alignment research.
(Zac's note: I'm posting this on behalf of Jack Clark, who is unfortunately unwell today. Everything below is his words.)
Hi there, I’m Jack and I lead our policy team. The primary reason it’s not discussed in the post is that the post was already quite long and we wanted to keep the focus on safety - I did some help editing bits of the post and couldn’t figure out a way to shoehorn in stuff about policy without it feeling inelegant / orthogonal.
You do, however, raise a good point, in that we haven’t spent much time publicly explaining what we’re up to as a team. One of my goals for 2023 is to do a long writeup here. But since you asked, here’s some information:
You can generally think of the Anthropic policy team as doing three primary things:
More broadly, we try to be transparent on the micro level, but haven’t invested yet in being transparent on the macro. What I mean by that is many of our RFIs, talks, and ideas are public, but we haven’t yet done a single writeup that gives an overview of our work. I am hoping to do this with the team this year!
Some other desiderata that may be useful:
Our wonderful colleagues on the ‘Societal Impacts’ team led this work on Red Teaming and we (Policy) helped out on the paper and some of the research. We generally think red teaming is a great idea to push to policymakers re AI systems; it’s one of those things that is ‘shovel ready’ for the systems of today but, we think, has some decent chance of helping out in future with increasingly large models.
For Python basics, I have to anti-recommend Shaw's 'learn the hard way'; it's generally outdated and in some places actively misleading. And why would you want to learn the hard way instead of the best way in any case?
Instead, my standard recommendation is Al Sweigart's Automate the Boring Stuff and then Beyond the Basic Stuff (both readable for free on inventwithpython.com, or purchasable in books); he's also written some books of exercises. If you prefer a more traditional textbook, Think Python 2e is excellent and also available freely online.
I further suggest that if using these defined terms, instead of including a table of definitions somewhere you include the actual probability range or point estimate in parentheses after the term. This avoids any need to explain the conventions, and makes it clear at the point of use that the author had a precise quantitative definition in mind.
For example: it's likely (75%) that flipping a pair of fair coins will get less than two heads, and extremely unlikely (0-5%) that most readers of AI safety papers are familiar with the quantitative convention proposed above - although they may (>20%) be familiar with the general concept. Note that the inline convention allows for other descriptions if they make the sentence more natural!