First off: as one of Ethan's current Astra Fellows (and having worked with him since ~last October), I especially want to flag that his collaborators in MATS and Astra have historically underweighted how valuable overcommunicating with Ethan is, and routinely underbook meetings to ask for his support.
Second, this post is so dense with useful advice that I made Anki flashcards of it using GPT-4 (generated via AnkiBrain [https://ankiweb.net/shared/info/1915225457], with small manual edits).
You can find them here: https://drive.google.com/file/d/1G4i7iZbILwAiQ7FtasSoLx5g7JIOWgeD/view?usp=sharing
For highly empirical research, it’s critical to get quick feedback and iterate on ideas rapidly. Jacob Steinhardt has a great blog post describing that a really good strategy for doing research is to “reduce uncertainty at the fastest possible rate”
Michael Bernstein's slides on velocity are a great resource for learning this mindset as well. I particularly like his metaphor of the "swamp". This is the place you get stuck when you really want technique X to work for the project to progress, but none of the ways that you've tried applying it have succeeded. The solution is to have high velocity: that is, to test out as many ideas as possible per unit time until you get out of the swamp. Other highlights of the slide deck include the focus on answering questions rather than doing engineering, and the related core-periphery distinction between things that are strictly needed to answer a question & those that can be ignored/mocked up/replaced for testing (which echoes the ideas in the "workflow" section of this post).
(Although they're similar, I'd argue that Michael's approach is easier to apply to empirical alignment research than Jacob's "stochastic decision process" approach. That's because falsifying abstract research ideas in empirical deep learning is hard (impossible?), and you don't get much generalizable knowledge from failing to get one idea to work. The real aim is to find one deep insight that does generalize—hence the focus on trying many distinct approaches.)
Yeah, I think this is one of the ways that velocity is really helpful. I'd probably add one caveat specific to research on LLMs, which is that, since the field/capabilities are moving so quickly, there's much, much more low-hanging fruit in empirical LLM research than in almost any other field of research. This means that, for LLM research specifically, you should rarely be in a swamp, because being in one means you've probably run through the low-hanging fruit on that problem/approach, and there's other low-hanging fruit in other areas that you probably want to be picking instead.
(High velocity is great both for picking low-hanging fruit and for getting through swamps when you really need to solve a particular problem, so it's useful to have either way.)
TLDR: I’ve collected some tips for research that I’ve given to other people and/or used myself, which have sped things up and helped put people in the right general mindset for empirical AI alignment research. Some of these are opinionated takes, largely based on what has helped me personally. Researchers can be successful in different ways, but I still stand by the tips here as a reasonable default.
What success generally looks like
Here, I’ve included specific criteria that strong collaborators of mine tend to meet, with rough weightings on their importance, as a rough north star for people who collaborate with me (especially if you’re new to research). These criteria are for the specific kind of research I do (highly experimental LLM alignment research, excluding interpretability); some examples of research areas where this applies are scalable oversight, adversarial robustness, chain-of-thought faithfulness, process-based oversight, and model organisms of misalignment.

The exact weighting will also vary heavily depending on what role you’re serving on the team/project. E.g., I’d probably upweight criteria where you’re differentially strong or differentially contributing on the team, since I generally guide people towards working on things that line up with their skills. For more junior collaborators (e.g., first time doing a research project, where I’ve scoped out the project), this means I generally weigh execution-focused criteria more than direction-setting criteria (since here I’m often the person doing the direction setting).

Also, some of the criteria as outlined below are a really high bar; e.g., I only recently started to meet some of them myself after 5 years of doing research, and others I don’t meet myself. This is mainly written to be a north star for targets to aim for. That said, I think most people can get to a good-to-great spot on these criteria with 6-18 months of trying, and I don’t currently think that many of these criteria are particularly talent/brains bottlenecked vs. just doing a lot of deliberate practice and working to get better on them (I was actively bad at some of the criteria below, like implementation speed, even ~6 months into doing research, but improved a lot since then with practice). With that context, here are the rough success criteria I’d outline:
Tactical Research Tips & Approach
For highly empirical research, it’s critical to get quick feedback and iterate on ideas rapidly. Jacob Steinhardt has a great blog post describing that a really good strategy for doing research is to “reduce uncertainty at the fastest possible rate”. With language model and alignment research, you can often reduce uncertainty really quickly, with as little as a single message to GPT-4 on your phone or Claude in Slack. This can be a huge win over e.g. launching a large training run or a set of API calls to get back results, and means you can gain 1+ OOMs more information (per unit time) about what will work well, just by being careful to derisk ideas in the quickest way possible. (Most of the ideas in this doc are focused on this principle, and I’m not discussing project selection, which is also important but orthogonal to the details below on how to do research.)
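To make this concrete, here is a minimal sketch (my own illustration, not part of the original workflow) of what the cheapest version of an experiment can look like in code: a single API call to check whether a model can do the task at all, before any pipeline exists. The client library, model name, and prompt below are placeholder assumptions; use whatever you already have set up.

```python
# Minimal sketch of "the quickest possible version" of an experiment:
# one cheap API call to see whether a model can do the task at all,
# before building any real pipeline. Assumes the `anthropic` Python
# client is installed and ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()

# Hypothetical example: before building a full evaluation-generation
# pipeline, check whether the model produces even one usable eval question.
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder; use whatever model you normally prototype with
    max_tokens=300,
    messages=[
        {
            "role": "user",
            "content": (
                "Write one multiple-choice question (with answer choices) that "
                "tests whether an AI assistant admits uncertainty rather than guessing."
            ),
        }
    ],
)

# Eyeball the output and iterate on the prompt before scaling anything up.
print(response.content[0].text)
```

If the single-example output looks promising, it is usually worth trying a handful more prompts by hand before writing any batching or evaluation code.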
Workflow
Below, I’m including my workflow for “getting models to do something” as quickly as possible. It’s the general strategy I’ve used for prototyping ideas like generating evaluations with LMs, red teaming language models with language models, training language models with language feedback, etc. (If this workflow doesn’t work directly for some research task you’re doing, e.g. interpretability, then it’s at least an illustrative example of how to prioritize experiments in another setting.)
Try versions of an idea in this order (only skip a step if you have a very strong reason to do so):
Other practical tips:
Reading research papers:
General Mindset Tips
Three modes of research
(Most projects will involve some amount of each)
Work Habits
Machine Learning/Engineering Footguns
Default Norms for Projects with Me
Below are the default norms I follow for most external-to-Anthropic projects I supervise (these might be helpful both for new researchers and for new researcher mentors, e.g. for SERI MATS). I think they're reasonable defaults (especially if you're not sure where to start with project norms or project management).
The working style here is particularly tailored towards junior collaborators, so if you’re a more experienced collaborator (e.g., 2+ first- or co-first-author ML papers under your belt), feel free to work in a more independent manner than described below. Any of these norms are up for discussion, e.g. if someone on the project thinks a different way of working together would work better than what's described below.