Research automation is here (1, 2, 3). We saw it coming and planned ahead, which puts us ahead of most (4, 5, 6). But that foresight also comes with a set of outdated expectations that are holding us back. In particular, research automation is not just about “aligning the first AI scientist”; it is also about the institution-building problem of coordinating the first AI research fleets.
Research automation is not about developing a plug-and-play “AI scientist”. Transformative technologies are rarely straightforward substitutes for what came before. The industrial revolution was not about creating mechanical craftsmen but about deconstructing craftsmen into assembly lines of specialized, repeatable tasks. Algorithmic trading was not just about creating faster digital traders but about reimagining traders as fleets of bots, quants, engineers, and other specialists. AI-augmented science will not just be about creating AI “scientists.”
Why? New technologies come with new capabilities and new limitations. To take full advantage of the capabilities, we have to reshape our workflows around the limitations. This means that even if AIs eventually surpass human abilities across the board, roles like “researcher” will likely transform dramatically during the transition period.
The bottleneck to automation is not just technological but also institutional. The problem of research automation is not just about training sufficiently capable and aligned models. We face an “institutional overhang” where AI capabilities are outpacing our ability to effectively organize around their weaknesses. Factories had to develop new management techniques, quality control systems, and worker training programs to make assembly lines effective. Trading firms had to build new risk management frameworks, compliance systems, and engineering cultures to succeed at algorithmic trading. So too, research institutions will need to reinvent themselves around AI or fall behind.
The scaling labs have already moved beyond the traditional academic model. Consider the use of matrix management structures where research engineers work across multiple projects, standardized research workflows that enable fast iteration, and cross-cutting infrastructure teams that maintain the computational foundation for research. Labs employ specialized roles like research engineers, infrastructure specialists, and research managers that don't fit neatly into the academic hierarchy.
DeepMind’s recent Nobel Prize is a hint of more to come.
A vision: the automated research fleet. Imagine tomorrow’s research lab: not individual AI models confined to chat windows but vast digital fleets of specialized AI agents working in concert. Each agent masters its own niche in the research pipeline: proving theorems, reviewing literature, generating hypotheses, running experiments, analyzing results, communicating outcomes, developing new techniques, conceptualizing entirely new paradigms…
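To make this concrete, here is a minimal sketch of the fleet idea: specialized agents, each owning one niche, chained into a pipeline. Every name is hypothetical, and `call_llm` stands in for whatever model backend you actually use.

```python
from dataclasses import dataclass, field

def call_llm(role: str, task: str) -> str:
    # Stub: plug in your actual model/agent backend here.
    raise NotImplementedError

@dataclass
class Agent:
    role: str  # e.g. "literature-reviewer", "experiment-runner"

    def run(self, task: str) -> str:
        return call_llm(self.role, task)

@dataclass
class Fleet:
    stages: list = field(default_factory=list)

    def execute(self, research_question: str) -> str:
        # Each stage consumes the previous stage's output artifact.
        artifact = research_question
        for agent in self.stages:
            artifact = agent.run(artifact)
        return artifact

fleet = Fleet(stages=[
    Agent("literature-reviewer"),
    Agent("hypothesis-generator"),
    Agent("experiment-runner"),
    Agent("results-analyst"),
    Agent("report-writer"),
])
```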
Automation raises the level of abstraction so that everyone becomes a middle manager — every researcher the director of a research institution of their own. And it changes the basic patterns of human-AI interaction: the prompter will become the prompted — instead of crafting careful prompts in chat interfaces, human researchers receive updates and requests for guidance from their AI project leads, who independently pursue established research objectives.
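A sketch of this inverted pattern (hypothetical names, not any existing API): the AI project lead pushes status updates and escalates decisions to the human’s inbox, instead of waiting to be prompted.

```python
import queue

inbox = queue.Queue()

def ai_project_lead(objective: str) -> None:
    # The agent pursues the objective autonomously...
    inbox.put(f"Status: started work on {objective!r}")
    # ...and escalates only when it needs a judgment call.
    inbox.put("Request: two hypotheses look equally promising; "
              "which should we prioritize?")

def human_researcher() -> None:
    # The human's role shifts from prompting to triaging.
    while not inbox.empty():
        message = inbox.get()
        prefix = "Needs my guidance ->" if message.startswith("Request:") else "FYI ->"
        print(prefix, message)

ai_project_lead("characterize failure modes of the reward model")
human_researcher()
```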
This future may appear wasteful at first glance. Imagine thousands of AI instances running in parallel, testing slight variations of the same approach, with almost all attempts failing. Or hundreds of different AI instances in a shared chat redundantly processing the same tokens. But this apparent inefficiency is a feature, not a bug. Ford’s assembly lines overproduced standardized parts; McLean’s containers shipped half-empty; early cloud computing left countless FLOPs sitting idle. Just as these “inefficiencies” enabled unprecedented flexibility and scale in their industries, the parallel processing power of AI research fleets will unlock new possibilities in scientific discovery. The ability to rapidly test hundreds of variations, explore multiple paths simultaneously, and fail fast will become a cornerstone of future research methodology.
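As a toy illustration of this fail-fast pattern, with `run_variant` a hypothetical stand-in for one agent testing one variation: fan out many cheap attempts in parallel, expect most to fail, and keep whatever survives.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_variant(variant_id: int) -> tuple:
    # Stand-in for one agent rollout; success is rare by construction.
    return variant_id, random.random() < 0.02

# Fan out 1,000 variations at once; the "waste" buys breadth of search.
with ThreadPoolExecutor(max_workers=64) as pool:
    results = list(pool.map(run_variant, range(1000)))

survivors = [vid for vid, ok in results if ok]
print(f"{len(survivors)}/1000 variants survived: {survivors[:5]}")
```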
The scaling labs already understand that research automation is here – they're building the infrastructure and organizational patterns for automated research at scale. For AI safety to stay relevant, we need to adapt and accelerate. Here are our recommendations for transitioning toward AI research fleet management:
Play around with the tools: Copilot, Cursor[1], o1 pro, Gemini pro, Perplexity, Elicit, etc. Different LLMs have different styles, and you develop a fingertip feel for them by working with them a lot. Being playful will help you avoid the trap of dismissing them too soon.
Beware AI slop. We are not Pollyannaish AI enthusiasts — much of the content currently produced by AI is bad and possibly harmful. Continue to hone your taste on pre-2023 human-sourced content.
Beware AI slop. You shouldn’t blindly use AI systems for all of your coding and research. At the same time, you should tolerate early automation mistakes (from, e.g., AI code slop) as learning opportunities for your organization to develop better quality control processes.
In general, we recommend working forwards from your existing workflows rather than working backwards from any idealistic vision of what automated AI safety research should look like. Too much theorizing is a real risk. Work iteratively with what you have.
We personally are starting today, and think you should too. The race for AI safety isn't one we chose, but it's one we have to win.
Thanks to Raemon and Daniel Murfet for feedback on a draft of this post.
Algorithmic trading
MacKenzie, D. (2021). "Trading at the Speed of Light: How Ultrafast Algorithms Are Transforming Financial Markets." Princeton University Press.
MacKenzie, D. (2019). "How Algorithms Interact: Goffman's 'Interaction Order' in Automated Trading." Theory, Culture & Society 36(2): 39-59.
Zuckerman, G. (2019). "The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution." New York, NY: Portfolio/Penguin.
Industrial research, big pharma, biotech research, defense & national laboratory research
Hounshell, D.A. and Smith, J.K. (1988). "Science and Corporate Strategy: Du Pont R&D, 1902-1980." Cambridge University Press.
Henderson, R. (1994). "Managing Innovation in the Information Age." Harvard Business Review 72(1): 100-105.
Quality control in flexible manufacturing systems
Hayes, R. H., & Jaikumar, R. (1988). "Manufacturing's crisis: New technologies, obsolete organizations." Harvard Business Review, 66(5), 77-85.
Goldratt, E.M. (1984). "The Goal: A Process of Ongoing Improvement." Great Barrington, MA: North River Press.
Medical & legal automation
Jha, S. and Topol, E. (2016). "Adapting to Artificial Intelligence: Radiologists and Pathologists as Information Specialists." JAMA 316(22): 2353-2354.
Remus, D. and Levy, F. (2017). "Can Robots Be Lawyers? Computers, Lawyers, and the Practice of Law." Georgetown Journal of Legal Ethics 30: 501-558.