Research automation is here (1, 2, 3). We saw it coming and planned ahead, which puts us ahead of most (4, 5, 6). But that foresight also comes with a set of outdated expectations that are holding us back. In particular, research automation is not just about “aligning the first AI scientist”; it is also about the institution-building problem of coordinating the first AI research fleets.
Research automation is not about developing a plug-and-play “AI scientist”. Transformative technologies are rarely straightforward substitutes for what came before. The industrial revolution was not about creating mechanical craftsmen but about deconstructing craftsmen into assembly lines of specialized, repeatable tasks. Algorithmic trading was not just about creating faster digital traders but about reimagining traders as fleets of bots, quants, engineers, and other specialists. AI-augmented science will not just be about creating AI “scientists.”
Why? New technologies come with new capabilities and new limitations. To take full advantage of the capabilities, we have to reshape our workflows around the limitations. This means that even if AIs eventually surpass human abilities across the board, roles like “researcher” will likely transform dramatically during the transition period.
The bottleneck to automation is not just technological but also institutional. The problem of research automation is not just about training sufficiently capable and aligned models. We face an “institutional overhang” where AI capabilities are outpacing our ability to effectively organize around their weaknesses. Factories had to develop new management techniques, quality control systems, and worker training programs to make assembly lines effective. Trading firms had to build new risk management frameworks, compliance systems, and engineering cultures to succeed at algorithmic trading. So too, research institutions will need to reinvent themselves around AI or fall behind.
The scaling labs have already moved beyond the traditional academic model. Consider the use of matrix management structures where research engineers work across multiple projects, standardized research workflows that enable fast iteration, and cross-cutting infrastructure teams that maintain the computational foundation for research. Labs employ specialized roles like research engineers, infrastructure specialists, and research managers that don't fit neatly into the academic hierarchy.
DeepMind’s recent Nobel Prize is a hint of more to come.
A vision: the automated research fleet. Imagine tomorrow’s research lab: not individual AI models confined to chat windows but vast digital fleets of specialized AI agents working in concert. Each agent masters its own niche in the research pipeline: proving theorems, reviewing literature, generating hypotheses, running experiments, analyzing results, communicating outcomes, developing new techniques, conceptualizing entirely new paradigms…
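To make this concrete, here is a minimal sketch of the fleet idea: specialized agents, each owning one niche, chained into a pipeline. Every name is hypothetical, and `call_llm` stands in for whatever model backend you actually use.

```python
from dataclasses import dataclass, field

def call_llm(role: str, task: str) -> str:
    # Stub: plug in your actual model/agent backend here.
    raise NotImplementedError

@dataclass
class Agent:
    role: str  # e.g. "literature-reviewer", "experiment-runner"

    def run(self, task: str) -> str:
        return call_llm(self.role, task)

@dataclass
class Fleet:
    stages: list = field(default_factory=list)

    def execute(self, research_question: str) -> str:
        # Each stage consumes the previous stage's output artifact.
        artifact = research_question
        for agent in self.stages:
            artifact = agent.run(artifact)
        return artifact

fleet = Fleet(stages=[
    Agent("literature-reviewer"),
    Agent("hypothesis-generator"),
    Agent("experiment-runner"),
    Agent("results-analyst"),
    Agent("report-writer"),
])
```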
Automation raises the level of abstraction so that everyone becomes a middle manager — every researcher the director of a research institution of their own. And it changes the basic patterns of human-AI interaction: the prompter will become the prompted — instead of crafting careful prompts in chat interfaces, human researchers receive updates and requests for guidance from their AI project leads, who independently pursue established research objectives.
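A sketch of this inverted pattern (hypothetical names, not any existing API): the AI project lead pushes status updates and escalates decisions to the human’s inbox, instead of waiting to be prompted.

```python
import queue

inbox = queue.Queue()

def ai_project_lead(objective: str) -> None:
    # The agent pursues the objective autonomously...
    inbox.put(f"Status: started work on {objective!r}")
    # ...and escalates only when it needs a judgment call.
    inbox.put("Request: two hypotheses look equally promising; "
              "which should we prioritize?")

def human_researcher() -> None:
    # The human's role shifts from prompting to triaging.
    while not inbox.empty():
        message = inbox.get()
        prefix = "Needs my guidance ->" if message.startswith("Request:") else "FYI ->"
        print(prefix, message)

ai_project_lead("characterize failure modes of the reward model")
human_researcher()
```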
This future may appear wasteful at first glance. Imagine thousands of AI instances running in parallel, testing slight variations of the same approach, with almost all attempts failing. Or hundreds of different AI instances in a shared chat redundantly processing the same tokens. But this apparent inefficiency is a feature, not a bug. Ford’s assembly lines overproduced standardized parts; McLean’s containers shipped half-empty; early cloud computing left countless FLOPs sitting idle. Just as these “inefficiencies” enabled unprecedented flexibility and scale in their industries, the parallel processing power of AI research fleets will unlock new possibilities in scientific discovery. The ability to rapidly test hundreds of variations, explore multiple paths simultaneously, and fail fast will become a cornerstone of future research methodology.
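As a toy illustration of this fail-fast pattern, with `run_variant` a hypothetical stand-in for one agent testing one variation: fan out many cheap attempts in parallel, expect most to fail, and keep whatever survives.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_variant(variant_id: int) -> tuple:
    # Stand-in for one agent rollout; success is rare by construction.
    return variant_id, random.random() < 0.02

# Fan out 1,000 variations at once; the "waste" buys breadth of search.
with ThreadPoolExecutor(max_workers=64) as pool:
    results = list(pool.map(run_variant, range(1000)))

survivors = [vid for vid, ok in results if ok]
print(f"{len(survivors)}/1000 variants survived: {survivors[:5]}")
```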
The scaling labs already understand that research automation is here – they're building the infrastructure and organizational patterns for automated research at scale. For AI safety to stay relevant, we need to adapt and accelerate. Here are our recommendations for transitioning toward AI research fleet management:
Play around with the tools: Copilot, Cursor[1], o1 pro, Gemini pro, Perplexity, Elicit, etc. Different LLMs have different styles, and you develop a fingertip feel for them by working with them a lot. Being playful will help you avoid the trap of dismissing them too soon.
Beware AI slop. We are not Pollyannaish AI enthusiasts — much of the content currently produced by AI is bad and possibly harmful. Continue to hone your taste on pre-2023 human-sourced content.
Beware AI slop. You shouldn’t blindly use AI systems for all of your coding and research. At the same time, you should tolerate early automation mistakes (from, e.g., AI code slop) as learning opportunities for your organization to develop better quality control processes.
In general, we recommend working forwards from your existing workflows rather than working backwards from any idealistic vision of what automated AI safety research should look like. Too much theorizing is a real risk. Work iteratively with what you have.
We personally are starting today, and think you should too. The race for AI safety isn't one we chose, but it's one we have to win.
Thanks to Raemon and Daniel Murfet for feedback on a draft of this post.
Algorithmic trading
MacKenzie, D. (2021). "Trading at the Speed of Light: How Ultrafast Algorithms Are Transforming Financial Markets." Princeton University Press.
MacKenzie, D. (2019). "How Algorithms Interact: Goffman's 'Interaction Order' in Automated Trading." Theory, Culture & Society 36(2): 39-59.
Zuckerman, G. (2019). "The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution." New York, NY: Portfolio/Penguin.
Industrial research, big pharma, biotech research, defense & national laboratory research
Hounshell, D.A. and Smith, J.K. (1988). "Science and Corporate Strategy: Du Pont R&D, 1902-1980." Cambridge University Press.
Henderson, R. (1994). "Managing Innovation in the Information Age." Harvard Business Review 72(1): 100-105.
Quality control in flexible manufacturing systems
Hayes, R. H., & Jaikumar, R. (1988). "Manufacturing's crisis: New technologies, obsolete organizations." Harvard Business Review, 66(5), 77-85.
Goldratt, E.M. (1984). "The Goal: A Process of Ongoing Improvement." Great Barrington, MA: North River Press.
Medical & legal automation
Jha, S. and Topol, E. (2016). "Adapting to Artificial Intelligence: Radiologists and Pathologists as Information Specialists." JAMA 316(22): 2353-2354.
Remus, D. and Levy, F. (2017). "Can Robots Be Lawyers? Computers, Lawyers, and the Practice of Law." Georgetown Journal of Legal Ethics 30: 501-558.