Open Philanthropy is launching a major new Request for Proposals (RFP) for technical AI safety research, with plans to fund roughly $40M in grants over the next 5 months, and available funding for substantially more depending on application quality.
Applications (here) start with a simple 300-word expression of interest (EOI) and are open until April 15, 2025.
Overview
We're seeking proposals across 21 different research areas, organized into five broad categories:
Adversarial Machine Learning
* Jailbreaks and unintentional misalignment
* Control evaluations
* Backdoors and other alignment stress tests
* Alternatives to adversarial training
* Robust unlearning
Exploring sophisticated misbehavior of LLMs
* Experiments on alignment faking
* Encoded reasoning in CoT and inter-model communication
* Black-box LLM psychology
* Evaluating whether models can hide dangerous behaviors
* Reward hacking of human oversight
Model transparency
* Applications of white-box techniques
* Activation monitoring
* Finding feature representations
* Toy models for interpretability
* Externalizing reasoning
* Interpretability benchmarks
* More transparent architectures
Trust from first principles
* White-box estimation of rare misbehavior
* Theoretical study of inductive biases
Alternative approaches to mitigating AI risks
* Conceptual clarity about risks from powerful AI
* New moonshots for aligning superintelligence
We’re willing to make a range of grant types, including:
* Research expenses (compute, APIs, etc.)
* Discrete research projects (typically lasting 6-24 months)
* Academic start-up packages
* Support for existing nonprofits
* Funding to start new research organizations or new teams at existing organizations
The full RFP provides much more detail on each research area, including eligibility criteria, example projects, and nice-to-haves. You can find it here.
We want the bar to be low for submitting expressions of interest: even if you're unsure whether your project fits perfectly, we encourage you to submit an EOI. This RFP is partly an experiment to understand the demand for funding in AI safety research.
Please email aisafety@openphilanthropy.org with questions, or just submit an EOI.