Provide feedback on Open Philanthropy’s AI alignment RFP

abergal; Nick_Beckstead

Open Philanthropy is planning a request for proposals (RFP) for AI alignment projects working with deep learning systems, and we’re looking for feedback on the RFP and on the research directions we’re looking for proposals within. We’d be really interested in feedback from people on the Alignment Forum on the current (incomplete) draft of the RFP.

The main RFP text can be viewed here. It links to several documents describing two of the research directions we’re interested in:

Measuring and forecasting risks
Techniques for enhancing human feedback [Edit: this previously linked to an older, incorrect version]

Please feel free to comment either directly on the documents, or in the comments section below.

We are unlikely to add or remove research directions at this stage, but we are open to making any other changes, including to the structure of the RFP. We’d be especially interested in getting the Alignment Forum’s feedback on the research directions we present, and on the presentation of our broader views on AI alignment. It’s important to us that our writing about AI alignment is accurate and easy to understand, and that it’s clear how the research we’re proposing relates to our goals of reducing risks from power-seeking systems.

The implication seems to be that this RFP is for AIS work that is especially focused on DL systems. Is there likely to be a future RFP for AIS research that applies equally well to DL and non-DL systems? Regardless of where my research lands, I imagine a lot of useful and underfunded research fits in the latter category.

This RFP is an experiment for us, and we don't yet know if we'll be doing more of them in the future. I think we'd be open to including research directions we think that are promising that apply equally well to both DL and non-DL systems-- I'd be interested in hearing any particular suggestions you have.

(We'd also be happy to fund particular proposals in the research directions we've already listed that apply to both DL and non-DL systems, though we will be evaluating them on how well they address the DL-focused challenges we've presented.)

I imagine you could catch useful work with i) models of AI safety, or ii) analysis of failure modes, or something, though I'm obviously biased here.

Thank you for posting this Asya and Nick. After I read it I realized that it connected to something that I've been thinking about for a while that seems like it might actually be a fit for this RFP under research direction 3 or 4 (interpretability, truthful AI). I drafted a very rough 1.5-pager this morning in a way that hopefully connects fairly obviously to what you've written above:

https://docs.google.com/document/d/1pEOXIIjEvG8EARHgoxxI54hfII2qfJpKxCqUeqNvb3Q/edit?usp=sharing

Interested in your thoughts.

Feedback from everyone is most welcome, too, of course.

Great initiative! I'll try to leave some comments sometime next week.

Is there a deadline? (I've seen floating around the 15th of September, but I guess feedback would be valuable before that so you can take it into account?)

Also, is this the proposal mentioned by Rohin in his last newsletter, or a parallel effort?

Getting feedback in the next week would be ideal; September 15th will probably be too late.

Different request for proposals!