Thanks for this post, Paul!
NOTE: Response to this post has been even greater than we expected. We received more applications for experiment participants than we currently have the capacity to manage, so we are temporarily taking the posting down. If you've applied and don't hear from us for a while, please excuse the delay! Thanks to everyone who has expressed interest; we're hoping to get back to you and work with you soon.
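I think that Ought is one of the most promising projects working on AI alignment. There are several ways that LW readers can potentially help: applying for Ought's web developer position, working as a contractor in its factored evaluation experiments, or donating.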
In this post I'll describe what Ought is currently doing, why I think it's promising, and give some detail on these asks.
(I am an Ought donor and board member.)
Factored evaluation
Ought's main project is currently designing and running "factored evaluation" experiments, and building relevant infrastructure. The goal of these experiments is to answer the following question:
Here's what an experiment looks like:
This is not Ought's only project, but it's currently the largest single focus. Other projects include: exploring how well we can automate the judge's role on simple questions using existing ML, and thinking about possible decomposition strategies and challenges for factored evaluation.
Why this is important for AI alignment
ML systems are trained by gradient descent to optimize a measurable objective. In the best case (i.e. ignoring misaligned learned optimization) they behave like an expert incentivized to optimize that objective. Designing an objective that incentivizes experts to reveal what they know seems like a critical step in AI alignment. I think human experts are often a useful analogy for powerful ML systems, and that we should be using that analogy as much as we can.
Not coincidentally, factored evaluation is a major component of my current best guess about how to address AI alignment, which could literally involve training AI systems to replace humans in Ought's current experiments. I'd like to be at the point where factored evaluation experiments are working well at scale before we have ML systems powerful enough to participate in them. And along the way I expect to learn enough to substantially revise the scheme (or totally reject it), reducing the need for trials in the future when there is less room for error.
Beyond AI alignment, it currently seems much easier to delegate work if we get immediate feedback about the quality of output. For example, it's easier to get someone to run a conference that will get a high approval rating than to run a conference that will help participants figure out how to get what they actually want. I'm more confident that this is a real problem than that our current understanding of AI alignment is correct. Even if factored evaluation does not end up being critical for AI alignment, I think it would likely improve the capability of AI systems that help humanity cope with long-term challenges, relative to AI systems that help design new technologies or manipulate humans. I think this kind of differential progress is important.
Beyond AI, I think that having a clearer understanding of how to delegate hard open-ended problems would be a good thing for society, and it seems worthwhile to have a modest group working on the relatively clean problem "can we find a scalable approach to delegation?" It wouldn't be my highest priority if not for the relevance to AI, but I would still think Ought is attacking a natural and important question.
Ways to help
Web developer
I think this is likely to be the most impactful way for someone with significant web development experience to contribute to AI alignment right now. Here is the description from their job posting:
Experiment participants
Ought is looking for contractors to act as judges, honest experts, and malicious experts in their factored evaluation experiments. I think that having competent people doing this work makes it significantly easier for Ought to scale up faster and improves the probability that experiments go well. My rough guess is that a very competent and aligned contractor working for an hour does about as much good as someone donating $25-50 to Ought (in addition to the $25 wage).
Here is the description from their posting:
Donate
I think Ought is probably the best current opportunity to turn marginal $ into more AI safety, and it's the main AI safety project I donate to. You can donate here.
They are spending around $1M/year. Their past work has been some combination of: building tools and capacity, hiring, a sequence of exploratory projects, charting the space of possible approaches and figuring out what they should be working on. You can read their 2018H2 update here.
They have recently started to scale up experiments on factored evaluation (while continuing to think about prioritization, build capacity, etc.). I've been happy with their approach to exploratory stages, and I'm tentatively excited about their approach to execution.