I really enjoyed this sequence; it provides useful guidance on how to combine different sources of knowledge and intuitions to reason about future AI systems. It's a great resource on how to think about alignment for an ML audience.
I've just binge-read the entire sequence and thoroughly enjoyed it, thanks a lot for writing! I really like the framework of three anchors - thought experiments, humans, and empirical ML - and the emphasis on the strengths and limitations of each and the need for all three. Most discourse I've seen tends to strongly favour just one anchor, but 'all are worth paying attention to, none should be totalising' seems obviously true.
Curated. This post cleanly gets at a core disagreement in the AI [Alignment] field, and I think does so from a more accessible frame/perspective than other posts on LessWrong and the Alignment Forum. I'm hopeful that this post and others in the sequence will enable better and more productive conversations between researchers, and for that matter, just better thoughts!
Thanks Ruby! Now that the other posts are out, would it be easy to forward-link them (by adding links to the italicized titles in the list at the end)?
We can also make a Sequence. I assume "More Is Different for AI" should be the title of the overall Sequence too?
Here it is! https://www.lesswrong.com/s/4aARF2ZoBpFZAhbbe
You might want to edit the description and header image.
Machine learning is touching increasingly many aspects of our society, and its effect will only continue to grow. Given this, I and many others care about risks from future ML systems and how to mitigate them.
When thinking about safety risks from ML, there are two common approaches, which I'll call the Engineering approach and the Philosophy approach:
I'll discuss these approaches mainly in the context of ML safety, but the same distinction applies in other areas. For instance, an Engineering approach to AI + Law might focus on how to regulate self-driving cars, while Philosophy might ask whether using AI in judicial decision-making could undermine liberal democracy.
While Engineering and Philosophy agree on some things, for the most part they make wildly different predictions both about what the key safety risks from ML will be and how we should address them:
In my experience, people who strongly subscribe to the Engineering worldview tend to think of Philosophy as fundamentally confused and ungrounded, while those who strongly subscribe to Philosophy think of most Engineering work as misguided and orthogonal (at best) to the long-term safety of ML. Given this sharp contrast and the importance of the problem, I've thought a lot about which—if either—is the "right" approach.
Coming in, I was mostly on the Engineering side, although I had more sympathy for Philosophy than the median ML researcher (who has ~0% sympathy for Philosophy). However, I now feel that:
On the other hand, I also feel that:
I've reached these conclusions through a combination of thinking, discussing with others, and observing empirical developments in ML since 2011 (when I entered the field). I've distilled my thoughts into a series of blog posts, where I'll argue that:
This post is the introduction to the series. I'll post the next part each Tuesday, and update this page with links as each post goes up. In the meantime, leave comments with any thoughts you have, or contact me if you'd like to preview the upcoming posts and leave feedback.