Partly in response to calls for more detailed accounts of how AI could go wrong, e.g., from Ng and Bengio's recent exchange on Twitter, here's a new paper with Stuart Russell:

Many of the ideas will not be new to LessWrong or the Alignment Forum, but holistically I hope the paper will make a good case to the world for using logically exhaustive arguments to identify risks (which, outside LessWrong, is often not assumed to be a valuable approach to thinking about risk).

I think the most important figure from the paper is this one:

... and, here are some highlights:

Enjoy :)

New Comment