Partly in response to calls for more detailed accounts of how AI could go wrong, e.g., from Ng and Bengio's recent exchange on Twitter, here's a new paper with Stuart Russell:
https://twitter.com/AndrewCritchCA/status/1668476943208169473
"TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI"
Many of the ideas will not be new to LessWrong or the Alignment Forum, but holistically I hope the paper makes a good case to the world for using logically exhaustive arguments to identify risks, an approach that, outside LessWrong, is often not assumed to be valuable for thinking about risk.
I think the most important figure from the paper is this one:
https://arxiv.org/pdf/2306.06924.pdf#page=4
...and here are some highlights:
https://arxiv.org/pdf/2306.06924.pdf#page=5
...as in this "production web" story:
https://arxiv.org/pdf/2306.06924.pdf#page=6
https://arxiv.org/pdf/2306.06924.pdf#page=8
https://arxiv.org/pdf/2306.06924.pdf#page=10
https://arxiv.org/pdf/2306.06924.pdf#page=11
https://arxiv.org/pdf/2306.06924.pdf#page=12
https://arxiv.org/pdf/2306.06924.pdf#page=13
Enjoy :)