In every scenario, if you have a superintelligent actor which is optimizing the grader's evaluations while searching over a large real-world plan space, the grader gets exploited.
Taking the example of the evaluator-child who tries to win his mom's approval by staying close to the gym teacher: how would grader exploitation differ from specification gaming / reward hacking? In theory, wouldn't a perfect grader solve the problem?
I got the book (thanks to Conjecture) after doing the Intro to ML Safety Course, where the book was recommended. I then browsed through it and thought of writing a review - but I found this post instead, which is a much better review than I would have written, so thanks a lot for this!
Let me just put down a few thoughts that might be relevant for someone else considering picking up this book.
Target audience: Right at the beginning of the book, the author says "This book is written for the sophisticated practitioner rather than the academic...
Thanks for the comment!
You can read more about how these technical problems relate to AGI failure modes and how they rank on importance, tractability, and crowdedness in Pragmatic AI Safety 5. I think the creators included this content in a separate forum post for a reason.
I felt some of the content in the PAIS series would've been great for the course, though the creators probably had a reason to exclude it, and I'm not sure what that reason was.
The second group doesn't necessarily care about why each research direction relates to reducing X-risk.
In this case I feel...
I feel like I broadly agree with most of the points you make, but I also feel that accident vs. misuse is still a useful distinction to have.
For example, disasters caused by guns could be seen as:
Nevertheless, all of the above ...