By Sophie Bridgers, Rishub Jain, Rory Greig, and Rohin Shah
Based on work by the Google DeepMind Rater Assist Team: Vladimir Mikulik, Sophie Bridgers, Tian Huey Teh, Rishub Jain, Rory Greig, Lili Janzer (randomized order, equal contributions)
Human oversight is critical for ensuring that Artificial Intelligence (AI) models remain safe and aligned to human values. But AI systems are rapidly advancing in capabilities and are being used to complete ever more complex tasks, making it increasingly challenging for humans to verify AI outputs and provide high-quality feedback. How can we ensure that humans can continue to meaningfully evaluate AI performance? An avenue of research to tackle this problem is “Amplified Oversight” (also called “Scalable Oversight”), which aims to develop techniques to use AI to amplify humans’ abilities to oversee increasingly powerful AI systems, even if they eventually surpass human capabilities in particular domains.
With this level of advanced AI, we could use AI itself to evaluate other AIs (i.e., AI raters), but this comes with drawbacks (see Section IV: The Elephant in the Room). Importantly, humans and AIs have complementary strengths and weaknesses. We should thus, in principle, be able to leverage these complementary abilities to generate an oversight signal for model training, evaluation, and monitoring that is stronger than what we could get from human raters or AI raters alone. Two promising mechanisms for harnessing human-AI complementarity to improve oversight are:
- Rater Assistance: giving human raters AI tools that support their evaluation of model outputs.
- Hybridization: combining judgments from human raters and AI raters, drawing on whichever is more reliable for a given case.
The design of Rater Assistance and/or Hybridization protocols that enable human-AI complementarity is challenging. It requires grappling with complex questions, such as how to pinpoint the unique skills and knowledge that humans or AIs possess, how to identify when AI or human judgment is more reliable, and how to effectively use AI to improve human reasoning and decision-making without leading to under- or over-reliance on the AI. These are fundamentally questions of Human-Computer Interaction (HCI), Cognitive Science, Psychology, Philosophy, and Education. Luckily, these fields have explored the same or closely related questions, and AI safety can learn from and collaborate with them to address these sociotechnical challenges. On our team, we have worked to expand our interdisciplinary expertise to make progress on Rater Assistance and Hybridization for Amplified Oversight.
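To make the idea of a Hybridization protocol concrete, here is a minimal, purely illustrative sketch: route each example to an AI rater when it reports high confidence, and defer to a human rater otherwise. All function names and the confidence threshold are hypothetical assumptions for illustration, not details from the post.

```python
# Illustrative sketch of one possible Hybridization protocol (hypothetical,
# not the actual protocol used by the Rater Assist team): use the AI rater's
# judgment when it is confident, and defer to a human rater otherwise.

def hybrid_rating(example, ai_rater, human_rater, confidence_threshold=0.9):
    """Return the AI's rating when it is confident, else a human's rating."""
    rating, confidence = ai_rater(example)
    if confidence >= confidence_threshold:
        return rating            # AI handles high-confidence cases
    return human_rater(example)  # humans handle hard/ambiguous cases

# Toy raters for demonstration: this AI rater is confident only on short outputs.
def toy_ai_rater(example):
    confident = len(example) < 20
    return ("good", 0.95) if confident else ("good", 0.5)

def toy_human_rater(example):
    return "needs_review"

print(hybrid_rating("short output", toy_ai_rater, toy_human_rater))
# -> "good" (AI is confident, so its rating is used)
print(hybrid_rating("a much longer, harder-to-judge output",
                    toy_ai_rater, toy_human_rater))
# -> "needs_review" (AI is unsure, so the human's rating is used)
```

In practice, the routing signal could be calibrated model uncertainty, disagreement between raters, or task features; the point of the sketch is only that the combined signal can draw on whichever rater is more reliable per case.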
Read the rest of the full blog here!