I'm especially interested in the analogy between AI alignment and democracy. (I guess this goes under "Social Structures and Institutions".) Democracy is supposed to align a superhuman entity with the will of the people, but there are a lot of failures, closely analogous to well-known AI alignment issues:
I think it's more likely that insights will transfer from the field of AI alignment to the field of government design than vice versa. It's easier to do experiments on the AI side, and the thinkers there are clearer.
I'm especially interested in the analogy between AI alignment and democracy.
This is indeed a productive analogy. Sadly, on this forum, this analogy is used in 99% of the cases to generate AI alignment failure mode stories, whereas I am much more interested in using it to generate useful ideas about AI safety mechanisms.
You may be interested in my recent paper 'demanding and designing', just announced here, where I show how to do the useful idea-generating thing: I transfer some insights about aligning powerful governments and companies to the problem of aligning powerful AI.
Cross-posted to EA Forum
Introduction
How would you go about scientifically studying aliens? Arik Kershenbaum’s The Zoologist’s Guide to the Galaxy proposes using evolutionary thinking to uncover constraints on how alien species could evolve. One of his most interesting points is that evolution constrains function far more than form, because function depends significantly less on the details of the environment. Hence we should expect crisper answers to “How would aliens behave?” than to “What would aliens look like?”. And in the course of the book, he gives the best answer he can find to the former question.
So when confronted with the question of how to study something he couldn’t gather data on, Kershenbaum leveraged analogies with biological systems that he could study and had studied, along with the underlying constraints imposed by the mechanisms of natural selection.
On a completely unrelated note, the new summer fellowship Principles of Intelligent Behavior in Biological and Social Systems (PIBBSS) (funded by the LTFF) aims to produce valuable AI alignment research by studying analogies with many complex systems (evolution, brains, language, social structures…). Fellows, who will have graduate research experience in fields that study such systems, will work on a concrete alignment project in collaboration with an established alignment researcher. The fellowship will run throughout Summer 2022.
The point of this post is to introduce this fellowship, explain the reasoning behind it and give more concrete details about how it will go. Note that I’m not an organizer of this fellowship, I’m just assisting with the writing of this post; credit for the ideas and arguments should go to Nora Ammann and TJ, the organizers of the fellowship.
Analogies as General Epistemic Strategies for Alignment
As I’ve written elsewhere, alignment cannot directly leverage most epistemic strategies and approaches used in Science and Engineering, because it’s about solving a problem that doesn’t exist yet on a technology we still have to invent.
One epistemic strategy that survives this major problem is the leveraging of analogies with existing biological or social systems that implement complex or intelligent behavior. Consider how a wide variety of such systems (from biology, physics, linguistics and other fields) exhibit similar properties: adaptation, robustness, goal-directed behavior, learning, embeddedness, modularity, phase transitions, and more. Since AI research focuses on mechanisms that lead to complex and intelligent behavior with many of these properties, careful analogies with these complex systems may allow us to transfer knowledge about all these behaviors and properties to the study of alignment and AGI.
Also note that the other main epistemic strategy used in alignment, figuring things out from first principles, can and often does take inspiration from other existing systems like evolution, brains and languages.
Examples of Successful Analogies in Alignment
If analogical thinking is such a valuable epistemic approach to alignment, we should find ample examples of valuable alignment research using such analogies. And that’s indeed what we see.
Of course, none of these works completely exploit the analogy, nor do they encompass all analogies relevant for alignment. The previous list just serves to illustrate that analogical thinking is an integral part of many examples of current alignment research.
The Problem: Difficulty of Epistemic Translation
If analogies to other systems already abound in alignment, what is the point of the fellowship?
Here, the concept of epistemic translation, as discussed by Nora Ammann, might give us a better idea of what it takes to make fruitful analogies. Fundamentally, linking system A with system B requires the creation of a translation between the two, a bridge faithful enough to let us transform insights about system B into ones about system A (for example, the analogy between magic and pizza fails this condition).
Exploiting analogies faithfully thus involves:
Currently, alignment researchers are expected to do all of that, including building deep expertise in the field they’re drawing from. With the exception of Steve Byrnes, who did basically learn neuroscience for his research, most people don’t have the time to do that. As a consequence, they miss valuable analogies for their work, or let them die in a drawer, or, if they do find and explore them, they might do so insufficiently or badly.
On the other hand, most people in fields that are ripe for analogies with alignment don’t know about the latter and don’t have any incentive to work on it. It’s also hard to get up to speed on alignment, especially when coming from a field outside computer science.
So alignment researchers want more varied and detailed analogies, and experts in other fields have the ability to help with these analogies and to provide tools for studying the systems in question (and some could be interested in alignment as a challenging problem or a cause), but the current incentives and constraints make it hard for the two sides to interface.
The PIBBSS Fellowship exists to bridge this gap by providing an institutional context for these collaborations.
Proposed Solution: Creating Institutional Context for Collaborations around Analogies
For the fellowship, alignment researchers get to propose projects related to analogies with biological or social systems that they would like to explore. Fellows with expertise in the corresponding field receive funding for the duration of the fellowship (12 weeks in Summer 2022) to collaborate with the alignment researcher on exploring the analogy and what it can bring to alignment.
Which fields are most promising? Well, fields don’t seem like the right granularity for discussing promising analogies here. The complex systems on which the analogies are built fit better (and each of them can be studied from different angles by different fields). Recall that for epistemic translation and analogies to be useful, there need to be insights, concepts and epistemic strategies about the analogous complex system that can transfer over. So the systems most ripe for this work are those that have been studied long enough to accumulate a tradition of results and appropriate epistemic hygiene, and that are underrepresented in alignment relative to the expected density of insight.
This leads us to a tentative list that currently includes:
That said, the fellowship is open to other complex systems and fields that may have been overlooked so far but share the properties we care about (i.e., insightfulness of the existing literature and associated community, and relevance to AI alignment). In practice, the promise of a given analogy is evaluated at the level of specific project proposals more than at the level of entire disciplines.
Lastly, in some cases, the fellowship is open to epistemic transfer towards topic areas that do not fall under AI safety or governance, narrowly construed. Examples include relevant topics on digital and emulated minds, advanced institutional design and collective intelligence, and industrial and scientific automation and progress.
Pre-mortem: what could go wrong?
This is a nice story, but let’s ask ourselves the important question: how could it fail?
First, even if new analogies result from this fellowship, there is a risk that they are shallow: at best useless and at worst confusing. For example, Yudkowsky’s criticism of biology-based timelines condemns a whole class of such analogies.
Proposed solution: In part, this will come from having experts in the other field provide enough detail to reveal the shallowness of an analogy. And in cases where the core of the issue comes from understanding the mechanisms behind AGI and how it will appear, the alignment researchers involved should eventually be able to catch it. The mentor-fellow pair thus represents the first line of defense against epistemic pollution, and the wider epistemic communities in which they are embedded provide a further source of feedback and scrutiny. All in all, this fellowship’s focus on analogies and their non-shallowness should increase scrutiny enough to catch most shallow proposals.
Another issue is the difficulty of distinguishing valuable/deep analogies from useless/shallow ones at a glance, before investing a lot of work in them, which risks wasting time.
Proposed solution:
The fellowship addresses this problem by letting (epistemic) demand drive (epistemic) supply. Concretely, this means that alignment researchers (and not fellows with expertise in other fields) propose projects according to how valuable they expect them to be. Thus the current proxy for the expected value of a given analogy is whether a given alignment researcher is excited enough about a project to want to invest time in mentoring it.
The time constraints of the fellowship also privilege exploration, which is the main way to find out about the value of each research direction. Only a small fraction of projects needs to turn into fruitful research agendas to make up for many failed attempts.
Lastly, maybe interdisciplinary research between alignment researchers and experts from other fields is just too hard and fraught with miscommunication to work in most cases.
Interdisciplinary research is hard, and so is doing good research in general. The purpose of the fellowship is to find out more about these potential issues and solve them as well as possible.
Details of the Fellowship Program
From the website of the fellowship
Appendix: Sample of project proposals
The sample of project proposals below is meant to give readers a taste of the types of projects PIBBSS is hoping to facilitate.
Biodiversity and Heterogeneity in Energy Flows
Source Domain: Systems Ecology
Topic Summary:
A commonly discussed puzzle in ecology concerns the latitudinal distribution of biodiversity. A number of scholars have proposed that it is related to metabolism and to the amount of energy flowing through the ecosystem (Brown, James H., “Why are there so many species in the tropics?”, Journal of Biogeography, 2013). An additional observation we might make is that in energy-rich ecosystems such as tropical rainforests, where we encounter higher biodiversity, we also find a large number of organisms engaging in relatively simpler forms of energy consumption. In energy-scarce ecosystems, by contrast, there are fewer species, and several of them exhibit relatively more general intelligence in their ability to source food and energy.
There has been a debate in the last few years regarding whether we should anticipate artificial agents with general intelligence or ecosystems of specialized services. To inform this debate, we want to understand:
[h/t Jan Kulveit]
Basins of Robustness in Search Spaces
Source Domain: Evolutionary Biology
Topic Summary:
Within evolutionary theory, there are two approaches to explaining the robustness observed in biological systems. The first is that random search is likely to find basins of robustness simply because such basins occupy significant probability mass. The second argues that evolution selects for robustness in response to mutations and environmental perturbations (Wagner, A., Robustness and Evolvability in Living Systems, Princeton University Press, 2005).
A better understanding of the relative causal roles played by these two phenomena could help us build better models for studying robustness and corrigibility in AI.
For example, we may ask:
[h/t TJ]
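As a rough illustration of the first explanation above (robustness as a by-product of probability mass, with no selection involved), here is a toy genotype-phenotype map in Python. It is a hypothetical sketch, not taken from Wagner or from the project proposal: the map, the function names and the parameters are all assumptions chosen for simplicity. In this map, phenotypes with many neutral "junk" sites are both more robust and occupy more of genotype space, so blind uniform sampling lands disproportionately in robust regions.

```python
import random


def phenotype(genotype):
    """Hypothetical toy map: the phenotype is the index of the first 1 bit,
    or len(genotype) if the genotype is all zeros. Every bit after the first 1
    is 'junk' and can mutate without changing the phenotype."""
    for i, bit in enumerate(genotype):
        if bit == 1:
            return i
    return len(genotype)


def robustness(genotype):
    """Fraction of single-bit mutants that keep the same phenotype."""
    original = phenotype(genotype)
    neutral = 0
    for i in range(len(genotype)):
        mutant = list(genotype)
        mutant[i] ^= 1  # flip one bit
        if phenotype(mutant) == original:
            neutral += 1
    return neutral / len(genotype)


def mean_robustness_of_random_search(length, samples, seed=0):
    """Blind random search: sample genotypes uniformly, with no selection."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        genotype = [rng.randint(0, 1) for _ in range(length)]
        total += robustness(genotype)
    return total / samples


def mean_robustness_over_phenotypes(length):
    """Baseline: average robustness if every phenotype were equally likely.
    All genotypes sharing a phenotype have the same robustness in this map,
    so one representative per phenotype (a single 1 at index k) suffices."""
    total = 0.0
    for k in range(length + 1):
        genotype = [0] * length
        if k < length:
            genotype[k] = 1
        total += robustness(genotype)
    return total / (length + 1)


if __name__ == "__main__":
    print(mean_robustness_of_random_search(20, 2000))
    print(round(mean_robustness_over_phenotypes(20), 2))  # 0.45
```

With 20-bit genotypes, uniform random sampling should give a mean robustness close to 0.9, while weighting every phenotype equally gives 19/42 ≈ 0.45. In this toy model the gap comes purely from how genotype space is partitioned; telling this effect apart from robustness that evolution actively selects for is the kind of question the proposal raises.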
Institutional Foundations of Linguistic Innovation
Source Domain: Sociolinguistics
Topic Summary:
Language, its evolution, and its current usage within society might limit which novel concepts can be acquired and become broadly recognized. Participants in a linguistic community have the agency to use language in innovative ways (‘creativity’ in Chomsky 1965), and therefore also exert influence over how linguistic affordances (concepts, vocabularies, etc.) evolve over time. Such creativity is often built on top of existing morphological and lexical resources (‘productivity’ in Hockett 1958, Bauer 2001). (See also: Expanding the Lexicon, eds. S. Arndt-Lappe et al., 2018.)
These forms of linguistic innovation and evolution are, however, balanced by evolutionary pressures that help maintain reasonable levels of lexical and semantic stability, allowing language to remain useful for coordination. The generation, diffusion and autoregulation of linguistic innovation can therefore also be seen as mediated by cultural and institutional factors. By better understanding the different factors that shape linguistic innovation and evolution, we can both (a) better reflect on the role played by the differential deployment of Large Language Models (LLMs) in the near term, and (b) better understand which of these dynamics can be extrapolated to the linguistic and cognitive competencies of future AI systems. Some specific questions of interest might include:
Social learning and the limitations of the RL framework
Source Domain: Cognitive Science
Topic Summary:
Reinforcement learning is the dominant framework at the moment in the psychology and neuroscience of human and animal behavior. In thinking about digital minds, that is convenient because, if it's true that humans and animals are basically reinforcement learners, it follows that artificial RL systems are (in some sense) things of basically the same kind. It also seems to influence thinking about agency and motivation in the alignment space.
An interesting question for us is thus: what are the limitations of the RL framework for explaining human behavior? In particular, there is preliminary evidence (Ho et al., 2017) of a limitation of RL in the area of social learning from evaluative feedback, which seems particularly relevant to alignment.
[h/t Patrick Butlin]