Is there any good AI alignment research that you don't classify as deconfusion? If so, can you give some examples?
Sure.
All in all, I think there are many more examples. It’s just that deconfusion almost always plays a part, because we don’t have one unified paradigm or approach that does the deconfusion for us. But actual problem solving and most of normal science are not deconfusion from my perspective.
Introduction
My PhD thesis probably wins the prize for the weirdest ever defended at my research lab. Not only was it a work of theory of distributed computing in a formal methods lab, but it didn’t even conform to what theory of distributed computing is supposed to look like. With the exception of one paper (interestingly, the only one accepted at the most prestigious conference in the field), none of my published works proposed new algorithms, impossibility results, complexity lower bounds, or even the most popular paper material: a brand new model of distributed computing to crowd the literature even more.
Instead, I looked at a specific formalism introduced years before, and how it abstracted the more familiar models used by most researchers. It had been introduced as such an abstraction, but the actual formal connection was missing. What I did was put it on a more formal footing, and also explain what was lost in the abstraction.
It’s not that it wasn’t interesting. My advisors were curious if not impressed. I published multiple papers. But it was quite clear that no one really knew what to do with my work, be it reviewers, colleagues, advisors, or even me.
We didn’t know I was doing deconfusion.
You might have heard of it: Nate Soares, MIRI’s executive director, coined the term to describe the sort of work MIRI was and is doing; things like Functional Decision Theory and Logical Induction. He described it as:
something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”
In hindsight, this quote captures both my graduate work and my current research in AI Alignment. Yet I failed to realize this despite reading the post multiple times over the last few years. Deconfusion sounded either like fundamental research or like distillation: two subjects I was interested in, but which didn’t quite fit my intuitions for the thing I was aiming at.
Similarly, I see many people use “deconfusion” to refer to those two topics. Fundamental research, because deconfusion is used to point at what MIRI is doing, and MIRI has traditionally focused on deconfusion related to fundamental questions of agency and rationality. Distillation, because it involves dissolving one form of confusion -- the confusion coming from inferential distance.
So anyone who wants to do actual new research and doesn’t subscribe to a vision of alignment requiring a complete formalization has nothing to do with deconfusion, right?
I disagree. I see deconfusion as a type of research that isn’t limited to fundamental research or distillation, but also plays a fundamental role in prosaic alignment approaches and more concrete research. Moreover, deconfusion as I see it seems sorely needed in the field right now to deal with the lack of paradigm and the difficulty of stating scenarios, risks, problems and solutions clearly.
I thus propose my take on deconfusion, which offers a grounding to clarify how deconfusion can be useful. I then present an analogy between my proposition of deconfusion and programming, address why I’m not using the name of conceptual engineering instead, and finally spell out some consequences of this perspective for the field and researchers.
Note that even if I believe this is a valuable way of viewing deconfusion, it hasn’t paid enough rent yet to be taken as a definition of deconfusion per se. But I have good hope that the work I’m currently doing will pile up some evidence for it.
Special thanks to Nate Soares for a great discussion on exactly what he meant in his original blog post; to John S. Wentworth for multiple discussions on the nature of deconfusion, and for pushing me to go deeper; to Abram Demski for rightly asking for more details on confusion and language building, as well as for suggesting the great analogy to programming and refactoring; and to Connor Leahy for a valuable conversation and for helping me present the application part in a non-confusing way. Thanks to Alexis Carlier, Edouard Harris and Alex Turner for feedback on the ideas.
My take on deconfusion
Here is how I think about deconfusion:
the process of dissolving confusion, by reducing the object of thought to less confused and better understood ideas, with an application in mind.
In and of itself, this doesn’t help much; it’s just a nice sounding sentence. But by going into more detail about each part of it, I hope to convince you that it already clarifies quite a lot about deconfusion.
Always an application
In my take, there is the controversial word application. Coming from a theoretical field, I know how much researchers fear this word. Not that they’re against applications in and of themselves, but they are tired of having to endlessly “justify” theory research through farther and farther-fetched appeals to the fashionable application (for example, my thesis funding application mentioned IoT…). Indeed, some early readers of this work complained that some deconfusion doesn’t have an application.
The answer is simply that everything can count as an application here. I’m not adding a constraint on deconfusion, I’m making explicit an input parameter: one deconfuses for a reason. Maybe this reason is curiosity, the advancement of knowledge, or the fact that one’s proposed solution to AI Alignment requires a component one is confused about. The nature of the application doesn’t matter for the definition: only the fact that there is an application.
This application, in turn, plays a fundamental role in deconfusion. The goal is not to find the essence of the concept investigated, but to find the version of the intuition that actually helps for the application. The application thus provides a direction for deconfusion.
What is confusion?
For deconfusion to be useful, one must be confused. But what exactly does it mean to be confused? Maybe it amounts to the lack of a mathematical model. But that’s clearly too strong: we don’t feel confused about many concepts and ideas for which we don’t have this level of understanding.
No, a better starting point is Nate’s pithy statement of deconfusion, more specifically the part about “continuously accidentally spouting nonsense”. The “spouting nonsense” part is pretty straightforward: when confused, we tend to say incoherent things, or things that on reflection don’t make much sense. Even more interesting is the “accidentally” part. Confusion doesn’t usually declare itself: often, we only vaguely realize the issue. There is this nagging feeling inside, but not a crisp statement of the confusion. After all, such a crisp statement would already be part of deconfusion!
One tool of deconfusion that I’ll introduce in the next section is the extensive definition: a definition as a list of examples. Even if it only partially constrains the concept, it can be pretty useful for getting a grip on it. In that vein, here are some salient examples of confusion:
This form of confusion is about doing that to oneself. We change the underlying meaning of the concept when thinking of different points, without necessarily noticing.
All of these are examples of confusion that I think deconfusion can address. On the other hand, there is at least one colloquial use of the word “confusion” that I expect to be mostly orthogonal to deconfusion: the one related to logical non-omniscience and inferential distance. Even if I get a full textbook on the issue I was confused about, which answers all of my questions perfectly well, I have to take the time to study it before I stop being confused. This confusion of mine, while still real, is not the sort of confusion that I expect deconfusion to dissolve.
Dissolving confusion
Last but not least, I see deconfusion as… dissolving confusion (for the application). How does one dissolve confusion? The position I’m taking is reductionist: by decomposing it into less confused and better understood components. We aim to explain what we’re confused about based on what we already understand.
More concretely, this is done through building what I call conceptual tools -- ways of thinking which help us reach this goal. I want to focus on three kinds of conceptual tools that seem very important for deconfusion to me: handles, languages, and translations.
Handles: fixing the concept
Handle is my name for the actual end-product of deconfusion: the explanation of the confusing concept in terms of clearer parts. Intuitively, a handle looks like a formula or a definition. It can have different levels of detail and formalism, but it tends to answer definitional questions. This also means that pure handle-building assumes a base material of less confused ideas to build from, without concerning itself with the issues in this deconfused API (we’ll see about issues in the language/API later).
Why this name? Because handles allow us to grab and manipulate the object they are attached to. Similarly, conceptual handles fix and make more precise the initial intuitions. They force our thoughts about the subject to take a crystallized, concrete form that we can then criticize and manipulate at will.
We can even go a bit more formal, and see handles as pointing to a subset of a given “concept-space”, composed of pairs of mathematical models and interpretations. Ideally we would like a pointer to a single point in concept-space (a formal definition with just the right number of degrees of freedom), but that’s often too difficult in general. Instead, we might get a definition based on previously deconfused concepts (a set of models in concept space which all use the simple concepts as gears) or an extensive definition (a set of models in concept space which all agree with the examples). And even if it feels insufficient, for some applications and some forms of confusion, such deconfusion is already enough.
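As a toy illustration only, here is a minimal Python sketch of the last idea, assuming a drastically simplified concept space: a finite list of candidate (model, interpretation) pairs for the vague intuition “large number”. The names `Candidate`, `CONCEPT_SPACE`, and `extensive_handle` are hypothetical. An extensive definition is modeled as a list of labeled examples, and the handle it yields is the subset of candidates that agree with all of them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """One point in a toy concept space: an interpretation plus a formal model."""
    name: str                     # the interpretation, in words
    model: Callable[[int], bool]  # the model: does an object fall under the concept?


# Hypothetical toy concept space for the intuition "large number".
CONCEPT_SPACE: List[Candidate] = [
    Candidate("greater than 10", lambda x: x > 10),
    Candidate("greater than 100", lambda x: x > 100),
    Candidate("even", lambda x: x % 2 == 0),
    Candidate("greater than 50", lambda x: x > 50),
]


def extensive_handle(space: List[Candidate],
                     examples: List[Tuple[int, bool]]) -> List[Candidate]:
    """Keep the candidates that agree with every labeled example.

    A list of examples only partially constrains the concept, so the result
    is usually a set of candidates rather than a single point.
    """
    return [c for c in space
            if all(c.model(x) == label for x, label in examples)]


examples = [(1000, True), (3, False), (12, True)]
print([c.name for c in extensive_handle(CONCEPT_SPACE, examples)])
# prints ['greater than 10', 'even']
```

With these three examples, two candidates survive, including one (“even”) that clearly misses the intended intuition. This mirrors the point above: an extensive definition usually pins down a set of models rather than a single point, and whether that is enough depends on the application.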
Let’s get through some examples of handles, to see the variety available to the deconfusion researcher:
To get more concrete, here are some great examples of handles from within AI alignment research:
Languages: creating new APIs
When building handles, one might realize that some basic building block is missing. If only we had a deconfusion for this, the original concept would be much easier to clarify. When this happens once or twice, it’s mostly about building other, lower-level handles. But if it becomes endemic, it turns into language building.
This language building is also a part of deconfusion. First, it fits my take, because languages and APIs are expressed in terms of lower-level languages. And second, it clarifies the building block for handles, a clear part of deconfusion. Recall also that one example of confusion was the suspicion that the language used for talking about the concept is inadequate.
Interestingly, I expect that for many people, this sort of language building is what deconfusion looks like. After all, this is the part of deconfusion that is most common in fundamental science: paradigms in hard sciences tend to be heavily centered on languages (called theories). Similarly, multiple well-known research products from MIRI are languages: Cartesian Frames and Finite Factored Sets are two recent examples. For a less intuitive example, I consider Risks from Learned Optimization to be deconfusion, as it produced a language by cutting the problem of alignment differently and proposing partial handles for the different parts.
I have less to say about languages because I expect this concept is already deconfused enough for my purpose here (as opposed to handles for example). It still represents an exciting subject for deconfusion, because the standard way of thinking about languages as tools doesn’t fit exactly what I’m pointing at. The closest is probably API design.
Note that distinguishing between handles and languages is not necessarily trivial. My heuristic is to look at whether the end result can be used as is for the application (handles) or must first be used to deconfuse something else before being used in concrete applications (languages).
Translation: linking between languages
One last category of conceptual tools that seems highly valuable for deconfusion is translation. By this I mean linking the ideas one is trying to deconfuse with another field, hopefully a more deconfused one. At one end of the spectrum, analogies are relatively informal translations; at the other end there are isomorphisms.
While I don’t know of an example from AI Alignment, here are some examples from computer science.
Translation plays into deconfusion by taking advantage of previous work to bypass language building. It’s about recognizing that another language would do the trick for handle-building, even when the connection is far from trivial.
An extended analogy: programming
Despite a sprinkle of examples and a smidge of formalism, this post has been quite dry and abstract. Which is why I now present an extended concrete analogy between my take on deconfusion, and programming. (Thanks to Abram for the suggestion)
Deconfusion is basically analogous to writing a program that we don’t already know how to write. Let’s assume this is a function, to remove the subtleties of side-effects (who needs side-effects anyway?).
First, we have a reason for wanting to write this function -- the application. It might be pure curiosity, for fun, or for a work project. And this reason will change how we write the function, and what will satisfy us. For example, a function just for curiosity might be significantly less maintainable than a function for a work project.
Next we arrive at the confusion: we don’t know yet how to write this function. We also expect that it will be harder than just mixing a couple of programming recipes we already know. Most of the examples in the extensive definition of confusion have a programming analogue (what’s missing is the self-motte-and-bailey):
Finally, the ways of getting to the function we want to write are analogous to the conceptual tools I presented for dissolving confusion:
Just like in the deconfusion case, handle-building can create holes that need to be patched either by handle-building (carving out a subfunction) or by language building/refactoring (noticing that you need a piece of information that isn’t returned by the API).
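To make the programming side of the analogy a bit more tangible, here is a toy Python sketch. All function names (`largest`, `spread`, `largest_with_index`, `argmax`) are hypothetical illustrations chosen for this example. `spread` is written by carving out a subfunction (handle-building), while `argmax` needs the index, which the existing `largest` does not return, so the lower-level API is refactored to expose it (language building).

```python
from typing import List, Tuple


# A hypothetical low-level "API": it only returns the largest value.
def largest(xs: List[float]) -> float:
    return max(xs)


# Handle-building: the target function is expressed through a carved-out subfunction.
def spread(xs: List[float]) -> float:
    """Distance between the largest and the smallest value."""
    return largest(xs) - min(xs)


# New target: argmax needs the position of the largest value, information that
# largest() does not return. Refactoring the lower-level API exposes it.
def largest_with_index(xs: List[float]) -> Tuple[int, float]:
    """Return (index, value) of the first occurrence of the largest value."""
    best_i = max(range(len(xs)), key=lambda i: xs[i])
    return best_i, xs[best_i]


# On top of the refactored API, both old and new functions become thin wrappers.
def largest_refactored(xs: List[float]) -> float:
    return largest_with_index(xs)[1]


def argmax(xs: List[float]) -> int:
    return largest_with_index(xs)[0]


print(spread([3.0, 7.5, 1.0]))  # prints 6.5
print(argmax([3.0, 7.5, 1.0]))  # prints 1
```

The design choice the sketch highlights: patching the hole in `argmax` with yet another handle on top of `largest` would require a second scan to recover the index, whereas changing what the lower level returns makes both the old and the new function easy to express.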
There’s another fun analogy for translation: compiling. This isn’t exactly what I have in mind, but compiling does involve the translation of one programming language into another (assembly or C or an intermediate representation).
Isn’t that just conceptual engineering?
Very quickly, I want to address the similarity between my take on deconfusion and the philosophical approach of conceptual engineering. The latter is basically about creating ideas and concepts for a purpose (hence the “engineering”), instead of looking for the essence of these ideas. This is indeed pretty close to handle-building.
Still, I don’t think conceptual engineering is the right starting point for this discussion, for the following reasons:
What does this buy us?
After all the work of understanding this perspective on deconfusion, we can finally grab the low-hanging fruit that I mentioned at the beginning of this post.
Deconfusion isn’t limited to fundamental science and distillation
The explicit application should make clear that deconfusion in this sense can be applied to almost anything, not only fundamental science and pure theory. I even went to this cooking class once where the chef proposed his own deconfusion of the transformations of food induced by different cooking techniques -- I still use it years later.
Because deconfusion is confused with fundamental research, I expect it to have a bad rap among researchers focused on prosaic AI, who always want to backchain to local search. I hope that my proposed take on deconfusion has clarified that they too can benefit from deconfusion. Maybe it’s not what they need, and it certainly isn’t the sole ingredient of a solution, but it is one powerful approach to have in your toolbox.
Distillation, on the other hand, might not involve deconfusion at all. I already mentioned that pure inferential distance isn’t the sort of confusion I consider for deconfusion. This by default removes most of distillation from deconfusion. Still, I think that in some rare cases, the inferential distance is closed by actually providing a better deconfusion than the original one, or by deconfusing the handle itself. I’m not sure, but the tablecloth analogy for relativity looks like one plausible example.
The importance of deconfusion for AI Alignment
I claim that deconfusion is actually a pretty big bottleneck for AI Alignment, whatever the approach taken. The common thread behind the lack of paradigm, the difficulty some new entrants have in dealing with the unformalized setting, and the lack of consensus is the vast amount of deconfusion that still needs to be done.
Actually, even someone who wants to argue (honestly) for alignment by default would benefit from deconfusion, because it’s still quite confusing what the thing that would happen by default in this view even is.
To give a range of examples, here is a non-exhaustive list of uses of deconfusion for various types of approaches:
Checking deconfusion
All of that is well and good. But isn’t deconfusion… you know, fuzzy? Like messy and informal and the sort of thing that’s impossible to check and falsify? Actually, this perspective on deconfusion highlights that deconfusion is more falsifiable than most formal fundamental research. That’s because it’s ultimately about making the confused bunch of intuitions more concrete and more manipulable, and because it is always done with an application in mind.
Concretely, the standards of judgment change a little between the different sorts of conceptual tools that I presented:
What it means to do full time deconfusion
This consequence of my take is a bit more personal: it’s about clarifying the kind of research I’m doing. One thing that made me feel bad in AI alignment is that I don’t have an agenda, a preferred method for alignment. Instead, I work on multiple problems with multiple researchers.
Yet a link exists: deconfusion. Everything I’m doing (even this!) tends to be along the lines I present in this post. This view also addresses my anxieties about not having an approach: as a full-time deconfusion researcher, it makes sense that I’m applying deconfusion skills (hopefully) and mindset to different projects with different people, just as an applied mathematician in Shannon’s sense would.
What I really took from this is the realization that in most collaborations, I’m at the service of the other researchers. Not because of weird notions of worth or experience, but because I’m deconfusing their intuitions for their applications.
Conclusion
In summary, I’m proposing a take on deconfusion as the process of dissolving confusion, by reducing the object of thought to less confused and better understood ideas, with an application in mind.
It has the following components:
Taking this perspective on deconfusion already bears valuable fruits for AI Alignment:
What’s left now? Well, I also think this take on deconfusion can be a great basis for investigating how to do deconfusion, what skills are most important, and how to learn them. This entails the hope for a sort of textbook of deconfusion. I’m far from this at the moment, but I’m working on it. It’s especially exciting because it might help me become better at deconfusion.
Another project I have is to collect many open problems in deconfusion in a post, to help newcomers and interested researchers do the sort of “freelance deconfusion” I’m doing myself.