For what it's worth, I often find Eliezer's arguments unpersuasive because they seem shallow. For example:
The insight is in realizing that the hypothetical planner is only one line of outer shell command away from being a Big Scary Thing and is therefore also liable to be Big and Scary in many ways.
This seems like a fuzzy "outside view" sort of argument. (Compare with: "A loaded gun is one trigger pull away from killing someone and is therefore liable to be deadly in many ways." On the other hand, a causal model of a gun lets you explain which specific gun operations can be deadly and why.)
I'm not saying Eliezer's conclusion is false. I find other arguments for that conclusion much more persuasive, e.g. involving mesa-optimizers, because there is a proposed failure type which I understand in causal/mechanistic terms.
(I can provide other examples of shallow-seeming arguments if desired.)
Great investigation/clarification of this recurring idea from the ongoing Late 2021 MIRI Conversations.
You might not like his tone in the recent discussions, but if someone has been saying the same thing for 13 years, nobody seems to get it, and their model predicts that this will lead to the end of the world, maybe they can get some slack for talking smack.
Good point and we should. Eliezer is a valuable source of ideas and experience around alignment, and it seems like he's contributed immensely to this whole enterprise.
I just hope all his smack talking doesn't turn off/away talented people coming to lend a hand on alignment. I expect a lot of people on this (AF) forum found it, like me, after reading all of Open Phil's and 80,000 Hours' convincing writing about the urgency of solving the AI alignment problem. It seems silly to have those orgs working hard to recruit people to help out, only to have them come over here and find one of the leading thinkers in the community going on frequent tirades about how much EAs suck, even though he doesn't know most of us. Not to mention folks like Paul and Richard who have been taking his heat directly in these marathon discussions!
Here is an exploration of what Eliezer Yudkowsky means when he writes about deep vs shallow patterns (although I’ll be using "knowledge" instead of "pattern" for reasons explained in the next section). Not about any specific pattern Yudkowsky is discussing, mind you, but about what deep and shallow patterns are at all. In doing so, I don’t make any criticism of his ideas and instead focus on quoting him (seriously, this post is like 70% quotes) and interpreting him by finding the best explanation I can of his words (one that still fits them, obviously). Still, there’s a risk that my interpretation misses some of his points and ideas; I’m building a lower bound on his argument’s power that is as high as I can get, not an upper bound. Also, I might just be completely wrong, in which case defer to Yudkowsky if he points out that I’m completely missing the point.
Thanks to Eliezer Yudkowsky, Steve Byrnes, John Wentworth, Connor Leahy, Richard Ngo, Kyle, Laria, Alex Turner, Daniel Kokotajlo and Logan Smith for helpful comments on a draft.
Back to the FOOM: Yudkowsky’s explanation
In recent discussions, Yudkowsky often talks about deep patterns and deep thinking. What he made clear in a comment on this draft is that he has been using the term “deep patterns” in two different ways:
Focusing on deep knowledge, then, Yudkowsky recently seems to ascribe his interlocutors’ failure to get his point to their inability to grasp different instances of deep knowledge.
(All quotes from Yudkowsky if not mentioned otherwise)
(From the first discussion with Richard Ngo)
That being said, he doesn’t really explain what this sort of deep knowledge is.
(From the same discussion with Ngo)
The thing is, he did exactly that in the FOOM debate with Robin Hanson 13 years ago. (For those unaware of this debate, Yudkowsky is responding to Hanson’s use of extrapolations of trends like Moore’s law to think about the intelligence explosion).
(From The Weak Inside View (2008))
An important subtlety here comes from the possible conflation of two uses of “surface”: the implicit use of “surface knowledge” as the consequences of some underlying causal processes/generators, and the explicit use of “surface knowledge” as drawing similarities without thinking about the causal process generating them. To simplify the discussion, let’s use the more modern idiom of “shallow” for the latter, explicit sense here.
So what is Yudkowsky pointing at? Two entangled things:
Imagine a restaurant that has a dish you really like. The last 20 times you went to eat there, the dish was amazing. So should you expect that the next time it will also be great? Well, that depends on whether anything in the kitchen changes. Because you don’t understand what makes the dish great, you don’t know the most important aspects of the causal generators. So if they can’t buy their meat/meat-alternative at the same place, maybe that will change the taste; if the cook is replaced, maybe that will change the taste; if you go at a different time of the day, maybe that will change the taste.
You’re incapable of extending your trend (except by replicating all the conditions) to make a decent prediction because you don’t understand where it comes from. If on the other hand you knew why the dish was so amazing (maybe it’s the particular seasoning, or the chef’s touch), then you could estimate its quality. But then you’re not using the trend, you’re using a model of the underlying causal process.
Here is another phrasing by Yudkowsky from the same essay:
More generally, these quotes point to what Yudkowsky means when he says “deep knowledge”: the sort of reasoning that focuses on underlying causal models.
As he says himself:
Before going deeper into how such deep knowledge/Weak Inside View works and how to build confidence in it, I want to touch upon the correspondence between this kind of thinking and the Lucas Critique in macroeconomics. This link has been pointed out in the comments of the recent discussions — we thus shouldn’t be surprised that Yudkowsky wrote about it 8 years ago (yet I was surprised by this).
(From Intelligence Explosion Microeconomics (2013))
and later in that same essay:
This last sentence in particular points out another important feature of deep knowledge: it might be easier to say negative things (like “this can’t work”) than precise positive ones (like “this is the precise law”), because the negative claim can rule out something that basically all coherent/reasonable causal explanations preclude, even while those explanations still disagree on the precise details.
Let’s dig deeper into that by asking more generally what deep knowledge is useful for.
How does deep knowledge work?
We now have a pointer (however handwavy) to what Yudkowsky means by deep knowledge. Yet we have very few details at this point about what this sort of thinking looks like. To improve that situation, the next two subsections explore two questions about the nature of deep knowledge: what is it for, and where does it come from?
The gist of this section is that:
What is deep knowledge useful for?
The big difficulty that comes up again and again, in the FOOM debate with Hanson and the discussions with Ngo and Christiano, is that deep knowledge doesn’t always lead to quantitative predictions. That doesn’t mean that the deep knowledge isn’t quantitative itself (expected utility maximization is an example used by Yudkowsky that is completely formal and quantitative), but that the causal model only partially constrains what can happen. That is, it doesn’t constrain enough to make precise quantitative predictions.
Going back to his introduction of the Weak Inside View, recall that he wrote:
He follows up writing:
Let’s summarize it this way: deep knowledge only partially constrains the surface phenomena it describes (and it is full constraints that translate into quantitative predictions); it takes a lot of detailed deep knowledge (and often data) to refine it enough to pin down the phenomenon exactly and make precise quantitative predictions. Alignment and AGI are fields where we don’t have that much deep knowledge and where the data is sparse, so we shouldn’t expect precise quantitative predictions anytime soon.
Of course, just because a prediction is qualitative doesn’t mean it comes from deep knowledge; all hand-waving isn’t wisdom. For a good criticism of shallow qualitative reasoning in alignment, let’s turn to Qualitative Strategies of Friendliness.
The shallow qualitative reasoning criticized here relies too much on human common sense and on assumed human superiority to the AI, when the situation to predict involves superintelligence/AGI. That is, this type of qualitative reasoning extrapolates across a change in causal generators.
On the other hand, Yudkowsky uses qualitative constraints to guide his criticism: he knows there’s a problem because the causal model forbids that kind of solution. Just like the laws of thermodynamics forbid perpetual motion machines.
Deep qualitative reasoning starts from the underlying (potentially quantitative) causal explanations and mostly tells you what cannot work or what cannot be done. That is, deep qualitative reasoning points out that a whole swath of search space is not going to yield anything. A related point is that Yudkowsky rarely (AFAIK) makes predictions, even qualitative ones. He sometimes admits that he might make some, but it feels more like a compromise with a prediction-focused interlocutor than what the deep knowledge is really for. Whereas he constantly points out how certain things cannot work.
(From Qualitative Strategies of Friendliness (2008))
(From the second discussion with Ngo)
(From Security Mindset and Ordinary Paranoia (2017))
Or my reading of the whole discussion with Christiano, which is that Christiano constantly tries to get Yudkowsky to make a prediction, but the latter focuses on aspects of Christiano’s model and scenario that don’t fit his (Yudkowsky’s) deep knowledge.
I especially like the perpetual motion machines analogy, because it drives home how just proposing a tweak/solution without understanding Yudkowsky’s deep knowledge (and what it would take for it to not apply) has almost no chance of convincing him. Because if someone said they built a perpetual motion machine without discussing how they bypass the laws of thermodynamics, every scientifically literate person would be doubtful. On the other hand, if they seemed to be grappling with thermodynamics and arguing for a plausible way of winning, you’d be significantly more interested.
(I feel like Bostrom’s Orthogonality Thesis is a good example of such deep knowledge in alignment that most people get, and I already argued elsewhere that it serves mostly to show that you can’t solve alignment by just throwing competence at it; also note that Yudkowsky had the same pattern earlier/in parallel, and is still using it.)
To summarize: the deep qualitative thinking that Yudkowsky points out by saying “deep knowledge” is the sort of thinking that cuts off a big chunk of possibility space, that is, it tells you that nothing in that chunk can work. It also lets you judge, from the way people propose a solution (whether or not they tackle the deep pattern), how much probability you should ascribe to them being right.
A last note in this section: although deep knowledge primarily leads to negative conclusions, it can also lead to positive knowledge through a particularly Bayesian mechanism: if the deep knowledge destroys every known hypothesis/proposal except one (or a small number of them), then that is strong evidence for the ones left.
(This quote is more obscure than the others without the context. It’s from Intelligence Explosion Microeconomics (2013), and discusses the last step in a proposal for formalizing the sort of deep insight/pattern Yudkowsky leveraged during the FOOM debate. If you’re very confused, I feel like the most relevant part to my point is the bold last sentence.)
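To make the elimination mechanism above concrete, here is a minimal numeric sketch with toy numbers of my own (nothing from Yudkowsky's essays): when a constraint rules out most of a small set of candidate proposals, the probability mass they held gets redistributed onto the survivors.

```python
# Toy illustration of elimination as positive evidence (illustrative numbers only).
# Five candidate proposals start out roughly equally plausible; a constraint from
# deep knowledge rules out four of them, and renormalizing concentrates the
# remaining probability mass on the survivor.
priors = {"A": 0.2, "B": 0.2, "C": 0.2, "D": 0.2, "E": 0.2}
ruled_out = {"A", "B", "C", "D"}  # forbidden by the constraint

posterior = {h: (0.0 if h in ruled_out else p) for h, p in priors.items()}
total = sum(posterior.values())
posterior = {h: p / total for h, p in posterior.items()}

print(posterior)  # {'A': 0.0, 'B': 0.0, 'C': 0.0, 'D': 0.0, 'E': 1.0}
```

In practice the constraint itself is not held with certainty, so one would assign the ruled-out proposals low likelihoods rather than exact zeros; the qualitative effect (probability flowing to the survivors) is the same.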
Where does deep knowledge come from?
Now that we have a decent grounding of what Yudkowsky thinks deep knowledge is for, the biggest question is how to find it, and how to know you have found good deep knowledge. After all, maybe the causal models one assumes are just bad?
This is the biggest difficulty that Hanson, Ngo, and Christiano seemed to have with Yudkowsky’s position.
(Robin Hanson, from the comments after Observing Optimization in the FOOM Debate)
(Richard Ngo from his second discussion with Yudkowsky)
(Paul Christiano from his discussion with Yudkowsky)
Note that these attitudes make sense. I especially like Ngo’s framing. Falsifiable predictions (even just postdictions) are the cornerstone of evaluating hypotheses in Science. It even feels to Ngo (as it felt to me) that Yudkowsky argued for that in the Sequences:
(Ngo from his second discussion with Yudkowsky)
(And Yudkowsky himself from Making Beliefs Pay Rent (in Anticipated Experiences))
But the thing is… rereading part of the Sequences, I feel Yudkowsky was making points about deep knowledge all along? Even the quote I just used, which I interpreted in my rereading a couple of weeks ago as being about making predictions, now sounds like it’s about the sort of negative form of knowledge that forbids “perpetual motion machines”. Notably, Yudkowsky is very adamant that beliefs must tell you what cannot happen. Yet that doesn’t at all imply making predictions of the form “this is how AGI will develop”, so much as saying things like “this approach to alignment cannot work”.
Also, should I point out that there’s a whole sequence dedicated to the ways rationality can do better than science? (Thanks to Steve Byrnes for the pointer.) I’m also sure I would find a lot of relevant stuff by rereading Inadequate Equilibria, but if I wait to have reread everything by Yudkowsky before posting, I’ll be at it for a long time…
My Initial Mistake and the Einstein Case
Let me jump in here with my best guess of Yudkowsky’s justification of deep knowledge: its ability to both
The thing is, I got it completely wrong initially. Reading Einstein’s Arrogance (2007), an early Sequences post that is all about saying that Einstein had excellent reasons to believe General Relativity’s correctness before experimental verification (of advanced predictions), I thought that relativity was the deep knowledge and that Yudkowsky was pointing out how Einstein, having found an instance of true deep knowledge, could allow himself to be more confident than the social process of Science would permit in the absence of experimental justification.
Einstein’s Speed (2008) made it clear that I had been looking at the moon when I was supposed to see the pointing finger: the deep knowledge Yudkowsky pointed out was not relativity itself, but what let Einstein single it out by a lot of armchair reasoning and better use of what was already known.
More generally, I interpret the whole Science and Rationality Sequence as explaining how deep knowledge can let rationalists do something that isn’t in the purview of traditional Science: estimate which hypotheses make sense before the experimental predictions and evidence come in.
(From Faster Than Science (2008))
There’s a subtlety that is easy to miss: Yudkowsky doesn’t say that merely specifying a hypothesis in a large answer space counts as strong evidence for it. After all, you can just generate any random guess. What he’s pointing at is that to ascribe a decent amount of probability to a specific hypothesis in a large space through updating on evidence, you need to cut away a whole swath of the space to redirect the probability onto your hypothesis. And from a purely computational perspective, this implies that more of the work goes into whittling down hypotheses than into making the favored hypothesis certain enough through experimental verification.
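As a back-of-the-envelope illustration of that computational point (toy numbers of my own, not anything from the Sequences): locating one hypothesis in a space of about a billion candidates takes roughly 30 bits of evidence, while a single decisive experiment with, say, a 20:1 likelihood ratio supplies only about 4 of them; the rest has to come from whittling down the space beforehand.

```python
# Rough illustration (hypothetical numbers): most of the evidential work of
# locating a hypothesis in a large answer space happens before the final
# experimental confirmation.
import math

answer_space_size = 2 ** 30                    # ~a billion candidate hypotheses (assumed)
bits_to_locate = math.log2(answer_space_size)  # 30 bits needed in total

likelihood_ratio = 20                          # a fairly decisive yes/no experiment (assumed)
bits_from_experiment = math.log2(likelihood_ratio)  # ~4.3 bits

bits_from_whittling = bits_to_locate - bits_from_experiment
print(f"total bits needed:          {bits_to_locate:.1f}")
print(f"bits from final experiment: {bits_from_experiment:.1f}")
print(f"bits from prior whittling:  {bits_from_whittling:.1f}")
```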
His claim then seems to be that Einstein, and other scientists who tended to “guess right” at what would later be experimentally confirmed, couldn’t have been just lucky: they must have found ways of whittling down the vastness of hypothesis space, so that they had any chance of proposing something that was potentially right.
Yudkowsky gives some pointers to what he thinks Einstein was doing right.
(From Einstein’s Speed (2008))
So in that interpretation, Einstein learned from previous physics and from thought experiments how to cut away the parts of the hypothesis space that didn’t sound like they could make good physical laws, until he was left with a small enough subspace that he could find the right fit by hand (even if that took him 10 years).
In summary, deep knowledge doesn’t come in the form of a particularly neat hypothesis or compression; it is the engine of compression itself. Deep knowledge compresses “what sort of hypothesis tends to be correct”, such that it can be applied to the search for a correct hypothesis at the object level. That also cements the idea that deep knowledge gives constraints, not predictions: you don’t expect to have such a strong criterion for correct hypotheses that, given a massive hypothesis space, you can pinpoint the correct one.
Here it is good to generalize my previous mistake; recall that I took General Relativity for the deep knowledge, when it was actually the sort of constraints on physical laws that Einstein used to even find General Relativity. Why? I can almost hear Yudkowsky answering in my head: because General Relativity is the part accepted and acknowledged by Science. I don’t think it’s the only reason, but there’s an element of truth to it: I privileged the “proper” theory with experimental validation over the vaguer principles and concepts that led to it.
A similar mistake is to believe the deep knowledge is the theory, when it actually is what the theory and the experiments unearthed. This is how I understand Yudkowsky’s use of thermodynamics and evolutionary biology: he points at the deep knowledge that led to, and was revealed by, the work on these theories, more than at the theories themselves.
Compression and Fountains of Knowledge
We still don’t have a good way of finding and checking deep knowledge, though. Not every constraint on hypothesis space is deep knowledge, or even knowledge at all. The obvious idea is to have a reason for that constraint. And the reason Yudkowsky goes for almost every time is compression. Not a compressed description, like Moore’s law; nor a “compression” that is as complex as the pattern of hypotheses it’s trying to capture. Compression in the sense that you get a simpler constraint that can get you most of the way to regenerating the knowledge you’re starting from.
This view of the importance of compression is everywhere in the Sequences. A great example is Truly Part of You, which asks what knowledge you could rederive if it was deleted from your mind. If you have a deep understanding of the subject, and you keep recursively asking how a piece of knowledge could be rederived and then how “what’s needed for the derivation” could be rederived, Yudkowsky argues that you will reach “fountains of knowledge”. Or, in the terminology of this post, deep knowledge.
What do these fountains look like? They’re not the fundamental theories themselves, but instead their underlying principles. Stuff like the principle of least action, Noether’s theorem, and the principles underlying Statistical Mechanics (I don’t know enough about it to name them). They are the crystallized insights which constrain the search space enough that we can rederive what we knew from them.
(Feynman might have agreed, given that he chose the atomic hypothesis/principle, “all things are made of atoms — little particles that move around in perpetual motion, attracting each other when they are a little distance apart, but repelling upon being squeezed into one another”, as the one sentence he would salvage for future generations in case of a cataclysm.)
Here I hear a voice in my mind saying “What does simple mean? Shouldn’t it be better defined?” Yet this doesn’t feel like a strong objection. Simple is tricky to define intensionally, but scientists and mathematicians tend to be pretty good at spotting it, as long as they don’t fall for Mysterious Answers. And most of the checks on deep knowledge seem to lie in its ability to rederive the known correct hypotheses without adding stuff during the derivation.
A final point before closing this section: Yudkowsky writes that the same sort of evidence can be gathered for more complex arguments if they can be summarized by simple arguments that still get most of the current data right. My understanding here is that he’s pointing at the wiggle room of deep knowledge, that is, at the non-relevant ways in which it can sometimes be off. This is important because asking for that wiggle room can sound like ad-hoc adaptation of the pattern, breaking the compression assumption.
(From Intelligence Explosion Microeconomics (2013))
Conclusion
Based on my reading of his position, Yudkowsky sees deep knowledge as highly compressed causal explanations of “what sort of hypothesis ends up being right”. The compression means that we can rederive the successful hypotheses and theories from the causal explanation. Finally, such deep knowledge translates into partial constraints on hypothesis space, which focus the search by pointing out what cannot work. This in turn means that deep knowledge is far better at saying what won’t work than at precisely predicting the correct hypothesis.
I also want to point out something that became clearer and clearer in reading old posts: Yudkowsky is nothing if not coherent. You might not like his tone in the recent discussions, but if someone has been saying the same thing for 13 years, nobody seems to get it, and their model predicts that this will lead to the end of the world, maybe they can get some slack for talking smack.