Thanks for writing this up! I love this topic and I think everyone should talk about it more!
On cortical uniformity:
My take (largely pro-cortical-uniformity) is in the first part of this post. I never did find better or more recent sources than those two book chapters, but have gradually grown a bit more confident in what I wrote for various more roundabout reasons. See also my more recent post here.
On the similarity of neocortical algorithms to modern ML:
I am pretty far on the side of "neocortical algorithms are different from today's most popular ANNs", i.e. I think that both are "general", but I reached that conclusion independently for each. If I had to pick one difference, I would say it's that neocortical algorithms use analysis-by-synthesis—i.e., searching through a space of generative models for one that matches the data—and, relatedly, planning by probabilistic inference. This type of algorithm is closely related to probabilistic programming and PGMs—see, for example, Dileep George's work. In today's popular ANNs, this kind of analysis-by-synthesis and planning is either entirely absent or arguably present as a kind of add-on, but it's not a core principle of the algorithm. This is obviously not the only difference between neocortical algorithms and mainstream ANNs. Some are really obvious: the neocortex doesn't use backprop! More controversially, I don't think the neocortex even uses real-valued variables in its models, as opposed to booleans—well, I would want to put some caveats on that, but I believe something in that general vicinity.
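To make "analysis-by-synthesis" concrete, here is a toy sketch in Python (an illustration only, with made-up models and numbers, not a claim about how the neocortex or any particular system implements it): we keep a handful of candidate generative models, score each by how well the data it would generate matches the observation, and keep the best-scoring one.

```python
# A minimal, illustrative sketch of analysis-by-synthesis: search over a few
# candidate generative models for the one that best explains the observation.
import numpy as np

# Each "generative model" here is just a mean vector plus a noise scale.
candidate_models = {
    "model_A": {"mean": np.array([0.0, 0.0]), "noise": 1.0},
    "model_B": {"mean": np.array([3.0, 3.0]), "noise": 1.0},
    "model_C": {"mean": np.array([0.0, 3.0]), "noise": 1.0},
}

def log_likelihood(model, observation):
    """Gaussian log-likelihood of the observation under the model."""
    diff = observation - model["mean"]
    var = model["noise"] ** 2
    return -0.5 * np.sum(diff ** 2) / var - 0.5 * len(diff) * np.log(2 * np.pi * var)

observation = np.array([2.8, 3.2])  # the "sense data" to be explained

# Analysis-by-synthesis as search: evaluate every candidate generative model
# against the data and keep the one that explains it best.
scores = {name: log_likelihood(m, observation) for name, m in candidate_models.items()}
best = max(scores, key=scores.get)
print(best, scores)  # model_B explains this observation best
```

Real versions of this idea search over vastly richer model spaces (probabilistic programs, PGMs) rather than three fixed Gaussians, but the same select-the-model-that-explains-the-data loop is the core of it.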
So basically, I think the algorithms most similar to the neocortex are a bit of a backwater within mainstream ML research, with essentially no SOTA results on popular benchmarks ... which makes it a bit awkward for me to argue that this is the corner from which we will get AGI. Oh well, that's what I believe anyway!
On predictive coding:
Depending on context, I'll say I'm either an enthusiastic proponent or a strong critic of predictive coding. Really, I have a particular version of it I like, described here. I guess I disagree with Friston, Clark, etc. most strongly in that they argue that predictive coding is a helpful way to think about the operation of the whole brain, whereas I only find it helpful when discussing the neocortex in particular. Again, see here for my take on the rest of the brain. My other primary disagreement is that I don't see "minimizing prediction error" as a foundational principle, but rather as an incidental consequence of properly-functioning neocortical algorithms under certain conditions. (Specifically, it follows from the fact that the neocortex will discard generative models that get repeatedly falsified.)
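As a toy illustration of that last point (my own sketch, not anything from neuroscience): if you simply maintain a pool of candidate generative models and discard the ones whose predictions keep getting falsified, the average prediction error of the surviving pool falls, even though nothing in the procedure explicitly minimizes prediction error.

```python
# Toy sketch: falling prediction error as a side effect of discarding
# repeatedly-falsified models, not as an explicit objective.
import numpy as np

rng = np.random.default_rng(1)
true_value = 0.7                      # the statistic the environment actually follows
models = list(rng.uniform(0, 1, 50))  # crude "generative models": fixed predictions
strikes = [0] * len(models)           # count of clear falsifications per model
active = set(range(len(models)))

for step in range(200):
    observation = true_value + rng.normal(0, 0.05)
    for i in list(active):
        if abs(models[i] - observation) > 0.2:   # prediction clearly falsified
            strikes[i] += 1
            if strikes[i] >= 3:                  # repeatedly falsified -> discard
                active.remove(i)
    if step % 50 == 0:
        mean_error = np.mean([abs(models[i] - observation) for i in active])
        print(step, len(active), round(mean_error, 3))

# The surviving pool's average prediction error drops over time, even though
# no step in the loop ever optimizes prediction error directly.
```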
I think there is a lot of evidence for the neocortex having a zoo of generative models that can be efficiently searched through and glued together, not only for low-level perception but also for high-level stuff. I guess the evidence I think about is mostly introspective though. For example, this book review about therapy has (in my biased opinion) an obvious and direct correspondence with how I think the neocortex processes generative models.
This post is what first gave me a major update towards "an AI with a simple single architectural pattern scaled up sufficiently could become AGI". In other words, there don't necessarily have to be complicated fine-tuned algorithms for different advanced functions; you can get lots of different things from the same simple structure plus optimization. Since then, as far as I can tell, that's what we've been seeing.
How uniform is the neocortex?
The neocortex is the part of the human brain responsible for higher-order functions like sensory perception, cognition, and language, and has been hypothesized to be uniformly composed of general-purpose data-processing modules. What does the currently available evidence suggest about this hypothesis?
"How uniform is the neocortex?” is one of the background variables in my framework for AGI timelines. My aim for this post is not to present a complete argument for some view on this variable, so much as it is to:
There’s a long list of different regions in the neocortex, each of which appears to be responsible for something totally different. One interpretation is that these cortical regions are doing fundamentally different things, and that we acquired the capacities to do all these different things over hundreds of millions of years of evolution.
A radically different perspective, first put forth by Vernon Mountcastle in 1978, hypothesizes that the neocortex is implementing a single general-purpose data processing algorithm all throughout. From the popular neuroscience book On Intelligence, by Jeff Hawkins[1]:
The rest of this post will review some of the evidence around Mountcastle’s hypothesis.
Cortical function is largely determined by input data
When visual inputs are fed into the auditory cortices of infant ferrets, those auditory cortices develop into functional visual systems. This suggests that different cortical regions are all capable of general-purpose data processing.
Humans can learn how to perform forms of sensory processing we haven’t evolved to perform—blind people can learn to see with their tongues, and can learn to echolocate well enough to discern density and texture. On the flip side, forms of sensory processing that we did evolve to perform depend heavily on the data we’re exposed to—for example, cats exposed only to horizontal edges early in life don’t have the ability to discern vertical edges later in life. This suggests that our capacities for sensory processing stem from some sort of general-purpose data processing, rather than innate machinery handed to us by evolution.
Blind people who learn to echolocate do so with the help of repurposed visual cortices, and blind people who learn to read Braille likewise repurpose their visual cortices. Our visual cortices did not evolve to be utilized in these ways, suggesting that the visual cortex is doing some form of general-purpose data processing.
There’s a man who had the entire left half of his brain removed when he was 5, yet has above-average intelligence and went on to graduate from college and maintain steady employment. This would only be possible if the right half of his brain were capable of taking on the cognitive functions of the left half.
The patterns identified by the primary sensory cortices (for vision and hearing) overlap substantially with the patterns that numerous different unsupervised learning algorithms identified from the same data, suggesting that the different cortical regions (along with the different unsupervised learning algorithms) are all just doing some form of general-purpose pattern recognition on their input data.
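As a rough illustration of the kind of comparison involved here (a toy sketch, not the cited studies; the synthetic "images" and parameter choices below are mine), one can run a generic unsupervised learner, such as sparse dictionary learning from scikit-learn, on image patches and inspect the features it discovers, which tend to come out as localized, oriented, edge-like detectors reminiscent of those found in primary visual cortex.

```python
# Toy version of the comparison: a generic unsupervised learner applied to
# image patches discovers localized, oriented features.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.default_rng(0)

# Synthetic stand-in for natural images: random oriented bars on a blank canvas.
def random_bar_image(size=64):
    img = np.zeros((size, size))
    for _ in range(8):
        x, y = rng.integers(4, size - 4, size=2)
        if rng.random() < 0.5:
            img[y, x - 4:x + 4] = 1.0   # horizontal bar
        else:
            img[y - 4:y + 4, x] = 1.0   # vertical bar
    return img

images = [random_bar_image() for _ in range(20)]
patches = np.vstack([
    extract_patches_2d(img, (8, 8), max_patches=200, random_state=0).reshape(-1, 64)
    for img in images
])
patches -= patches.mean(axis=1, keepdims=True)  # remove per-patch mean

dico = MiniBatchDictionaryLearning(n_components=16, alpha=1.0, random_state=0)
dico.fit(patches)

# Each row of components_ is a learned 8x8 feature; with this data they tend to
# come out as localized oriented segments, loosely analogous to edge detectors.
print(dico.components_.shape)  # (16, 64)
```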
Deep learning and cortical generality
The above evidence does not rule out the possibility that the cortex's apparent adaptability stems from developmental triggers, rather than some capability for general-purpose data-processing. By analogy, stem cells all start out very similar, only to differentiate into cells with functions tailored to the contexts in which they find themselves. It’s possible that different cortical regions have hard-coded genomic responses for handling particular data inputs, such that the cortex gives one hard-coded response when it detects that it’s receiving visual data, another hard-coded response when it detects that it’s receiving auditory data, etc.
If this were the case, the cortex’s data-processing capabilities would best be understood as specialized responses to distinct evolutionary needs, and our ability to process data that we haven’t evolved to process (e.g. being able to look at a Go board and intuitively discern what a good next move would be) would most likely utilize a complicated mishmash of heterogeneous data-processing abilities acquired over evolutionary timescales.
Before I learned about any of the advancements in deep learning, this was my most likely guess about how the brain worked. It had always seemed to me that the hardest and most mysterious part of intelligence was intuitive pattern-recognition, and that the various forms of intuitive processing that let us recognize images, say sentences, and play Go might be totally different and possibly arbitrarily complex.
So I was very surprised when I learned that a single general method in deep learning (training an artificial neural network on massive amounts of data using gradient descent)[2] led to performance comparable to or better than humans’ in tasks as disparate as image classification, speech synthesis, and playing Go. I found superhuman Go performance particularly surprising—intuitive judgments of Go boards encode distillations of high-level strategic reasoning, and are highly sensitive to small changes in input. Neither of these is true for sensory processing, so my prior guess was that the methods that worked for sensory processing wouldn’t have been sufficient for playing Go as well as humans.[3]
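For concreteness, here is the "single general method" stripped to its bare bones (a toy example of my own, not any of the systems mentioned above): a small neural network whose parameters are adjusted by gradient descent on a training signal. The systems that classify images, synthesize speech, and play Go are, at their core, this same loop scaled up enormously in data, parameters, and compute.

```python
# A bare-bones neural network trained by gradient descent on a toy task (XOR),
# illustrating the generic train-by-gradient-descent loop.
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: XOR, a task a linear model cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of tanh units, trained to minimize squared error.
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
lr = 0.5

for step in range(2000):
    h = np.tanh(X @ W1 + b1)          # forward pass
    pred = h @ W2 + b2
    err = pred - y                    # derivative of 0.5 * squared error
    # Backward pass (chain rule), then a gradient-descent step on every parameter.
    dW2 = h.T @ err / len(X); db2 = err.mean(axis=0)
    dh = err @ W2.T * (1 - h ** 2)
    dW1 = X.T @ dh / len(X); db1 = dh.mean(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(np.round(pred.ravel(), 2))  # approaches [0, 1, 1, 0]
```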
The success of this single method across such different domains suggested to me that there’s nothing fundamentally complex or mysterious about intuition, and that seemingly-heterogeneous forms of intuitive processing can result from simple and general learning algorithms. From this perspective, it seems most parsimonious to explain the cortex’s seemingly general-purpose data-processing capabilities as resulting straightforwardly from a general learning algorithm implemented all throughout the cortex. (This is not to say that I think the cortex is doing what artificial neural networks are doing—rather, I think deep learning provides evidence that general learning algorithms exist at all, which raises the prior probability that the cortex implements a general learning algorithm.[4])
The strength of this conclusion hinges on the extent to which the “artificial intuition” that current artificial neural networks (ANNs) are capable of is analogous to the intuitive processing that humans are capable of. It’s possible that the “intuition” utilized by ANNs is deeply analogous to human intuition, in which case the generality of ANNs would be very informative about the generality of cortical data-processing. It's also possible that "artificial intuition" is different in kind from human intuition, or that it only captures a small fraction of what goes into human intuition, in which case the generality of ANNs would not be very informative about the generality of cortical data-processing.
It seems that experts are divided about how analogous these forms of intuition are, and I conjecture that this is a major source of disagreement about overall AI timelines. Shane Legg (a cofounder of DeepMind, a leading AI lab) was talking about how deep belief networks might be able to replicate the function of the cortex even before deep learning took off, and he’s been predicting human-level AGI in the 2020s since 2009. Eliezer Yudkowsky has directly talked about AlphaGo providing evidence of "neural algorithms that generalize well, the way that the human cortical algorithm generalizes well" as an indication that AGI might be near. Rodney Brooks (the former director of MIT’s AI lab) has written about how deep learning is not capable of real perception or manipulation, and thinks AGI is over 100 years away. Gary Marcus has described deep learning as a “wild oversimplification” of the "hundreds of anatomically and likely functionally [distinct] areas" of the cortex, and estimates AGI to be 20-50 years away.
Canonical microcircuits for predictive coding
If the cortex were uniform, what might it actually be doing uniformly?
The cortex has been hypothesized to consist of canonical microcircuits that implement predictive coding. In a nutshell, predictive coding (aka predictive processing) is a theory of brain function which hypothesizes that the cortex learns the hierarchical structure of the data it receives, and uses this structure to encode predictions about future sense inputs, resulting in “controlled hallucinations” that we interpret as direct perception of the world.
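Here is a minimal sketch of that idea (my own toy version, loosely in the spirit of Rao and Ballard's classic predictive coding model, not a claim about actual cortical circuitry): a higher level holds a latent estimate of the causes of its input, sends down a prediction, and the prediction error that comes back drives both fast updates to the estimate and slow updates to the generative weights.

```python
# Toy predictive-coding loop: top-down predictions, bottom-up prediction errors,
# fast inference of latent causes, and slow learning of generative weights.
import numpy as np

rng = np.random.default_rng(0)
n_input, n_latent = 16, 4

# Structured data: each input really is generated from 4 hidden causes plus noise.
A_true = rng.normal(0, 1, (n_input, n_latent))
causes = rng.normal(0, 1, (500, n_latent))
data = causes @ A_true.T + 0.05 * rng.normal(0, 1, (500, n_input))

W = rng.normal(0, 0.1, (n_input, n_latent))  # generative weights: latent -> predicted input

def infer(x, W, steps=50, lr=0.1):
    """Settle on a latent estimate r by repeatedly reducing the prediction error."""
    r = np.zeros(n_latent)
    for _ in range(steps):
        error = x - W @ r           # bottom-up prediction error
        r += lr * (W.T @ error)     # the estimate moves to explain away the error
    return r

def relative_residual(x, W):
    return np.linalg.norm(x - W @ infer(x, W)) / np.linalg.norm(x)

before = np.mean([relative_residual(x, W) for x in data[:20]])

for x in data:                       # slow learning loop over inputs
    r = infer(x, W)
    error = x - W @ r
    W += 0.02 * np.outer(error, r)   # error-driven weight update
    W /= np.linalg.norm(W, axis=0)   # keep columns unit norm for stability

after = np.mean([relative_residual(x, W) for x in data[:20]])
print(round(before, 2), round(after, 2))  # prediction error shrinks as structure is learned
```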
On Intelligence has an excerpt that cleanly communicates what I mean by “learning hierarchical structure”:
The clearest evidence that the brain is learning hierarchical structure comes from the visual system. The visual cortex is known to have edge detectors at the lowest levels of processing, and, at higher levels, neurons that fire when shown images of particular people, like Bill Clinton.
What does predictive coding say the cortex does with this learned hierarchical structure? From an introductory blog post about predictive processing:
An illustration of predictive processing, from the same source:
Predictive coding has been hailed by prominent neuroscientists as a possible unified theory of the brain, but I’m confused about how much physiological evidence there is that the brain is actually implementing predictive coding. It seems like there’s physiological evidence in support of predictive coding being implemented in the visual cortex and in the auditory cortex, and there’s a theoretical account of how the prefrontal cortex (responsible for higher cognitive functions like planning, decision-making, and executive function) might be utilizing similar principles. This paper and this paper review some physiological evidence of predictive coding in the cortex that I don’t really know how to interpret.
My current take
I find the various pieces of evidence that cortical function depends largely on data inputs (e.g. the ferret rewiring experiment) to be pretty compelling evidence of general-purpose data-processing in the cortex. The success of simple and general methods in deep learning across a wide range of tasks suggests that it’s most parsimonious to model the cortex as employing general methods throughout, but only to the extent that the capabilities of artificial neural networks can be taken to be analogous to the capabilities of the cortex. I currently consider the analogy to be deep, and intend to explore my reasons for thinking so in future posts.
I think the fact that predictive coding offers a plausible theoretical account for what the cortex could be doing uniformly, which can account for higher-level cognitive functions in addition to sensory processing, is itself some evidence of cortical uniformity. I’m confused about how much physiological evidence there is that the brain is actually implementing predictive coding, but I’m very bullish on predictive coding as a basis for a unified brain theory, based on non-physiological evidence (like our subjective experiences making sense of the images of splotches) that I intend to explore in a future post.
Thanks to Paul Kreiner, David Spivak, and Stag Lynn for helpful suggestions and feedback, and thanks to Jacob Cannell for writing a post that inspired much of my thinking here.
This blog post comment has some good excerpts from On Intelligence. ↩︎
Deep learning is a general method in the sense that most tasks are solved by utilizing a handful of basic tools from a standard toolkit, adapted for the specific task at hand. Once you’ve selected the basic tools, all that’s left is figuring out how to supply the training data, specifying the objective that lets the AI know how well it’s doing, throwing a lot of computation at the problem, and fiddling with details. My understanding is that there typically isn’t much conceptual ingenuity involved in solving the problems, that most of the work goes into fiddling with details, and that trying to be clever doesn't lead to better results than using standard tricks with more computation and training data. It's also worth noting that most of the tools in this standard toolkit have been around since the 90's (e.g. convolutional neural networks, LSTMs, reinforcement learning, backpropagation), and that the recent boom in AI was driven by using these decades-old tools with unprecedented amounts of computation. ↩︎
AlphaGo did simulate future moves to achieve superhuman performance, so the direct comparison against human intuition isn't completely fair. But AlphaGo Zero's raw neural network, which just looks at the "texture" of the board without simulating any future moves, can still play quite formidably. From the AlphaGo Zero paper: "The raw neural network, without using any lookahead, achieved an Elo rating of 3,055. AlphaGo Zero achieved a rating of 5,185, compared to 4,858 for AlphaGo Master, 3,739 for AlphaGo Lee and 3,144 for AlphaGo Fan." (AlphaGo Fan beat the European Go champion 5-0.) ↩︎
Eliezer Yudkowsky has an insightful exposition of this point in a Facebook post. ↩︎