Human instincts, symbol grounding, and the blank-slate neocortex

Steven Byrnes

Intro: What is Common Cortical Algorithm (CCA) theory, and why does it matter for AGI?

As I discussed in "Jeff Hawkins on neuromorphic AGI within 20 years", and as was discussed earlier on LessWrong in "The brain as a universal learning machine", there is a theory, due originally to Vernon Mountcastle in the 1970s, that the neocortex^[1] (75% of the human brain by weight) consists of ~150,000 interconnected copies of a little module, the "cortical column", each of which implements the same algorithm. Following Jeff Hawkins, I'll call this the "common cortical algorithm" (CCA) theory. (I don't think that terminology is standard.)

So instead of saying that the human brain has a vision processing algorithm, motor control algorithm, language algorithm, planning algorithm, and so on, in CCA theory we say that (to a first approximation) we have a massive amount of "general-purpose neocortical tissue", and if you dump visual information into that tissue, it does visual processing, and if you connect that tissue to motor control pathways, it does motor control, etc.

Whether and to what extent CCA theory is true is, I think, very important for AGI forecasting, strategy, and both technical and non-technical safety research directions—see my answer here for more details.

Should we believe CCA theory?

CCA theory, as I'm using the term, is a simplified model. There are almost definitely a couple caveats to it:

There are sorta "hyperparameters" on the generic learning algorithm which seem to be set differently in different parts of the neocortex. For example, some areas of the cortex have higher or lower density of particular neuron types. There are other examples too.^[2] I don't think this significantly undermines the usefulness or correctness of CCA theory, as long as these changes really are akin to hyperparameters, as opposed to specifying fundamentally different algorithms. So my reading of the evidence is that if you put, say, motor nerves coming out of visual cortex tissue, the tissue could do motor control, but it wouldn't do it quite as well as the motor cortex does.^[3]
There is almost definitely a gross wiring diagram hardcoded in the genome, i.e., a set of connections among different neocortical regions, and between those regions and other parts of the brain. These connections later get refined and edited during learning. Again, we can ask how much the existence of this innate gross wiring diagram undermines CCA theory. How complicated is the wiring diagram? Is it millions of connections among thousands of tiny regions, or just tens of connections among a few regions? Would the brain work at all if you started with a random wiring diagram? I don't know for sure, but for various reasons, my current belief is that this initial gross wiring diagram is not carrying much of the weight of human intelligence, and thus that this point is not a significant problem for the usefulness of CCA theory. (This is a loose statement; of course it depends on what questions you're asking.) I think of it more like: if it's biologically important to learn a concept space that's built out of associations between information sources X, Y, and Z, well, you just dump those three information streams into the same part of the cortex, and then the CCA will take it from there, and it will reliably build this concept space. So once you have the CCA nailed down, it kinda feels to me like you're most of the way there....^[4]
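
To make the "same algorithm, different hyperparameters, different inputs" picture concrete, here is a toy Python sketch. It is only an illustration of the idea, not a claim about what the real cortical algorithm is: the class `GenericSequenceLearner`, its `context_length` hyperparameter, and the input symbols are all made up for this example.

```python
from collections import Counter, defaultdict


class GenericSequenceLearner:
    """Toy stand-in for one 'region': learns to predict the next symbol.

    The learning rule is identical for every instance; only the
    (hypothetical) hyperparameter `context_length` and the input differ.
    """

    def __init__(self, context_length=1):
        self.context_length = context_length
        self.counts = defaultdict(Counter)

    def train(self, stream):
        k = self.context_length
        for i in range(k, len(stream)):
            context = tuple(stream[i - k:i])
            self.counts[context][stream[i]] += 1

    def predict(self, context):
        seen = self.counts.get(tuple(context))
        if not seen:
            return None  # never saw this context
        return seen.most_common(1)[0][0]


# Same code, wired to two different (made-up) input streams.
visual_region = GenericSequenceLearner(context_length=1)
visual_region.train(["edge", "corner", "edge", "corner", "edge"])

motor_region = GenericSequenceLearner(context_length=2)
motor_region.train(["reach", "grasp", "lift", "reach", "grasp", "lift"])

print(visual_region.predict(["edge"]))           # corner
print(motor_region.predict(["reach", "grasp"]))  # lift
```

Nothing in the class mentions vision or motor control; what each instance ends up doing is determined by what it is connected to, which is the sense of "general-purpose tissue" used above.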

Going beyond these caveats, I found pretty helpful literature reviews on both sides of the issue:

The experimental evidence for CCA theory: see chapter 5 of Rethinking Innateness (1996)
The experimental evidence against CCA theory: see chapter 5 of The Blank Slate by Steven Pinker (2002).

I won't go through the debate here, but after reading both of those I wound up feeling that CCA theory (with the caveats above) is probably right, though not 100% proven. Please comment if you've seen any other good references on this topic, especially more up-to-date ones.

(Update: I found another reference on CCA; see Gary Marcus vs Cortical Uniformity.)

CCA theory does not mean "no inductive biases"—of course there are inductive biases! It means that the inductive biases are sufficiently general and low-level that they work equally well for extremely diverse domains such as language, vision, motor control, planning, math homework, and so on. I typically think that the inductive biases are at a very low level, things like "we should model inputs using a certain type of data structure involving temporal sequences and spatial relations", and not higher-level semantic knowledge like intuitive biology or "when is it appropriate to feel guilty?" or tool use etc. (I don't even think object permanence or intuitive psychology are built into the neocortex; I think they're learned in early infancy. This is controversial and I won't try to justify it here. Well, intuitive psychology is a complicated case, see below.)

Anyway, that brings us to...

CCA theory vs human-universal traits and instincts

The main topic for this post is:

If Common Cortical Algorithm theory is true, then how do we account for all the human-universal instincts and behaviors that evolutionary psychologists talk about?

Indeed, we know that there are a diverse set of remarkably specific human instincts and mental behaviors evolved by natural selection. Again, Steven Pinker's The Blank Slate is a popularization of this argument; it ends with Donald E. Brown's giant list of "human universals", i.e. behaviors that are observed in every human culture.

Now, 75% of the human brain (by weight) is the neocortex, but the other 25% consists of various subcortical ("old-brain") structures like the amygdala, and these structures are perfectly capable of implementing specific instincts. But these structures do not have access to an intelligent world-model—only the neocortex does! So how can the brain implement instincts that require intelligent understanding? For example, maybe the fact that "Alice got two cookies and I only got one!" is represented in the neocortex as the activation of neural firing pattern 7482943. There's no obvious mechanism to connect this arbitrary, learned pattern to the "That's so unfair!!!" section of the amygdala. The neocortex doesn't know about unfairness, and the amygdala doesn't know about cookies. Quite a conundrum!

(Update much later: Throughout this post, wherever I wrote "amygdala", I should have said "hypothalamus and brainstem". See here for a better-informed discussion.)

This is really a symbol grounding problem, which is the other reason this post is relevant to AI alignment. When the human genome builds a human, it faces the same problem as a human programmer building an AI: how can one point a goal system at things in the world, when the internal representation of the world is a complicated, idiosyncratic, learned data structure? As we wrestle with the AI goal alignment problem, it's worth studying what human evolution did here.

List of ways that human-universal instincts and behaviors can exist despite CCA theory

Finally, the main part of this post. I don't know a complete answer, but here are some of the categories I've read about or thought of, and please comment on things I've left out or gotten wrong!

Mechanism 1: Simple hardcoded connections, not implemented in the neocortex

Example: Enjoying the taste of sweet things. This one is easy. I believe the nerve signals coming out of taste buds branch, with one branch going to the cortex to be integrated into the world model, and another branch going to subcortical regions. So the genes merely have to wire up the sweetness taste buds to the good-feelings subcortical regions.

Mechanism 2: Subcortex-supervised learning.

Example: Wanting to eat chocolate. This is different than the previous item because "sweet taste" refers to a specific innate physiological thing, whereas "chocolate" is a learned concept in the neocortex's world-model. So how do we learn to like chocolate? Because when we eat chocolate, we enjoy it (Mechanism 1 above). The neocortex learns to predict a sweet taste upon eating chocolate, and thus paints the world-model concept of chocolate with a "sweet taste" property. The supervisory signal is multidimensional, such that the neocortex can learn to paint concepts with various labels like "painful", "disgusting", "comfortable", etc., and generate appropriate behaviors in response. (Vaguely related: the DeepMind paper Prefrontal cortex as a meta-reinforcement learning system.)
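
As a rough illustration of this mechanism, here is a minimal Python sketch in which a learner attaches innate labels to learned concepts. Everything in it is an assumption made for the example: the `INNATE_RESPONSES` table, the `ConceptLabeler` class, the label names, and the simple running-average update are toy stand-ins, not a model of actual brain circuitry.

```python
from collections import defaultdict

# Hardwired "subcortical" reactions to raw sensations (hypothetical values).
INNATE_RESPONSES = {
    "sweet": {"pleasant": 1.0},
    "bitter": {"disgusting": 1.0},
    "burn": {"painful": 1.0},
}


class ConceptLabeler:
    """Toy 'neocortex': learns which innate signals each concept predicts."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.labels = defaultdict(lambda: defaultdict(float))

    def experience(self, concept, raw_sensation):
        # The multidimensional supervisory signal comes from the hardwired table.
        signal = INNATE_RESPONSES.get(raw_sensation, {})
        dims = set(signal) | set(self.labels[concept])
        for label in dims:
            target = signal.get(label, 0.0)
            current = self.labels[concept][label]
            # Move the stored prediction a step toward the observed signal.
            self.labels[concept][label] = current + self.learning_rate * (target - current)

    def predict(self, concept, label):
        return self.labels.get(concept, {}).get(label, 0.0)


cortex = ConceptLabeler()
for _ in range(5):
    cortex.experience("chocolate", "sweet")
cortex.experience("hot stove", "burn")

print(cortex.predict("chocolate", "pleasant"))  # 0.96875
print(cortex.predict("hot stove", "painful"))   # 0.5
print(cortex.predict("chocolate", "painful"))   # 0.0
```

The hardwired table never mentions "chocolate"; the learned side never defines "pleasant". The association between the two is built entirely by experience, which is the point of the mechanism.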

Mechanism 3: Same learning algorithm + same world = same internal model

Possible example: Intuitive biology. In The Blank Slate you can find a discussion of intuitive biology / essentialism, which "begins with the concept of an invisible essence residing in living things, which gives them their form and powers." Thus preschoolers will say that a dog altered to look like a cat is still a dog, yet a wooden toy boat cut into the shape of a toy car has in fact become a toy car. I think we can account for this very well by saying that everyone's neocortex has the same learning algorithm, and when they look at plants and animals they observe the same kinds of things, so we shouldn't be surprised that they wind up forming similar internal models and representations. I found a paper that tries to spell out how this works in more detail; I don't know if it's right, but it's interesting: free link, official link.

Mechanism 4: Human-universal memes

Example: Fire. I think this is pretty self-explanatory. People learn about fire from each other. No need to talk about neurons, beyond the more general issues of language and social learning discussed below.

Mechanism 5: "Two-process theory"

Possible example: Innate interest in human faces.^[5] The subcortex-supervised learning mechanism above (Mechanism 2) can be thought of more broadly as an interaction between a hardwired subcortical system that creates a "ground truth", and a cortical learning algorithm that then learns to relate that ground truth to its complex internal representations. Here, Johnson's "two-process theory" for faces fits this same mold, but with a more complicated subcortical system for ground truth. In this theory, a subcortical system (ETA: specifically, the superior colliculus^[6]) gets direct access to a low-resolution version of the visual field, and looks for a pattern with three blobs in locations corresponding to the eyes and mouth of a blurry face. When it finds such a pattern, it passes information to the cortex that this is a very important thing to attend to, and over time the cortex learns what faces actually look like (and suppresses the original subcortical template circuitry). Anyway, Johnson came up with this theory partly based on the observation that newborns are equally entranced by pictures of three blobs versus actual faces (each of which was much more interesting than other patterns), but after a few months the babies were more interested in actual face pictures than the three-blob pictures. (Not sure what Johnson would make of this twitter account.)
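
The division of labor in the two-process story can be sketched in a few lines of Python. This is a cartoon under loud assumptions: the 3x3 grid, the function `three_blob_detector`, the class `CorticalFaceLearner`, and the feature names are all invented for illustration, and the sketch does not model the later suppression of the subcortical template.

```python
from collections import Counter


def three_blob_detector(low_res):
    """Hardwired template on a 3x3 grid: two blobs on top, one below."""
    return low_res[0][0] == 1 and low_res[0][2] == 1 and low_res[2][1] == 1


class CorticalFaceLearner:
    """Toy 'cortex': learns detailed features of whatever the template flags."""

    def __init__(self):
        self.feature_counts = Counter()
        self.examples = 0

    def observe(self, low_res, detailed_features):
        if three_blob_detector(low_res):  # subcortical "attend to this!" flag
            self.examples += 1
            self.feature_counts.update(detailed_features)

    def looks_like_face(self, detailed_features, threshold=0.5):
        if self.examples == 0:
            return False
        typical = {f for f, c in self.feature_counts.items()
                   if c / self.examples >= threshold}
        if not typical:
            return False
        overlap = len(typical & set(detailed_features)) / len(typical)
        return overlap >= threshold


BLOBS = [[1, 0, 1], [0, 0, 0], [0, 1, 0]]
NO_BLOBS = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

learner = CorticalFaceLearner()
for _ in range(3):
    learner.observe(BLOBS, {"eyes", "nose", "mouth", "eyebrows"})
learner.observe(NO_BLOBS, {"wheels", "windows"})

print(learner.looks_like_face({"eyes", "nose", "mouth"}))  # True
print(learner.looks_like_face({"wheels", "windows"}))      # False
```

The hardwired part only knows about three blobs; everything the learner ends up knowing about faces comes from the examples that the crude template selected for it.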

(Other possible examples of instincts formed by two-process theory: fear of snakes, interest in human speech sounds, sexual attraction.)

(Update: See my later post Inner alignment in the brain for a more fleshed-out discussion of this mechanism.)

Mechanism 6: Time-windows

Examples: Filial imprinting in animals, incest repulsion (Westermarck effect) in humans. Filial imprinting is a famous result where newborn chicks (and many other species) form a permanent attachment to the most conspicuous moving object that they see in a certain period shortly after hatching. In nature, they always imprint on their mother, but in lab experiments, chicks can be made to imprint on a person, or even a box. As with other mechanisms here, time-windows provide a nice solution to the symbol grounding problem, in that the genes don't need to know what precise collection of neurons corresponds to "mother", they only need to set up a time window and a way to point to "conspicuous moving objects", which is presumably easier. The brain mechanism of filial imprinting has been studied in detail for chicks, and consists of the combination of time-windows plus the two-process model (Mechanism 5 above). In fact, I think the two-process model was proven in chick brains before it was postulated in human brains.
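
The logic of a time window can be written down as a small Python sketch. It is a toy under stated assumptions: the class `ImprintingCircuit`, the `window_end` parameter, and the numeric "conspicuousness" scores are hypothetical, and the object names stand in for whatever learned representation happens to be active at the time.

```python
class ImprintingCircuit:
    """Toy time-window rule: attach to the most conspicuous moving object
    seen before the window closes, then never update again."""

    def __init__(self, window_end=3):
        self.window_end = window_end  # hypothetical window length, in observations
        self.age = 0
        self.best_score = 0.0
        self.imprinted_on = None

    def observe(self, obj, conspicuousness, moving):
        window_open = self.age < self.window_end
        if window_open and moving and conspicuousness > self.best_score:
            self.best_score = conspicuousness
            # `obj` is an opaque handle; the rule never inspects what it means.
            self.imprinted_on = obj
        self.age += 1


# In nature: the hen is the most conspicuous moving thing during the window.
chick = ImprintingCircuit()
chick.observe("leaf", 0.2, moving=True)
chick.observe("wall", 1.0, moving=False)
chick.observe("hen", 0.9, moving=True)
chick.observe("person", 1.0, moving=True)  # window already closed

# In the lab: only a box is shown during the window.
lab_chick = ImprintingCircuit()
for _ in range(3):
    lab_chick.observe("box", 0.5, moving=True)
lab_chick.observe("hen", 0.9, moving=True)  # too late

print(chick.imprinted_on)      # hen
print(lab_chick.imprinted_on)  # box
```

The rule contains a clock and a crude salience test, and no description of "mother" at all, yet in a normal environment it ends up pointing at the mother.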

There likewise seem to be various time-window effects in people, such as the Westermarck effect, a sexual repulsion between two people raised together as young children (an instinct which presumably evolved to reduce incest).

Mechanism 7 (speculative): empathetic grounding of intuitive psychology.

Possible example: Social emotions (gratitude, sympathy, guilt,...) Again, the problem is that the neocortex is the only place with enough information to, say, decide when someone slighted you, so there's no "ground truth" to use for subcortex-supervised learning. At first I was thinking that the two-process model for human faces and speech could be playing a role, but as far as I know, deaf-blind people have the normal suite of social emotions, so that's not it either. I looked in the literature a bit and couldn't find anything helpful. So, I made up this possible mechanism (warning: wild speculation).

Step 1 is that a baby's neocortex builds a "predicting my own emotions" model using normal subcortex-supervised learning (Mechanism 2 above). Then a normal Hebbian learning mechanism makes two-way connections between the relevant subcortical structures (amygdala) and the cortical neurons involved in this predictive model.

Step 2 is that the neocortex's universal learning algorithm will, in the normal course of development, naturally discover that this same "predicting my own emotions" model from step 1 can be reused to predict other people's emotions (cf. Mechanism 3 above), forming the basis for intuitive psychology. Now, because of those connections-to-the-amygdala mentioned in step 1, the amygdala is incidentally getting signals from the neocortex when the latter predicts that someone else is angry, for example.

Step 3 is that the amygdala (and/or neocortex) somehow learns the difference between the intuitive psychology model running in first-person mode versus empathetic mode, and can thus generate appropriate reactions, with one pathway for "being angry" and a different pathway for "knowing that someone else is angry".

So let's now return to my cookie puzzle above. Alice gets two cookies and I only get one. How can I feel it's unfair, given that the neocortex doesn't have a built-in notion of unfairness, and the amygdala doesn't know what cookies are? The answer would be: thanks to subcortex-supervised learning, the amygdala gets a message that one yummy cookie is coming, but the neocortex also thinks "Alice is even happier", and that thought also recruits the amygdala, since intuitive psychology is built on empathetic modeling. Now the amygdala knows that I'm gonna get something good, but that Alice is gonna get something even better, and that combination (in the current emotional context) triggers the amygdala to send out waves of jealousy and indignation. This is then a new supervisory signal for the neocortex, which allows the neocortex to gradually develop a model of fairness, which in turn feeds back into the intuitive psychology module, and thereby back to the amygdala, allowing the amygdala to execute more complicated innate emotional responses in the future, and so on.
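
Since this mechanism is speculative, the following Python sketch is only a cartoon of the cookie example, meant to show how the pieces would fit together. The functions `emotion_model` and `subcortical_reaction`, the "more cookies means more happiness" rule, and the reaction names are all assumptions invented for this illustration.

```python
def emotion_model(cookies):
    """Steps 1 and 2: one learned model that predicts happiness from a situation.

    Toy assumption: more cookies means more predicted happiness. The same
    function serves both first-person and empathetic predictions.
    """
    return float(cookies)


def subcortical_reaction(own_happiness, other_happiness):
    """Step 3: a hypothetical hardwired rule that treats the two signals differently.

    It only sees two numbers tagged by mode; it knows nothing about cookies.
    """
    if other_happiness > own_happiness:
        return "indignation"
    if own_happiness > 0:
        return "contentment"
    return "neutral"


my_signal = emotion_model(cookies=1)     # first-person mode
alice_signal = emotion_model(cookies=2)  # same model, empathetic mode

reaction = subcortical_reaction(my_signal, alice_signal)
print(reaction)  # indignation
```

In the story above, that reaction would then act as a new supervisory signal for the learned model; the sketch stops before that feedback loop.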

(Update: See my later post Inner alignment in the brain for a slightly more fleshed-out discussion of this mechanism.)

The special case of language.

It's tempting to put language in the category of memes (mechanism 4 above)—we do generally learn language from each other—but it's not really, because apparently groups of kids can invent grammatical languages from scratch (e.g. Nicaraguan Sign Language). My current guess is that it combines three things: (1) a two-process mechanism (Mechanism 5 above) that makes people highly attentive to human speech sounds. (2) possibly "hyperparameter tuning" in the language-learning areas of the cortex, e.g. maybe to support taller compositional hierarchies than would be required elsewhere in the cortex. (3) The fact that language can sculpt itself to the common cortical algorithm rather than the other way around—i.e., maybe "grammatical language" is just another word for "a language that conforms to the types of representations and data structures that are natively supported by the common cortical algorithm".

By the way, lots of people (including Steven Pinker) seem to argue that language processing is a fundamentally different and harder task than, say, visual processing, because language requires symbolic representations, composition, recursion, etc. I don't understand this argument; I think vision processing needs the exact same things! I don't see a fundamental difference between the visual-processing system knowing that "this sheet of paper is part of my notebook", and the grammatical "this prepositional phrase is part of this noun phrase". Likewise, I don't see a difference between recognizing a background object interrupted by a foreground occlusion, versus recognizing a noun phrase interrupted by an interjection. It seems to me like a similar set of problems and solutions, which again strengthens my belief in CCA theory.

Conclusion

When I initially read about CCA theory, I didn't take it too seriously because I didn't see how instincts could be compatible with it. But I now find it pretty likely that there's no fundamental incompatibility. So having removed that obstacle, and also read the literature a bit more, I'm much more inclined to believe that CCA theory is fundamentally correct.

Again, I'm learning as I go, and in some cases making things up as I go along. Please share any thoughts and pointers!

^[1] I'll be talking a lot about the neocortex in this article, but shout-out to the thalamus and hippocampus, the other two primary parts of the brain's predictive-world-model-building-system. I'm just leaving them out for simplicity; this doesn't have any important implications for this article.
^[2] More examples of region-to-region variation in the neocortex that are (plausibly) genetically-coded: (1) Spindle neurons only exist in a couple specific parts of the neocortex. I don't really know what's the deal with those. Kurzweil claims they're important for social emotions and empathy, if I recall correctly. Hmmm. (2) "Sensitive windows" (see Dehaene): Low-level sensory processing areas more-or-less lock themselves down to prevent further learning very early in life, and certain language-processing areas lock themselves down somewhat later, and high-level conceptual areas don't ever lock themselves down at all (at least, not as completely). I bet that's genetically hardwired. I guess psychedelics can undermine this lock-down mechanism?
^[3] I have heard that the primary motor cortex is not the only part of the neocortex that emits motor commands, but don't know the details.
^[4] Also, people who lose various parts of the neocortex are often capable of full recovery, if it happens early enough in infancy, which suggests to me that the CCA's wiring-via-learning capability is doing most of the work, and maybe the innate wiring diagram is mostly just getting things set up more quickly and reliably.
^[5] See Rethinking Innateness p116, or better yet Johnson's article
^[6] See, for example, Fast Detector/First Responder: Interactions between the Superior Colliculus-Pulvinar Pathway and Stimuli Relevant to Primates. Also, let us pause and reflect on the fact that humans have two different visual processing systems! Pretty cool! The most famous consequence is blindsight, a condition where the ~~subconscious~~ midbrain vision processing system (superior colliculus) is intact but the ~~conscious~~ neocortical visual processing system is not working. This study proves that blindsighted people can recognize not just faces but specific facial expressions. I strongly suspect blindsighted people would react to snakes and spiders too, but can't find any good studies (that study in the previous sentence regrettably used stationary pictures of spiders and snakes, not videos of them scampering and slithering).

Charlie Steiner (6y)

This was just on my front page for me, for some reason. So, it occurs to me that the example of the evolved FPGA is precisely the nightmare scenario for the CCA hypothesis.

If neurons behave according to simple rules during growth and development, and there are only smooth modulations of chemical signals during development, then nevertheless you might get regions of the cortex that look very similar, but whose cells are exploiting the hardly-noticeable FPGA-style quirks of physics in different ways. You'd have to detect the difference by luckily choosing the right sort of computational property to measure.

Steven Byrnes (6y)

Thanks for the comment! When I think about it now (8 months later), I have three reasons for continuing to think CCA is broadly right:

Cytoarchitectural (quasi-) uniformity. I agree that this doesn't definitively prove anything by itself, but it's highly suggestive. If different parts of the cortex were doing systematically very different computations, well maybe they would start out looking similar when the differentiation first started to arise millions of years ago, but over evolutionary time you would expect them to gradually diverge into superficially-obviously-different endpoints that are more appropriate to their different functions.
Narrowness of the target, sorta. Let's say there's a module that takes specific categories of inputs (feedforward, feedback, reward, prediction-error flags) and has certain types of outputs, and it systematically learns to predict the feedforward input and control the outputs according to generative models following this kind of selection criterion (or something like that). This is a very specific and very useful thing. Whatever the reward signal is, this module will construct a theory about what causes that reward signal and make plans to increase it. And this kind of module automatically tiles: you can connect multiple modules and they'll be able to work together to build more complex composite generative models integrating more inputs to make better reward predictions and better plans. I feel like you can't just shove some other computation into this system and have it work; it's either part of this coordinated prediction-and-action mechanism, or not (in which case the coordinated prediction-and-action mechanism will learn to predict it and/or control it, just like it does for the motor plant etc.). Anyway, it's possible that some part of the neocortex is doing a different sort of computation, and not part of the prediction-and-action mechanism. But if so, I would just shrug and say "maybe it's technically part of the neocortex, but when I say 'neocortex', I'm using the term loosely and excluding that particular part." After all, I am not an anatomical purist; I am already including part of the thalamus when I say "neocortex" for example (I have a footnote in the article apologizing for that). Sorry if this description is a bit incoherent, I need to think about how to articulate this better.
Although it's probably just the Dunning-Kruger talking, I do think I at least vaguely understand what the algorithm is doing and how it works, and I feel like I can concretely see how it explains everything about human intelligence including causality, counterfactuals, hierarchical planning, task-switching, deliberation, analogies, concepts, etc. etc.

AI ALIGNMENT FORUM