(Last revised: July 2024. See changelog at the bottom.)

13.1 Post summary / Table of contents

Part of the “Intro to brain-like-AGI safety” post series.

In the previous post, I proposed that one path forward for AGI safety involves reverse-engineering human social instincts—the innate reactions in the Steering Subsystem (hypothalamus and brainstem) that contribute to human social behavior and moral intuitions. This post will go through some examples of how human social instincts might work.

My intention is not to offer complete and accurate descriptions of human social instinct algorithms, but rather to gesture at the kinds of algorithms that a reverse-engineering project should be looking for.

This post, like Posts #2–#7 but unlike the rest of the series, is pure neuroscience, with almost no mention of AGI besides here and the conclusion.

Table of contents:

  • Section 13.2 explains, first, why I expect to find innate, genetically-hardwired, social instinct circuits in the hypothalamus and/or brainstem, and second, why evolution had to solve a tricky puzzle when designing these circuits. Specifically, these circuits have to solve a “symbol grounding problem”, by taking the symbols in a learned-from-scratch world-model, and somehow connecting them to the appropriate social reactions.
  • Sections 13.3 and 13.4 go through two relatively simple examples where I attempt to explain recognizable social behaviors in terms of innate reaction circuits: filial imprinting in Section 13.3, and fear-of-strangers in Section 13.4.
  • Section 13.5 discusses an additional ingredient that I suspect plays an important role in many social instincts, which I call “transient empathetic simulations”.  This mechanism enables reactions where recognizing or expecting a feeling in someone else triggers a “response feeling” in oneself—for example, if I notice that my rival is suffering, it triggers the warm feelings of schadenfreude. To be clear, “transient empathetic simulations” have little in common with how the word “empathy” is used normally; “transient empathetic simulations” are fast and involuntary, and are involved in both prosocial and antisocial emotions.
  • Section 13.6 wraps up with a plea for researchers to figure out exactly how human social instincts work, ASAP. I will have a longer wish-list of research directions in Post #15, but I want to emphasize this one right now, as it seems particularly impactful and tractable. If you (or your lab) are in a good position to make progress but would need funding, email me and I’ll keep you in the loop about possible upcoming opportunities.

13.2 What are we trying to explain and why is it tricky?

13.2.1 Claim 1: Social instincts arise from genetically-hardcoded circuitry in the Steering Subsystem (hypothalamus & brainstem)

As of this revision (July 2024), I have two decent examples of human social instincts for which I’ve (hopefully) made meaningful progress towards understanding how they work in the brain, even if I don’t have all the details worked out. Those two examples are:

  • The innate drive that triggers laughter, related to play—see my post A Theory of Laughter;
  • The innate “drive to feel liked / admired”—see my post Valence & Liking / Admiring (and hopefully there will be more posts on this topic in the future).

I want us to have a concrete example in mind going forward, so I’ll focus on the latter.[1] Let’s talk about the near-universal desire to feel liked / admired, especially by those whom we like / admire in turn. This desire is closely related to the human tendency to seek prestige and status, although it’s not exactly the same (see my two-part series on social status phenomena here & here).

(Remember, the point of this post is that I want to understand human social instincts in general. I don’t necessarily want AGIs to exhibit status-seeking behavior in particular! See previous post, Section 12.4.3.)

My claim is: there needs to be genetically-hardcoded circuitry in the Steering Subsystem—a.k.a. an “innate reaction”—which gives rise to this desire to feel liked / admired.

Why do I think that? A few reasons (see also my similar discussion in Section 4.4.1 here):

First, a drive to feel liked / admired would seem to have a solid evolutionary justification. After all, if people like / admire me, then they tend to defer to my preferences, plans, and ideas, including helping me when they can, for reasons described in Section 4.6 here. Relatedly, I’m not an expert, but I understand that the most liked / admired members of hunter-gatherer tribes tended to become leaders, eat better food, have more children, and so on.

Second, the drive to feel liked / admired seems to be innate, not learned. For example:

  • I think parents will agree that children crave respect / admiration starting from a remarkably young age, and in situations where those cravings have no discernible direct downstream impact on their life.
  • Even adults crave admiration in a way that’s rather divorced from direct downstream impact—for example, I would react much more positively to news that Tom Hanks secretly thinks highly of me, than to news that some obscure government bureaucrat secretly thinks highly of me, even if the obscure government bureaucrat was in fact much better-positioned than Tom Hanks to advance my life goals as a result.
  • The desire to feel liked / admired is (I think) a cross-cultural universal…
  • …But simultaneously, that same desire seems to have a lot of person-to-person variation, including being apparently essentially absent in some psychopaths (see discussion of the “pity play” in Section 4.4.1 here). This kind of inter-individual variation seems to better match what I expect from an innate drive (e.g. some people are much more sensitive to pain or hunger than others) than what I expect from a learned strategy (e.g. pretty much everyone makes use of their fingernails when scratching an itch).
  • There may also be an analogous drive to feel liked / admired in certain non-human animals, like the Arabian babbler bird.[2]

In my framework (see Posts #2–#3), the only way to build this kind of innate reaction is to hardwire specific circuitry into the Steering Subsystem. As a (non-social) example of how I expect this kind of innate reaction to be physically configured in the brain (if I understand correctly; see detailed discussion in this other post I wrote), there’s a discrete population of neurons in the hypothalamus which seems to implement the following behavior: “If I’m under-nourished, do the following tasks: (1) emit a hunger sensation, (2) start rewarding the cortex for getting food, (3) reduce fertility, (4) reduce growth, (5) reduce pain sensitivity, etc.”. There seems to be a neat and plausible story of what this population of hypothalamic neurons is doing, how it's doing it, and why. I expect that there are analogous little circuits (perhaps also in the hypothalamus, or maybe somewhere in the brainstem) that underlie things like “the drive to feel liked / admired”, and I’d like to know exactly what they are and how they work, at the algorithm level.
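To make that input/output behavior concrete, here’s a minimal sketch in Python. All the names and numbers are mine, purely illustrative; the real thing is a population of neurons, not a lookup table.

```python
def hunger_circuit(nourishment_level: float, threshold: float = 0.3) -> dict:
    """Toy model of the hypothalamic circuit described above: one
    genetically-hardwired trigger condition fanning out to a fixed
    package of coordinated reactions. Threshold is arbitrary."""
    if nourishment_level < threshold:        # "I'm under-nourished"
        return {
            "hunger_sensation": True,        # (1) emit a hunger sensation
            "reward_food_acquisition": True, # (2) reward the cortex for getting food
            "fertility": "reduced",          # (3)
            "growth": "reduced",             # (4)
            "pain_sensitivity": "reduced",   # (5)
        }
    return {}  # well-nourished: the circuit stays quiet
```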

Third, I claim that rodent studies have clearly established that groups of neurons in the Steering Subsystem (especially the hypothalamus) can orchestrate innate social reactions and drives—a few examples are in this footnote[3]. That should make it feel more plausible that maybe all innate social reactions are like that, and that maybe this is true in humans too. I wish the literature on this topic were better—but alas, in social neuroscience, just like in non-social neuroscience, the Steering Subsystem (hypothalamus and brainstem) is (regrettably) neglected and dismissed in comparison to the cortex.[4]

13.2.2 Claim 2: Social instincts are tricky because of the “symbol grounding problem”

For social instincts to have the effects that evolution “wants” them to have, they need to interface with our conceptual understanding of the world—i.e., with our learned-from-scratch world-model, which is a huge (probably multi-gigabyte) complicated unlabeled data structure in our brain.

So suppose my acquaintance Rita just said something about politics that seems to imply that she thinks I’m stupid. My understanding of Rita and her utterance is represented by some specific neuron firing pattern in the learned cortical world model, and that’s supposed to trigger the hard-coded “drive to feel liked / admired” circuit in my hypothalamus or brainstem. How does that work?

You can’t just say “The genome wires these particular neurons to the innate drive circuit,” because we need to explain how. Recall from Post #2 that all my concepts related to Rita, politics, grammar, conversational implicatures, and so on, were learned within my lifetime, basically by cataloging patterns in my sensory inputs, and then patterns in the patterns, etc.—see predictive learning of sensory inputs in Post #4. How does the genome know that this particular set of neurons should trigger the “drive to feel liked / admired” circuit?

By the same token, you can’t just say “A within-lifetime learning algorithm will figure out the connection”; we would also need to specify how the brain calculates a “ground truth” signal (e.g. supervisory signals, error signals, reward signals, etc.) which can steer this learning algorithm.

Thus, the challenge of implementing the “drive to feel liked / admired” (and other social instincts) amounts to a kind of symbol grounding problem—we have lots of “symbols” (concepts in our learned-from-scratch predictive world-model), and the Steering Subsystem needs a way to “ground” them, at least well enough to extract what social instincts they should evoke.

So how do the social instinct circuits solve that symbol grounding problem? One possible answer is: “Sorry Steve, but there’s no possible solution, and therefore we should reject learning-from-scratch and all the other baloney in Posts #2–#7.” Yup, I admit it, that’s a possible answer! But I don’t think it’s right.

While I don’t have any great, well-researched answers, I do have some ideas of what the answer should generally look like, and the rest of the post is my attempt to gesture in that direction.

13.2.3 Reminder of brain model, from previous posts

As usual, here’s our diagram from Post #6:

And here’s the version distinguishing within-lifetime learning-from-scratch from genetically-hardcoded circuitry:

Again, our general goal in this post is to think about how social instincts might work, without violating the constraints of our model.

13.3 Sketch #1: Filial imprinting

(This section is not necessarily a central example of how social instincts work, but is included as practice in thinking through the relevant algorithms. Thus, I feel pretty strongly that the discussion here is plausible, but I haven’t read the literature deeply enough to know whether it’s correct.)

13.3.1 Overview

Left: baby geese who imprinted on their mother. Right: Baby ducks who imprinted on a corgi. (Image sources: 1,2)

Filial imprinting (wikipedia) is a phenomenon where, in the most famous example, baby geese will “imprint on” a salient object that they see during a critical period 13–16 hours after hatching, and then will follow that object around. In nature, the “object” they imprint on is almost invariably their mother, whom they dutifully follow around early in life. However, if separated from their mother, baby geese will imprint on other animals, or even inanimate objects like boots and boxes.

Your challenge: come up with a way to implement filial imprinting in my brain model.

(Try it!)

.

.

.

.

Here’s my answer.

Same as above except for the red text.

The first step is: I added a particular Thought Assessor dedicated to MOMMY (marked in red), with a prior pointing it towards visual inputs (Post #9, Section 9.3.3). Next I’ll talk about how this particular Thought Assessor is trained, and then how its outputs are used.

13.3.2 How is the MOMMY Thought Assessor trained?

During the critical period (13–16 hours after hatching):

Recall that there’s a simple image processor in the Steering Subsystem (called “superior colliculus” in mammals, and “optic tectum” in birds). I propose that when this system detects that the visual field contains a mommy-like object (based on some simple image-analysis heuristics, which apparently are not very discerning, given that boots and boxes can pass as “mommy-like”), it sends a “ground truth in hindsight” signal to the MOMMY Thought Assessor. This triggers updates to the Thought Assessor (by supervised learning), essentially telling it: “Whatever you’re seeing right now in the context signals, those should lead to a very high score for MOMMY. If they don’t, please update your synapses etc. to make it so.”

During the critical period (13–16 hours after hatching), whenever the goose’s brainstem visual processor detects a plausibly-mommy-like object, it sends a ground truth supervisory signal to the MOMMY Thought Assessor, prompting the Thought Assessor learning algorithm to edit its connections. 

After the critical period (13–16 hours after hatching):

After the critical period, the Steering Subsystem permanently stops updating the MOMMY Thought Assessor. No matter what happens, it gets an error signal of zero!

Therefore, however that particular Thought Assessor got configured during the critical period, that’s how it stays.
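If it helps, here’s a toy Python sketch of that train-then-freeze story. I’m assuming a simple linear Thought Assessor purely for illustration; the linearity, the learning rate, and the exact gating are all invented details.

```python
import numpy as np

class MommyThoughtAssessor:
    """Toy linear Thought Assessor: MOMMY score = w · context signals."""
    def __init__(self, n_signals: int):
        self.w = np.zeros(n_signals)

    def score(self, context: np.ndarray) -> float:
        return float(self.w @ context)

    def update(self, context: np.ndarray, ground_truth: float,
               hours_since_hatching: float, lr: float = 0.1) -> None:
        # Outside the 13-16 hour critical period, the Steering Subsystem
        # sends an error signal of zero, so the weights are frozen forever.
        if not (13 <= hours_since_hatching <= 16):
            return
        # Inside the window: supervised learning toward the "ground truth
        # in hindsight" signal from the brainstem's visual heuristics
        # (1.0 whenever a plausibly-mommy-like object is in view).
        error = ground_truth - self.score(context)
        self.w += lr * error * context  # delta rule
```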

Summary

Thus far in the story, we have built a circuit that learns the specific appearance of an imprinting-worthy object during the critical period, and then after the critical period, the circuit fires in proportion to how well things in the current field-of-view match that previously-learned appearance. Moreover, this circuit is not buried inside a giant learned-from-scratch data structure, but rather is sending its output into a specific, genetically-specified line going down to the Steering Subsystem—exactly the configuration that enables easy interfacing with genetically-hardwired circuitry.

So far so good!

13.3.3 How is the MOMMY Thought Assessor used?

Now, the rest of the story is probably kinda similar to Post #7. We can use the MOMMY Thought Assessor to build a reward signal incentivizing the baby goose to be physically proximate to, and looking at, the imprinted object—and not only that, but also rewarding it for planning to get physically proximate to the imprinted object.
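In sketch form (the gain parameter is invented), the reward signal might simply track the frozen assessor’s output:

```python
def proximity_reward(mommy_score: float, gain: float = 1.0) -> float:
    # Reward in proportion to how strongly the (now-frozen) MOMMY Thought
    # Assessor is firing, i.e. for being near / looking at the imprinted
    # object, and likewise for thoughts and plans that would get there.
    return gain * mommy_score
```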

I can think of various ways to make the reward function a bit more elaborate than that—maybe the optic tectum heuristics continue to be involved, and help detect if the imprinted object is on the move, or whatever—but I’ve already exhausted my very limited knowledge of imprinting behavior, and maybe we should move on.

13.4 Sketch #2: Fear of strangers

(As above, the purpose here is to practice playing with the algorithms, and I don’t feel strongly that this description is definitely a thing that happens in humans.)

Here’s a behavior, which may ring true to parents of very young kids, although I think different kids display it to different degrees. If a kid sees an adult they know well, they’re happy. But if they see an adult they don’t know, they get scared, especially if that adult is very close to them, touching them, picking them up, etc.

Your challenge: come up with a way to implement that behavior in my brain model.

(Try it!)

.

.

.

.

Here’s my answer.

(As usual, I’m oversimplifying for pedagogical purposes.[5]) I’m assuming that there are hardwired heuristics in the brainstem sensory processing systems that indicate the likely presence of a human adult—presumably based on sight, sound, and smell. This signal by default triggers a “be scared” reaction. But the brainstem circuitry is also watching what the Thought Assessors in the cortex are predicting, and if the Thought Assessors are predicting safety, affection, comfort, etc., then the brainstem circuitry trusts that the cortex knows what it's talking about, and goes with the suggestions of the cortex (see the code sketch after the walkthroughs below). Now we can walk through what happens:

First time seeing a stranger:

  • Steering Subsystem sensory heuristics say: “An adult human is present.”
  • Thought Assessors say: “Neutral—I have no expectation of anything in particular.”
  • Steering Subsystem “Stranger Danger circuit” says: “Considering all of the above, we should be scared right now.”
  • Thought Assessors say: “Oh, oops, I guess my assessment was wrong, let me update my models.”

Second time seeing the same stranger:

  • Steering Subsystem sensory heuristics say: “An adult human is present.”
  • Thought Assessors say: “This is a scary situation.”
  • Steering Subsystem “Stranger Danger circuit” says: “Considering all of the above, we should be scared right now.”

The stranger hangs around for a while, and is nice, and playing, etc.:

  • Steering Subsystem sensory heuristics say: “An adult human is still present.”
  • Other circuitry in the brainstem says: “I've been feeling mighty scared all this time, but y'know, nothing bad has happened…” (cf. Section 5.2.1.1)
  • Other Thought Assessors see the fun new toy and say “This is a good time to relax and play.”
  • Steering Subsystem says: “Considering all of the above, we should be relaxed right now.”
  • Thought Assessors say: “Oh, oops, I was predicting that this was a situation where we should feel scared, but I guess I was wrong, let me update my models.”

Third time seeing the no-longer-stranger:

  • Steering Subsystem sensory heuristics say: “An adult human is present.”
  • Thought Assessors say: “I expect to feel relaxed and playful and not-scared.”
  • Steering Subsystem “Stranger Danger circuit” says: “Considering all of the above, we should be relaxed and playful and not-scared right now.”
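Pulling those walkthroughs together, here’s a toy Python sketch of the arbitration rule I’m imagining. The thresholds and the exact division of labor are pure guesses on my part.

```python
def stranger_danger_circuit(adult_detected: bool,
                            assessor_prediction: str,  # "safe", "scary", or "neutral"
                            habituation: float          # rises as time passes uneventfully
                            ) -> str:
    """Default to fear when brainstem heuristics detect an adult human,
    unless the cortex's Thought Assessors confidently predict safety, or
    nothing bad has happened for a while (cf. Section 5.2.1.1)."""
    if not adult_detected:
        return "neutral"
    if assessor_prediction == "safe" or habituation > 0.8:
        return "relaxed"  # trust that the cortex knows what it's talking about
    return "scared"       # default reaction

# 1st encounter: prediction "neutral", habituation low -> "scared"
#   (the Thought Assessors then update toward "scary")
# stranger stays and plays: habituation rises          -> "relaxed"
#   (the Thought Assessors then update toward "safe")
# 3rd encounter: prediction "safe"                     -> "relaxed"
```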

13.5 Another key ingredient (I think): “Transient empathetic simulation”

13.5.1 Introduction

Yet again, here’s our diagram from Post #6:

Let’s zoom in on one particular Thought Assessor in my brain, which happens to be dedicated to predicting a cringe reaction. This Thought Assessor has learned over the course of my lifetime that the predictive world-model activations corresponding to “my stomach is getting punched” constitute an appropriate time to cringe:

Now what happens when I watch someone else getting punched in the stomach?

If you look carefully on the left, you’ll see that “His stomach is getting punched” is a different set of activations in my predictive world-model than “My stomach is getting punched”. But it’s not entirely different! Presumably, the two sets would overlap to some degree.

And therefore, we should expect that, by default, “His stomach is getting punched” would send a weaker but nonzero “cringe” signal down to the Steering Subsystem.
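Here’s a tiny numerical illustration of why the overlap yields a weaker-but-nonzero signal. The three “features” are invented for illustration; real cortical representations are obviously far higher-dimensional.

```python
import numpy as np

# The cringe Thought Assessor was trained on "MY stomach is getting punched",
# so its weights load on all three features below.
w_cringe = np.array([1.0, 1.0, 1.0])  # weights on [punch, stomach, happening-to-me]

my_stomach_punched  = np.array([1.0, 1.0, 1.0])
his_stomach_punched = np.array([1.0, 1.0, 0.0])  # overlapping activations, minus "me"

print(w_cringe @ my_stomach_punched)   # 3.0 -> full cringe signal
print(w_cringe @ his_stomach_punched)  # 2.0 -> weaker but nonzero "echo" of cringe
```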

I call this signal a “transient empathetic simulation”. (If you’re wondering why I’m avoiding the term mirror neuron, this is deliberate—see here.) It’s kind of a transient echo of what I (involuntarily) infer a different person to be feeling.

So what? Well, recall the symbol-grounding problem from Section 13.2.2 above. The existence of “transient empathetic simulations” is a massive breakthrough towards solving that problem for social instincts! After all, my Steering Subsystem now has a legible-to-it indication that a different person is feeling a certain feeling, and that signal can in turn trigger a response reaction in me.

(I’m glossing over various issues with “transient empathetic simulations”, but I think those issues are solvable.[6])

For example, a (massively-oversimplified) envy reaction could look like “if I’m not happy, and I become aware (via a ‘transient empathetic simulation’) that someone else is happy, then issue a negative reward”.

More generally, one could have a Steering Subsystem circuit whose inputs include:

  1. my own current physiological state (“feelings”),
  2. the contents of the “transient empathetic simulation”,
  3. …associated with some metadata about the person being empathetically simulated (e.g. their valence, a.k.a. how much I like / admire them), and
  4. heuristics drawn from my brainstem sensory processing systems, e.g. indicating whether I’m looking at a human right now.

The circuit could then produce outputs (“reactions”), which could (among other things) include rewards, other feelings, and/or ground truths for one or more Thought Assessors.

It seems to me that evolution would thus have quite a versatile toolbox for building social instincts, especially by chaining together more than one circuit of this type.
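For concreteness, here’s a massively-oversimplified Python sketch (my own illustration, not an anatomical claim) of one circuit of this type, implementing the envy reaction above plus schadenfreude:

```python
def social_instinct_circuit(my_feelings: dict,       # (1) my own physiological state
                            sim_feelings: dict,      # (2) the transient empathetic simulation
                            target_valence: float,   # (3) how much I like / admire the target
                            looking_at_human: bool   # (4) brainstem sensory heuristics
                            ) -> dict:
    """One hardwired input-to-reaction mapping; evolution could build
    many variants of this and chain them together."""
    reactions = {"reward": 0.0, "feelings": []}
    if not looking_at_human:
        return reactions
    # Envy: I'm not happy, but the simulated other is -> negative reward.
    if not my_feelings.get("happy") and sim_feelings.get("happy"):
        reactions["reward"] -= 1.0
        reactions["feelings"].append("envy")
    # Schadenfreude: a disliked other (negative valence) is suffering -> positive reward.
    if target_valence < 0 and sim_feelings.get("suffering"):
        reactions["reward"] += 1.0
        reactions["feelings"].append("schadenfreude")
    return reactions
```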

13.5.2 Distinction from the standard definition of “empathy”

I want to strongly distinguish “transient empathetic simulation” from the standard definition of “empathy”.[7] (Maybe call the latter “sustained empathetic simulation”?)

For one thing, standard empathy is often effortful and voluntary, and may require at least a second or two of time, whereas a “transient empathetic simulation” is always fast and involuntary. An analogy for the latter would be how looking at a chair activates the “chair” concept in your brain, within a fraction of a second, whether you want it to or not.

For another thing, a “transient empathetic simulation”, unlike standard “empathy”, does not always lead to prosocial concern for its target. For example:

  • In envy, if a transient empathetic simulation indicates that someone is happy, it makes me unhappy.
  • In schadenfreude, if a transient empathetic simulation indicates that someone is unhappy, it makes me happy.
  • When I’m angry, if a transient empathetic simulation indicates that the person I’m talking to is happy and calm, it sometimes makes me even more angry!

These examples are all antithetical to prosocial concern for the other person. Of course, in other situations, the “transient empathetic simulations” do spawn prosocial reactions. Basically, social instincts span the range from kind to cruel, and I suspect that pretty much all of them involve “transient empathetic simulations”.

By the way: I already offered a model of “transient empathetic simulations” in the previous subsection. You might ask: What’s my corresponding model of standard (sustained) empathy?

Well, in the previous subsection, I distinguished “my own current physiological state (feelings)” from “the contents of the transient empathetic simulation”. For standard empathy, I think this distinction breaks down—the latter bleeds into the former. Specifically, I would propose that when my Thought Assessors issue a sufficiently strong and long-lasting empathetic prediction, the Steering Subsystem starts “deferring” to them (in the Post #5 sense), and the result is that my own feelings wind up matching the feelings of the target-of-empathy. That’s my model of standard empathy.
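Here’s a minimal sketch of that deferral dynamic; the thresholds and mixing weight are invented, and “feelings” are crudely represented as a dict of intensities.

```python
def defer_to_empathetic_prediction(my_feelings: dict, sim_feelings: dict,
                                   strength: float, duration_s: float) -> dict:
    """If the Thought Assessors' empathetic prediction is strong and
    sustained, the Steering Subsystem "defers": my own feelings drift
    toward the simulated feelings of the target. Otherwise (a transient
    simulation), my own feelings stay distinct from the simulation."""
    if strength > 0.7 and duration_s > 2.0:  # sustained, confident prediction
        alpha = 0.5  # how much the simulation bleeds into my own state
        return {k: (1 - alpha) * my_feelings.get(k, 0.0) + alpha * v
                for k, v in sim_feelings.items()}
    return my_feelings
```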

Then, if the target of my (standard) empathy is currently feeling an aversive feeling, I also wind up feeling an aversive feeling, and I don’t like that, so I’m motivated to help him feel better (or, perhaps, motivated to shut him out, as can happen in compassion fatigue). Conversely, if the target of my (standard) empathy is currently feeling a pleasant feeling, I also wind up feeling a pleasant feeling, and I’m motivated to help him feel that feeling again.

Thus, standard empathy seems to be inevitably prosocial.

13.5.3 Why do I believe that “transient empathetic simulations” are part of the story?

First, it seems introspectively right (to me, at least). If my friend is impressed by something I did, I feel proud, but I especially feel proud at the exact moment when I imagine my friend feeling that emotion. If my friend is disappointed in me, I feel guilty, but I especially feel guilty at the exact moment when I imagine my friend feeling that emotion. As another example, there’s a saying: “I can’t wait to see the look on his face when….” Presumably this saying reflects some real aspect of our social psychology, and if so, I claim that this observation dovetails well with my “transient empathetic simulations” story.

Second, if the rest of my model (Posts #2–#7) is correct, then “transient empathetic simulation” signals would arise automatically, such that it would be straightforward to evolve a Steering Subsystem circuit that “listens” for them.

Third, if the rest of my model is correct, then, well, I can’t think of any other way to build most social instincts! Process of elimination!

Fourth, I think I’m converging towards my first concrete example of a human social instinct grounded by transient empathetic simulation, namely the “drive to feel liked / admired”. See discussion here. I still have some work to do on that though—for example, here’s a recent post where I was chatting about an aspect of the story that I’m still very uncertain about.

13.6 Future work (please!)

As noted in the introduction, the point of this post is to gesture towards what I expect a “theory of human social instincts” to look like, such that it would be compatible with all my other claims about brain algorithms in Posts #2–#7, particularly the strong constraint of “learning from scratch” as discussed in Section 13.2.2 above. My takeaway from the discussion in Sections 13.3–13.5 is a strong feeling of optimism that such a theory exists, even if I don’t know all the details yet, and a corresponding optimism that this theory is actually how the human brain works, and will line up with corresponding circuits in the brainstem or (more likely) hypothalamus.

Of course, I want very much to move past the “general theorizing” stage, into more specific claims about how human social instincts actually work. For example, I’d love to move beyond speculation on how these instincts might solve the symbol-grounding problem, and learn how they actually do solve the symbol-grounding problem. I’m open to any ideas and pointers here, or better yet, for people to just figure this out on their own and tell me the answer.

(Since initially writing this post, I’ve spent a ton of time and energy to work towards this goal. I think I’ve made progress, especially towards laughter and the drive to feel liked / admired, but I still have tons of work to do! Hopefully more posts coming in the future!)

For reasons discussed in the previous post, nailing down human social instincts is at the top of my wishlist for how neuroscientists can help with AGI safety.

Remember how I talked about Differential Technological Development (DTD) in Post #1 Section 1.7? Well, this is the DTD “ask” that I feel strongest about—at least, among those things that neuroscientists can do without explicitly working on AGI safety (see upcoming Post #15 for my more comprehensive wish-list). I really want us to reverse-engineer human social instincts in the hypothalamus & brainstem long before we reverse-engineer human world-modeling in the cortex.

And things are not looking good for that project! The hypothalamus is small and deep and hence hard-to-study! Human social instincts might be different from rat social instincts! Orders of magnitude more research effort is going towards understanding cortex world-modeling than understanding hypothalamus & brainstem social instinct circuitry! In fact, I’ve noticed (to my chagrin) that algorithmically-minded, AI-adjacent neuroscientists are especially likely to spend their talents on the Learning Subsystem (cortex, striatum, cerebellum, etc.) rather than the hypothalamus & brainstem. But still, I don’t think my DTD “ask” is hopeless, and I encourage anyone to try, and if you (or your lab) are in a good position to make progress but would need funding, email me and I'll keep you in the loop about possible upcoming opportunities.

Changelog

July 2024: Since the initial version, I’ve made two big changes, plus some smaller ones.

One big change was switching terminology from “little glimpse of empathy” to “transient empathetic simulation”. Sorry for being inconsistent, but I think the new term is just way better.

The other big change was switching my running example from “envy” to “drive to feel liked / admired”. I’m no longer so sure that envy is a social instinct at all, for reasons in the footnote,[1] whereas the “drive to feel liked / admired” is something I’m pretty sure exists and where I at least vaguely have some ideas about how it works (even if I still have more work to do on that).

Smaller changes included some neuroscience fixes in line with other posts (relatedly, I deleted a minor paragraph about the medial prefrontal cortex that I now believe to be mostly wrong); and more discussion of, and links to, various things that I’ve done since writing the initial version of this post in 2022.

  1. ^

    In the initial version of this post, my opening example was “envy” instead of “drive to feel liked / admired”. But now I think maybe envy was a very bad example.

    Instead, my leading hypothesis right now is that envy is a side-effect of innate drives / reactions that are not specifically social at all! In particular, maybe envy is sorta just a special case of craving—a kind of anxious frustration in a scenario where something is highly motivating and salient, but there’s no way to actualize that desire.

    So if Sally has a juice box and I don’t, it incidentally makes the alluring possibility of drinking juice very salient in my mind. Since I can’t have juice, the frustration of that desire leads (via some innate mechanism I don’t understand) to feelings that I’d call “envy”. But if I’m staring at an empty shelf in the store where there should have been juice boxes (but the juice box factory burned down), I think I can get the very same kind of frustrated feeling, for the same underlying reason. But in the latter case, I wouldn’t call it “envy”, because it’s not directed towards anyone in particular. The factory burned down—nobody has a juice box this week! But it’s still frustrating to look at the empty shelf, taunting me.

    I could be wrong.

  2. ^

    See Elephant In The Brain (Simler & Hanson, 2018) for a brief discussion of Arabian Babbler Birds. Warning: I know very little about Arabian Babbler Birds, and indeed for all I know their apparent prestige-seeking might occur for different underlying reasons than humans’.

  3. ^

    For a good recent review of the role of the Steering Subsystem (especially hypothalamus) in social behavior, see “Hypothalamic Control of Innate Social Behaviors” (Mei et al., 2023). Another lovely recent example, related to loneliness, is “A Hypothalamic Circuit Underlying the Dynamic Control of Social Homeostasis” (Liu et al., 2023).

  4. ^

    “…if you look at the human literature nobody talks about the hypothalamus and behaviour. The hypothalamus is very small and can’t be readily seen by human brain imaging technologies like functional magnetic resonance imaging (fMRI). Also, much of the anatomical work in the instinctive fear system, for example, has been overlooked because it was carried out by Brazilian neuroscientists who were not particularly bothered to publish in high profile journals. Fortunately, there has recently been a renewed interest in these behaviors and these studies are being newly appreciated.” (Cornelius Gross, 2018)

  5. ^

    I suspect a more accurate diagram would feature arousal (in the psychology-jargon sense, not the sexual sense—i.e., heart rate elevation etc.) as a mediating variable. Specifically: (1) if brainstem sensory processing indicates that an adult human is present and nearby and picking me up etc., that leads to heightened arousal (by default, unless the Thought Assessors strongly indicate otherwise), and (2) when I’m in a state of heightened arousal, my brainstem treats it as bad and dangerous (by default, unless the Thought Assessors strongly indicate otherwise). 

  6. ^

    For example, the Steering Subsystem needs a method to distinguish a “transient empathetic simulation” from other transient feelings, e.g. the transient feeling that occurs when I think through the consequences of a possible course of action that I might take. Maybe there are some imperfect heuristics that could do that, but my preferred theory is that there’s a special Thought Assessor trained to fire when attending to another human (based on ground-truth sensory heuristics as discussed in Section 13.4). As another example, we need the “Ground truth in hindsight” signals to not gradually train away the Thought Assessor’s sensitivity to “his stomach is getting punched”. But it seems to me that, if the Steering Subsystem can figure out when a signal is a “transient empathetic simulation”, then it can choose not to send error signals to the Thought Assessors in those cases.

  7. ^

    Warning: I’m not entirely sure that there really is a “standard” definition of empathy; it’s also possible that the term is used in lots of slightly-inconsistent ways. I enjoyed this blog post on the topic.

Comments

One thing that appears to be missing in the filial imprinting story is a mechanism allowing the "mommy" thought assessor to improve, or at least not degrade, over time.

The critical window is quite short, so many characteristics of mommy that may be very useful will not be perceived by the thought assessor in time. I would expect that after it recognizes something as mommy it is still malleable to learn more about what properties mommy has.

For example, after it recognizes mommy based on vision, it may learn more about what sounds mommy makes, and what smell mommy has. Because these sounds/smells are present when the vision-based mommy signal is present, the thought assessor should update to recognize sound/smell as indicative of mommy as well. This will help the duckling avoid mistaking some other ducks for mommy, and also help the ducklings find their mommy through other non-visual cues (even if the visual cues are what triggers the imprinting to begin with).

I suspect such a mechanism will be present even after the critical period is over. For example, humans sometimes feel emotionally attracted to objects that remind them or have become associated with loved ones. The attachment may be really strong (e.g. when the loved one is dead and only the object is left).

Also, your loved ones change over time, but you keep loving them! In "parental" imprinting for example, the initial imprinting is on the baby-like figure, generating a "my kid" thought assessor associated with the baby-like cues, but these need to change over time as the baby grows. So the "my kid" thought assessor has to continuously learn new properties.

Even more importantly, the learning subsystem is constantly changing, maybe even more than the external cues. If the learned representations change over time as the agent learns, the thought assessors have to keep up and do the same, otherwise their accuracy will slowly degrade over time.

This last part seems quite important for a rapidly learning/improving AGI, as we want the prosocial assessors to be robust to ontological drift. So we both want the AGI to do the initial "symbol-grounding" of desirable proto-traits close to kindness/submissiveness, and also for its steering subsystem to learn more about these concepts over time, so that they "converge" to favoring sensible concepts in an ontologically advanced world-model.

Thanks!

For example, humans…

Just to be clear, I was speculating in that section about filial imprinting in geese, not familial bonding in humans. I presume that those two things are different in lots of important ways. In fact, for all I know, they might have nothing whatsoever in common. ¯\_(ツ)_/¯

(UPDATE: I guess the Westermarck Effect might be implemented in a Section-13.3-like way, although not necessarily.)

If the learned representations change over time as the agent learns, the thought assessors have to keep up and do the same, otherwise their accuracy will slowly degrade over time.

Yeah, that seems possible (although I also consider it possible that it’s not a problem; by analogy, catastrophic forgetting is famously more of an issue for ANNs than for brains).

If the learned representations do in fact change a lot over time, I’m slightly skeptical that it would be possible to solve that problem directly, thanks to the lack of an independent ground truth. For example, I can imagine a system that says “If I’m >95% confident that this is MOMMY, then update such that I’m 100% confident that this is MOMMY.” Maybe that system would work to keep pointing at the real mommy, even as learned representations drift. But also, maybe that system would cause the Thought Assessor to gradually go off the rails and trigger off weird patterns in noise. Not sure. Did you have something like that in mind? Or something different?

An alternative might be that, if the specific filial-imprinting mechanism gradually stops working over time, it deactivates at some point and the (now-adolescent) goose switches to some other mechanism(s), like “desire to be with fellow geese that are extremely familiar to me” a la Section 13.4.

Reminder that I know very little about goose behavior and this is all casual speculation. :)

Ok, so this is definitely not a human thing, so probably a bit of a tangent. One of the topics that came up in a neuroscience class once was goose imprinting. There have apparently been studies (see Eckhard Hess for the early ones) showing that the strength of the imprinting (measured by behavior following the close of the critical period) onto whatever target is related to how much running towards the target the baby geese do. The hand-wavey explanation was something like 'probably this makes sense since if you have to run a lot to keep up with your mother-goose for safety, you'll need a strong mother-goose-following behavioral tendency to keep you safe through early development'.

https://www.apa.org/monitor/2011/12/imprinting 

Still working my way through reading this series--it is the best thing I have read in quite a while and I'm very grateful you wrote it!

I feel like I agree with your take on "little glimpses of empathy" 100%.

I think fear of strangers could be implemented without a steering subsystem circuit maybe? (Should say up front I don't know more about developmental psychology/neuroscience than you do, but here's my 2c anyway). Put aside whether there's another more basic steering subsystem circuit for agency detection; we know that pretty early on, through some combination of instinct and learning from scratch, young humans and many animals learn there are agents in the world who move in ways that don't conform to the simple rules of physics they are learning. These agents seem to have internally driven and unpredictable behavior, in the sense their movement can't be predicted by simple rules like "objects tend to move to the ground unless something stops them" or "objects continue to maintain their momentum". It seems like a young human could learn an awful lot of that from scratch, and even develop (in their thought generator) a concept of an agent. 

Because of their unpredictability, agent concepts in the thought generator would be linked to thought assessor systems related to both reward and fear; not necessarily from prior learning derived from specific rewarding and fearful experiences, but simply because, as their behavior can't be predicted with intuitive physics, there remains a very wide prior on what will happen when an agent is present.

In that sense, when a neocortex is first formed, most things in the world are unpredictable to it, and an optimally tuned thought generator+assessor would keep circuits active for both reward and harm. Over time, as the thought generator learns folk physics, most physical objects can be predicted, and it typically generates thoughts in line with their actual behavior. But agents are a real wildcard: their behavior can't be predicted by folk physics, and so they are perceived in a way that every other object in the world used to be: unpredictable, and thus continually predicting both reward and harm in an opponent process that leads to an ambivalent and uneasy neutral. This story predicts that individual differences in reward and threat sensitivity would particularly govern the default reward/threat balance of otherwise unknown items. It might (I'm really REALLY reaching here) help to explain why attachment styles seem so fundamentally tied to basic reward and threat sensitivity.

As the thought generator forms more concepts about agents, it might even learn that agents can be classified with remarkable predictive power into "friend" or "foe" categories, or perhaps "mommy/carer" and "predator" categories. As a consequence of how rocks behave (with complete indifference towards small children), it's not so easy to predict behavior of, say, falling rocks with "friend" or "foe" categories. On the contrary, agents around a child are often not indifferent to children, making it simple for the child to predict whether favorable things will happen around any particular agent by classifying agents into "carer" or "predator" categories. These categories can be entirely learned; clusters of neurons in the thought generator that connect to reward and threat systems in the steering system and/or thought assessor. So then the primary task of learning to predict agents is simply whether good things or bad things happen around the agent, as judged by the steering system.

This story would also predict that, before the predictive power of categorizing agents into "friend" vs. "foe" categories has been learned, children wouldn't know to place agents into these categories. They'd take longer to learn whether an agent is trustworthy or not, particularly so if they haven't learned what an agent is yet. As they grow older, they get more comfortable with classifying agents into "friend" or "foe" categories and would need fewer exemplars to learn to trust (or distrust!) a particular agent.

I feel that the social instincts link to the learned-from-scratch world-model via a chain of guided development windows.
The individual links in the chain are stacks of affective mechanisms: the trigger that detects the environmental stimulus (the moving large object for ducklings), the response (follow that object), and an affect (emotion) that links the instinct to the learned model via a reward signal to strengthen the association (feeling of safety).
As it would be near impossible for the DNA to have a concept of "Rita won a trophy" as the trigger, the system would have to first "teach" the model simpler concepts, and then tag onto those via the affect so that it can trigger correctly later: for example, "Rita" would be identified as a "member of the pack/competition", which would be derived from the concept of "agent". This in turn would have to be first learned via the associations that spring from the early instincts for "pheromones", "human voice", "eyes", etc.

These simpler concepts from our early years occur in development windows. For example, for the first 8 weeks, babies don't focus their gaze on anything, as they are still learning the basics of seeing. After they have a slightly better capacity to predict what they see, the next development window opens, which among other things has a filter to detect eyes. For a while the eyes are associated with an "agent" and "safety", hence babies smile instantly at their parents' faces, while pretty soon this filial imprinting window closes, and they start to cry at the sight of new faces instead.

I have some of these chains of instincts mapped out on an initial level, and am soon trying out these theories within an environment closely resembling OpenAI's Gym (the architecture didn't lend itself easily to this new reward paradigm, unfortunately). Maybe they could be discussed further with some interested people?

Also, little glimpse of empathy has some literature under the term mirror neurons.

little glimpse of empathy has some literature under the term mirror neurons

Sorta, but unfortunately the "mirror neuron" literature seems to be a giant dumpster fire. I suggest & endorse the book The Myth Of Mirror Neurons by Hickok. UPDATE: See also my later post Quick notes on "mirror neurons".