Towards a Less Bullshit Model of Semantics

David Lorell

Notice that in our picture so far, the output of Alice’s semantics-box consists of values of some random variables in Alice’s model, and the output of Bob’s semantics-box consists of values of some random variables in Bob’s model. With that picture in mind, it’s unclear what it would even mean for Alice and Bob to “agree” on the semantics of sentences. For instance, imagine that Alice and Bob are both Solomonoff inductors with a special module for natural language. They both find some shortest program to model the world, but the programs they find may not be exactly identical; maybe Alice and Bob are running slightly different Turing machines, so their shortest programs have somewhat different functions and variables internally. Their semantics-boxes then output values of variables in those programs. If those are totally different programs, what does it even mean for Alice and Bob to “agree” on the values of variables in these totally different programs?

This importantly understates the problem. (You did say "for instance" -- I don't think you are necessarily ignoring the following point, but I think it is a point worth making.)

Even if Alice and Bob share the same universal prior, Solomonoff induction comes up with agent-centric models of the world, because it is trying to predict perceptions. Alice and Bob may live in the same world, but they will perceive different things. Even if they stay in the same room and look at the same objects, they will see different angles.

If we're lucky, Alice and Bob will both land on two-part representations which (1) model the world from a 3rd-person perspective, and (2) then identify the specific agent whose perceptions are being predicted, providing a 'phenomenological bridge' to translate the 3rd-person view of reality into a 1st-person view. Then we're left with the problem which you mention: Alice and Bob could have slightly different 3rd-person understandings of the universe.

If we could get there, great. However, I think we imagine Solomonoff induction arriving at such a two-part model largely because we think it is smart, and we think smart people understand the world in terms of physics and other 3rd-person-valid concepts. We think the physicalist/objective conception of the world is true, and therefore, Solomonoff induction will figure out that it is the best way.

Maybe so. But it seems pretty plausible that a major reason why humans arrive at these 'objective' 3rd-person world-models is because humans have a strong incentive to think about the world in ways that make communication possible. We come up with 3rd-person descriptions of the world because they are incredibly useful for communicating. Solomonoff induction is not particularly designed to respect this incentive, so it seems plausible that it could arrange its ontology in an entirely 1st-person manner instead.

[-]johnswentworth1y20

But it seems pretty plausible that a major reason why humans arrive at these 'objective' 3rd-person world-models is because humans have a strong incentive to think about the world in ways that make communication possible.

This is an interesting point which I had not thought about before, thank you. Insofar as I have a response already, it's basically the same as this thread: it seems like understanding of interoperable concepts falls upstream of understanding non-interoperable concepts on the tech tree, and also there's nontrivial probability that non-interoperable concepts just aren't used much even by Solomonoff inductors (in a realistic environment).

[-]abramdemski1y50

Ah, don't get me wrong: I agree that understanding interoperability is the thing to focus on. Indeed, I think perhaps "understanding" itself has something to do with interoperability.

The difference, I think, is that in my view the whole game of interoperability has to do with translating between 1st person and 3rd person perspectives.

Your magic box takes utterances and turns them into interoperable mental content.

My magic box takes non-interoperable-by-default^[1] mental content and turns it into interoperable utterances.

The language is the interoperable thing. The nature of the interoperable thing is that it has been optimized so as to easily translate between many not-so-easily-interoperable (1st person, subjective, idiosyncratic) perspectives.

[1] "Default" is the wrong concept here, since we are raised from little babies to be highly interoperable, and would die without society. What I mean here is something like, it is relatively easy to spell out non-interoperable theories of learning / mental content, EG Solomonoff's theory, or neural nets.

[-]abramdemski1y62

Takeaway: there can’t be that many possible semantic targets for words. The set of semantic targets for words (in humans) is at least exponentially smaller than the set of random variables in an agent’s world model.

I don't think this follows. The set of semantic targets could be immense, but children and adults could share sufficiently similar priors, such that, with very little data, children land on concepts adequately similar to those that adults are trying to communicate.

Think of it like a modified Schelling-point game, where some communication is possible, but sending information is expensive. Alice is trying to find Bob in the galaxy, and Bob has been able to communicate only a little information for Alice to go on. However, Alice and Bob are both from Earth, so they share a lot of context. Bob can say "the moon" and Alice knows which moon Bob is probably talking about, and also knows that there is only one habitable moon-base on the moon to check.

Bob could find a way to point Alice to any point in the galaxy, but Bob probably won't need to. So the set of possibilities appears to be small, from the perspective of someone who only sees a few rounds of this game.

[-]johnswentworth1y34

So really, rather than "the set of semantic targets is small", I should say something like "the set of semantic targets with significant prior probability is small", or something like that. Unclear exactly what the right operationalization is there, but I think I buy the basic point.
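A minimal numerical sketch of that restatement, under assumptions that are not in the thread: a hypothetical set of 1,000 candidate targets, and an arbitrarily chosen geometric decay for the "peaked" prior. It only illustrates that a large set of possible targets is compatible with a small set carrying most of the prior mass; it says nothing about what the real prior looks like.

```python
def targets_needed(prior, mass=0.95):
    """Smallest number of highest-probability targets whose total prior is >= mass."""
    total = 0.0
    for count, p in enumerate(sorted(prior, reverse=True), start=1):
        total += p
        if total >= mass:
            return count
    return len(prior)

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    z = sum(weights)
    return [w / z for w in weights]

n_targets = 1_000  # hypothetical size of the full set of candidate targets
uniform_prior = normalize([1.0] * n_targets)
peaked_prior = normalize([0.5 ** k for k in range(n_targets)])  # assumed geometric decay

print(targets_needed(uniform_prior))  # most of the set is needed
print(targets_needed(peaked_prior))   # a handful of targets suffices
```

Both priors have the same support, so "the set of semantic targets" is equally large in both; only the count of targets with significant probability differs.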

[-]abramdemski1y50

That’s the main problem of interest to us, for purposes of this post: what’s the set of possible semantic targets for a word?

From the way you've defined things so far, it seems relatively clear what it would mean to solve this problem for sentences; translating from "X" to X has been operationalized as what you condition on if you take "X" literally.

However, the jump you are making to the meaning of a word seems surprising and unclear. If Carol shouts "Ball!" it is unclear what it would mean to condition on the literal content; it seems to be all pragmatics. Since Carol didn't bother to form a valid sentence, she is not making a claim which can be true or false. It could mean "there is a ball coming at your head" or it could mean "We forgot the basketball at the court" or any number of other things, depending on context.

So, while it does indeed seem meaningful to talk about the semantics of words, the picture you have drawn so far of the "magic box" does not seem to fit the case of individual words. We do not condition on the literal meaning of individual words; those meanings have the wrong type signature to condition on.

[-]abramdemski1y52

We can already give a partial answer: because we’re working in a Bayesian frame, the outputs of the semantics box need to be assignments of values to random variables in the world model, like

Why random variables, rather than events? In terms of your sketched formalism so far, it seems like events are the obvious choice -- events are the sort of thing we can condition on. Assigning a random variable to a value is just an indirect way to point out an event; and, this indirect method creates a lot of redundancy, since there are many many assignments-of-random-variables-to-values which would point out the same event.
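To make the redundancy concrete, here is a small sketch on a made-up sample space (two coin flips) with a handful of made-up random variables; none of these names come from the post. It groups every assignment of a value to a variable by the event that the assignment picks out.

```python
from itertools import product

# Hypothetical toy sample space: two coin flips.
omega = list(product("HT", repeat=2))

# Random variables are just functions on the sample space.
random_variables = {
    "first": lambda w: w[0],
    "second": lambda w: w[1],
    "num_heads": lambda w: w.count("H"),
    "both_match": lambda w: w[0] == w[1],
    "all_heads": lambda w: w == ("H", "H"),
}

def event_of(name, value):
    """The event {w : X(w) = value} picked out by assigning X := value."""
    return frozenset(w for w in omega if random_variables[name](w) == value)

# Group every (variable, value) assignment by the event it points at.
assignments_by_event = {}
for name, rv in random_variables.items():
    for value in {rv(w) for w in omega}:
        assignments_by_event.setdefault(event_of(name, value), []).append((name, value))

# Print only the events that are reachable through more than one assignment.
for event, assignments in assignments_by_event.items():
    if len(assignments) > 1:
        print(sorted(event), "<-", assignments)
```

In this toy case, eleven assignments point at only nine distinct events: for example, `num_heads = 2` and `all_heads = True` are the same event.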

[-]johnswentworth1y20

First: if the random variables include latents which extend some distribution, then values of those latents are not necessarily representable as events over the underlying distribution. Events are less general. (Related: updates allowed under radical probabilism can be represented by assignments of values to latents.)

Second: I want formulations which feel like they track what's actually going on in my head (or other people's heads) relatively well. Insofar as a Bayesian model makes sense for the stuff going on in my head at all, it feels like there's a whole structure of latent variables, and semantics involves assignments of values to those variables. Events don't seem to match my mental structure as well. (See How We Picture Bayesian Agents for the picture in my head here.)

[-]abramdemski1y40

The two perspectives are easily interchangeable, so I don't think this is a big disagreement. But the argument about extending a distribution seems... awful? I could just as well say that I can extend my event algebra to include some new events which cannot be represented as values of random variables over the original event algebra, "so random variables are less general".

[-]abramdemski1y40

The second central problem of Interoperable Semantics is to account for Alice and Bob’s agreement. In the Bayesian frame, this means that we should be able to establish some kind of (approximate) equivalence between at least some of the variables in the two agents’ world models, and the outputs of the magic semantics box should only involve those variables for which we can establish equivalence.

To me, this seems like a strange way to go about it, if your hope is to address AI safety concerns. If Alice is trying to understand Bob, and Alice sees that Bob uses a weird blob of incomprehensible gibberish as a key step in his reasoning, then Alice should think she has failed, rather than thinking she should ignore that part.

In some sense, agents come equipped with a 1st-person perspective (a set of cognitive tools which is useful for predicting their own sense-data and managing their own actions), and the challenge we face is one of translating that 1st-person perspective to a 3rd-person perspective (an interoperable language which can readily be translated into many different 1st person perspectives, ie, understood by many different agents).

[-]johnswentworth1y20

That particular paragraph was intended to be about two humans. The application to AI safety is less direct than "take Alice to be a human, and Bob to be an AI" or something like that.

[-]abramdemski1y40

That makes sense. But, effectively, you are deferring the question of how it relates to AI safety. If I have my intuition (roughly, that the most important part of the problem is how to understand alien concepts which AIs might have) and you have your intuition (roughly, that the most important part of the problem is how to understand human concepts) then presumably we can try and articulate some reasons.

I've said something about why I think it seems important not to give up on mental content that seems hard to translate. Perhaps you could say a bit more about why you are interested in a thingy that only looks for easily translatable content and ignores hard-to-translate content?

[-]johnswentworth1y20

I definitely have substantial probability on the possibility that AIs will use a bunch of alien (i.e. non-interoperable or hard-to-interoperate) concepts. And in worlds where that's true, I largely agree that those are the most important (i.e. hardest/rate-limiting) part of the technical problems of AI safety.

That said:

I have substantial probability that AIs basically don't use a bunch of non-interoperable concepts (or converge to more interoperable concepts as capabilities grow, or ...). In those worlds, I expect that "how to understand human concepts" is the rate-limiting part of the problem.
Even in worlds where AIs do use lots of alien concepts, it feels like understanding human concepts is "earlier on the tech tree" than figuring out what to do with those alien concepts. Like, it is a hell of a lot easier to understand those alien concepts by first understanding human concepts and then building on that understanding, than by trying to jump straight to alien concepts.

[-]abramdemski1y40

What would constitute "understanding human concepts" in the relevant sense?

In another comment, I suggested that human concepts can be represented in human language. This might miss out on some important human mental content, but it would not miss out on anything that the magic box spits out, since the magic box is specifically dealing with language.

This trivializes the magic box; it becomes the identity function, or at best, a paraphrasing function. But what, exactly, is wrong with such a trivial understanding of the magic box? Where does it fall short of the sort of understanding you seek to achieve?

It frames things in terms of events (each event labeled with a natural-language sentence) rather than random variables, like you want, but I can trivially reframe it in terms of random variables by considering the truth value of the sentences as 0,1 instead of true,false.

Yes, I intuitively feel that this is a dumb trivial proposal that contributes nothing to our understanding of concepts. But, I quote:

At this point, we’re not even necessarily looking for “the right” class of random variables, just any class which satisfies the above criteria and seems approximately plausible.

[-]johnswentworth1y32

One example: you know that thing where I point at a cow and say "cow", and then the toddler next to me points at another cow and is like "cow?", and I nod and smile? That's the thing we want to understand. How the heck does the toddler manage to correctly point at a second cow, on their first try, with only one example of me saying "cow"? (Note that same question still applies if they take a few tries, or have heard me use the word a few times.)

The post basically says that the toddler does a bunch of unsupervised structure learning, and then has a relatively small set of candidate targets, so when they hear the word once they can assign the word to the appropriate structure. And then we're interested in questions like "what are those structures?", and interoperability helps narrow down the possibilities for what those structures could be.

... and I don't think I've yet fully articulated the general version of the problem here, but the cow example is at least one case where "just take the magic box to be the identity function" fails to answer our question.
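A toy sketch of the "unsupervised structure learning first, then one labeled example" story, with loud caveats: the 2-D feature space, the two Gaussian blobs, and plain k-means are all stand-ins chosen for illustration, not the model from the post. The learner clusters unlabeled data, hears "cow" once while one thing is pointed at, and attaches the word to whichever cluster that thing falls in.

```python
import random

def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centers[i])))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with evenly spaced input points as initial centers."""
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

# Hypothetical unlabeled "experience": two kinds of things in a 2-D feature space.
rng = random.Random(0)
kind_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
kind_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
centers = kmeans(kind_a + kind_b, k=2)

# One labeled example: the adult points at a single thing and says "cow".
pointed_at = (9.5, 10.2)
lexicon = {nearest(pointed_at, centers): "cow"}

def name_for(thing):
    """Word attached to the cluster `thing` falls in, or "?" if none yet."""
    return lexicon.get(nearest(thing, centers), "?")

print(name_for((10.8, 9.1)))   # a second, previously unseen thing of the same kind
print(name_for((0.3, -0.4)))   # something from the other cluster
```

With only two candidate clusters, a single labeled example is enough to pick the target; the interesting question in the thread is what the real candidate structures are.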

[-]abramdemski1y40

Since we’re in a Bayesian frame, any semantic targets should be assignments of values (e.g. ) to random variables in an agent’s model. (Note that this includes functions of random variables in the agent’s model, and data structures of random variables in the agent’s model.)
… but the set of possible semantic targets for words must be exponentially smaller than the full set of possible assignments of values to random variables.

I commented about why I disagree with the first bullet point here, and the second bullet point here.

[-]abramdemski1y40

A shorter-term intermediate question is: what even is the input set (i.e. domain) and output set (i.e. range) of the semantics box? Its inputs are natural language, but what about the outputs?

Because you are making the assumption that the important semantic content is interoperable, and you're assuming this interoperable content is mediated entirely through language (Carol doesn't get to EG demonstrate how to tie shoelaces visually), it seems like Alice should be able to tell people what she understood Carol to mean.

In other words, it seems like your framework implies that you can use language itself as the representation without losing anything. Yes, an utterance will have many equivalent paraphrasings; IE the magic box is not a 1-1 function. It can be very lossy. However, the magic box should not add information. So the semantic content Y can be represented by one of the utterances X which would land on it.

If the magic box does add information (EG, if Carol says 'apple', Alice always imagines a specific color of apple, and does so randomly so that information is really added in an infotheoretic sense) ... well, I suppose that can happen, but we've violated the assumption that the magic box is a function, and also I think something has gone wrong in terms of Alice trying to understand Carol (Alice should understand that the apple could be any color).
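One way to write the "should not add information" condition down, as a sketch in standard information-theoretic notation. The symbols $f$, $g$, $N$, and $W$ are introduced here for illustration and are not from the post. If the box is a deterministic function $f$ and $Y = f(X)$ for a discrete utterance $X$, then for any other variable $W$:

$$H(Y \mid X) = 0, \qquad H(Y) \le H(X), \qquad I(Y; W \mid X) \le H(Y \mid X) = 0.$$

Representing $Y$ by an utterance then amounts to choosing a section $g$ of $f$, i.e. a map with

$$f(g(y)) = y \quad \text{for every } y \text{ in the range of } f,$$

so that $g(Y)$ is one of the utterances which lands on $Y$. In the random-apple-color case the box would instead look like $Y = f(X, N)$ with noise $N$ independent of $X$, and in general $H(Y \mid X) > 0$: that is the sense in which information is added.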

[-]johnswentworth1y20

I don't think this is quite right? Most of the complexity of the box is supposed to be learned in an unsupervised way from non-language data (like e.g. visual data). If someone hasn't already done all that unsupervised learning, then they don't "know what's in the box", so they don't know how to extract semantics from words.

[-]abramdemski1y20

I don't disagree with this point. I don't see how it undermines the idea that all of the semantic content of language can be represented via language. (I'm not sure what you understood me to be saying, such that this objection of yours felt relevant.)

I'm not claiming that our mental representations of semantic content "are" linguistic, or that they "come from" language. I'm just saying that we can use language to represent them.

Importantly, it is also possible that there are forms of mental content which are very difficult or even impossible to communicate with language alone, like perhaps thoughts about knot-tying. I am only claiming that the output of the magic box described here can necessarily be represented linguistically.

[-]abramdemski1y40

The central challenge of ML interpretability is to faithfully and robustly translate the internal concepts of neural nets into human concepts (or vice versa). But today, we don’t have a precise understanding of what “human concepts” are. Semantics gives us an angle on that question: it’s centrally about what kind of mental content (i.e. concepts) can be interoperable (i.e. translatable) across minds.

It seems to me like there's an important omission here: we also don't understand what we really want to point at when we say "the internal concepts of neural nets".

One might say that understanding "human concepts" is more the central difficulty here, because the human concepts are what we're trying to translate into.

However, we also need to understand what we're translating out of. For example, we might find a translation from NN activations to human concepts which is highly satisfying by some metric, but, which fails to uncover deceptive cognition within the NN. One idea for how to avoid this: ignoring content which we do not know how to translate into human concepts needs to count as a failure, rather than a success. Notice how this requires a notion of 'content' which we are trying to translate.

We can perhaps understand this as a 'strategy-stealing' requirement: to fully understand the content of an NN means to be able to replicate all of its capabilities using the translated content (importantly, including hidden capabilities which we don't see on our test data).

[-]abramdemski1y*40

In this post, we’ll ignore Gricean implicature; our agents just take everything literally. Justification for ignoring it: first, the cluster-based model in this post is nowhere near the level of sophistication where lack of Gricean implicature is the biggest problem. Second, when it does come time to handle Gricean implicature, we do not expect that the high-level framework used here - i.e. Bayesian agents, isomorphism between latents - will have any fundamental trouble with it.

A naive reader may think that "ignoring Gricean implicature" means pretending that it doesn't exist; to be more precise: pretending that the semantics and pragmatics of an utterance are equal.

(I will use 'pragmatics' to mean all implications a listener can draw from an utterance, including Gricean implicature, and 'semantics' to mean only the literal implications. For example, if I say "you left the door open" then (depending on context) I probably am implying that you should close it; this is pragmatics and Gricean implicature, but is not a literal implication of what I said. This is also a near-synonym of a connotation/denotation distinction, where connotation $\approx$ pragmatics, denotation $\approx$ semantics.)

However, the way you frame the problem actually critically relies on a semantics/pragmatics distinction. You define "the magic box" to be what translates from the utterance to what you would condition on if you took the sentence literally: the difference between $\mathrm{CarolSays}(\text{“}X\text{”})$ vs $X$.

IE, when Alice hears Carol say something, she conditions on the full sensory experience, and reaches the full range of pragmatic conclusions: $P_{\mathrm{Alice}}(\cdot \mid \mathrm{CarolSays}(\text{“}X\text{”}))$. But, for the purpose of sussing out semantics, what you want to do in the post is pretend that Alice takes Carol literally, and conditions only on the semantic content of what Carol says: $P_{\mathrm{Alice}}(\cdot \mid X)$.

Hence, the magic box is a function relating pragmatics to semantics; it takes the event which we would condition on to get the pragmatics (namely: the full sensory experience) and maps it to the semantics (the literal meaning of what was said).
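A toy worked example of the two conditionals, reusing the "you left the door open" case. Every number and variable name below is an assumption made up for illustration: the door is open with probability 0.5, Carol independently wants it closed with probability 0.5, she never says the sentence when the door is shut, and she says it with probability 0.9 if she wants the door closed and 0.1 otherwise.

```python
from itertools import product

def joint():
    """Yield (world, probability) pairs for the hypothetical toy model."""
    for door_open, wants_closed, says in product([True, False], repeat=3):
        # Assumed speech policy: honest, and much chattier when she wants the door closed.
        p_say = (0.9 if wants_closed else 0.1) if door_open else 0.0
        p = 0.5 * 0.5 * (p_say if says else 1 - p_say)
        yield {"door_open": door_open, "wants_closed": wants_closed, "carol_says": says}, p

def prob(query, given):
    """P(query | given) by brute-force enumeration of the joint distribution."""
    num = sum(p for w, p in joint() if given(w) and query(w))
    den = sum(p for w, p in joint() if given(w))
    return num / den

wants_closed = lambda w: w["wants_closed"]

# Pragmatics: condition on the event that Carol uttered the sentence.
pragmatic = prob(wants_closed, lambda w: w["carol_says"])
# Semantics: condition only on the literal content, "the door is open".
literal = prob(wants_closed, lambda w: w["door_open"])

print(round(pragmatic, 3), round(literal, 3))
```

Under these assumptions, conditioning on the utterance raises the probability that Carol wants the door closed to about 0.9, while conditioning on the literal content leaves it at the prior of 0.5. Both conditionals agree that the door is open; only the first carries the implicature.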

[-]abramdemski1y52

I suppose I didn't draw out the critical implication I'm trying to point to:

If you buy my argument that, far from ignoring semantics vs pragmatics, your way of framing the problem relies critically on the distinction...

...then you should be more curious about what is going on with the distinction, rather than writing it off as a less important detail to be figured out later.

I take pragmatics to be easy to understand (so long as we take it to include semantics, rather than be exclusive): the pragmatics of an utterance is just what a Bayesian listener would infer from it. (We can, if we like, also point to the pragmatic intent: what the speaker was trying to get the listener to infer.)

What seems hard is, how do we point out only the semantic content, when in conversation we always need to think about the full pragmatics?

Why do we even believe that utterances have literal content, rather than only a cloud of probabilistic implications? How could such a belief be grounded in linguistic behavior, aside from the brute fact that people talk about this distinction as if it is a thing? What singles out some inferences as semantic? What makes those inferences different from other pragmatic inferences?

It seems like it has something to do with always-valid inferences vs context-sensitive inferences, for one thing.

^{^}

In this post, we’ll ignore Gricean implicature; our agents just take everything literally. Justification for ignoring it: first, the cluster-based model in this post is nowhere near the level of sophistication where lack of Gricean implicature is the biggest problem. Second, when it does come time to handle Gricean implicature, we do not expect that the high-level framework used here - i.e. Bayesian agents, isomorphism between latents - will have any fundamental trouble with it.

^{^}

When we say “word” or “short phrase”, what we really mean is “atom of natural language.”

^{^}

A full characterization of interoperable mental content / semantics requires specifying the possible mappings of larger constructions, like sentences, into interoperable mental content, not just words. But once we characterize the mental content which individual words can map to (i.e. their ‘semantic targets’,) we are hopeful that the mental content mapped to by larger constructions (e.g. sentences,) will usually be straightforwardly constructable from those smaller pieces. So if we can characterize “what children can attach words to”, then we’d probably be most of the way to characterizing the whole range of outputs of the magic semantics box.

Notably, going from words to sentences and larger constructs is the focus of the existing academic field of “semantics”. What linguists call “semantics” is mostly focused on constructing semantics of sentences and larger constructs from the semantics of individual words (“atoms”). From their standpoint, this post is mostly about characterizing the set of semantic values of atoms, assuming Bayesian agents.

^{^}

For those who read Natural Latents: The Math before this post, note that we added an addendum shortly before this post went up. It contains a minor-but-load-bearing step for establishing approximate isomorphism between two agents’ natural latents.

^{^}

Sepal length, sepal width, petal length, and petal width in case you were wondering, presumably collected from a survey of actual flowers last century.

^{^}

Remember that addendum we mentioned in an earlier footnote? The determinism condition is for that part.

Redundancy Error (bits)	Drop (0,)	Drop (1,)	Drop (2,)	Drop (3,)
First run (“Alice”)	0.0211	0.011	0.048	0.089
Second run (“Bob”)	0.034	0.004	0.031	0.177
