I structure this post as a critique of some recent papers on the philosophy of mind as applied to LLMs, specifically, on whether we can say that LLMs think, reason, understand language, refer to the real world when producing language, have goals and intents, etc. I also use this discussion as a springboard to express some of my views about the ontology of intelligence, agency, and alignment.
Mahowald, Ivanova, et al., “Dissociating language and thought in large language models: a cognitive perspective” (Jan 2023). Note that this is a broad review paper, synthesising findings from computational linguistics, cognitive science, and neuroscience, as well as offering an engineering vision (perspective) for building an AGI (primarily in section 5). I don’t argue with these aspects of the paper’s content (although I disagree with parts of their engineering perspective, I think that engaging in this disagreement would be infohazardous). I argue with the philosophical content of the paper, which is revealed in the language that the authors use and the conclusions that they draw, as well as in the ontology of linguistic competencies that the authors propose.
The other paper I engage with is Shanahan, “Talking About Large Language Models” (Dec 2022), quoted extensively in the second half of the post.
Dissociating language and thought in large language models: a cognitive perspective
In this section, I briefly summarise the gist of the paper by Mahowald, Ivanova, et al. for the reader’s convenience.
Abstract:
Today’s large language models (LLMs) routinely generate coherent, grammatical and seemingly meaningful paragraphs of text. This achievement has led to speculation that these networks are—or will soon become—“thinking machines”, capable of performing tasks that require abstract knowledge and reasoning. Here, we review the capabilities of LLMs by considering their performance on two different aspects of language use: ‘formal linguistic competence’, which includes knowledge of rules and patterns of a given language, and ’functional linguistic competence’, a host of cognitive abilities required for language understanding and use in the real world. Drawing on evidence from cognitive neuroscience, we show that formal competence in humans relies on specialized language processing mechanisms, whereas functional competence recruits multiple extralinguistic capacities that comprise human thought, such as formal reasoning, world knowledge, situation modeling, and social cognition. In line with this distinction, LLMs show impressive (although imperfect) performance on tasks requiring formal linguistic competence, but fail on many tests requiring functional competence. Based on this evidence, we argue that (1) contemporary LLMs should be taken seriously as models of formal linguistic skills; (2) models that master real-life language use would need to incorporate or develop not only a core language module, but also multiple non-language-specific cognitive capacities required for modeling thought. Overall, a distinction between formal and functional linguistic competence helps clarify the discourse surrounding LLMs’ potential and provides a path toward building models that understand and use language in human-like ways.
Two more characteristic quotes from the paper:
In addition to being competent in the rules and statistical regularities of language, a competent language user must be able to use language to do things in the world: to talk about things that can be seen or felt or heard, to reason about diverse topics, to make requests, to perform speech acts, to cajole, prevaricate, and flatter. In other words, we use language to send and receive information from other perceptual and cognitive systems, such as our senses and our memory, and we deploy words as part of a broader communication framework supported by our sophisticated social skills. A formal language system in isolation is useless to a language user unless it can interface with the rest of perception, cognition, and action.
[…] in examining language models’ functionality, it is important to separate their linguistic abilities from their abstract knowledge and reasoning abilities, which can be probed—and perhaps even learned—through a linguistic interface but require much more than formal linguistic competence.
As evident from the title of the paper (and this section), the authors use the words “thinking” and “thought” as synonyms for the “functional linguistic competence” that they define in the paper.
General critiques
Failure at a task doesn’t imply the absence of capability
Throughout the paper, Mahowald, Ivanova, et al. take evidence of failure (either occasional or nearly universal) at specific reasoning, comprehension, and functional tasks as proof that current LLMs don’t have this or that capability.
I think this is a methodological mistake for two reasons.
First, this presupposes a squarely representationalist philosophy of mind: in order to say that a mind has capacity X, the mind should have this capacity “actually implemented somehow”, in a regularised way. If a mind makes a mistake in reasoning, then the capability in question must not be represented; otherwise, such a failure would be impossible.
But this is clearly false for humans: humans make mistakes in all types of capabilities (types of reasoning), from formal linguistic to “functional” (as Mahowald, Ivanova, et al. call them), all the time, especially when they are mentally incapacitated, drunk, sleepy, etc. This doesn’t mean that humans don’t have the given cognitive capabilities. Therefore, counting mistakes in LLM outputs is a methodological dead-end. Could it be that LLMs are currently in a state that humans would call a “cloudy mind”, where thoughts are entangled and control of the output is far from perfect? I think this could actually be the case.
From a more foundational perspective, austere representationalism has, over the last decades, clearly been losing the battle to enactivism, or at least to integrative representationalist-enactivist perspectives (Constant et al. 2021).
Second, from a pragmatic point of view, I think there is little scientific value in finding capabilities that the latest LLMs don’t have. We are all too familiar with the failure of this approach, as evidenced over the last three years by Gary Marcus.
Taking enactivism seriously motivates a radically gradualistic and empirical approach to evaluating capacities and qualities of AI artifacts (Levin 2022), and discourages catchy and categorical phrases like “LLMs don’t think”, “LLMs have no understanding of the world”, etc.
Reductionism and the ladder of language model understanding
I think there is a three-level hierarchy of understanding LLMs and their behaviours:
1. Naive anthropomorphism. “Language models use language like humans, therefore they have human qualities.” This is where Blake Lemoine was in the middle of last year. (janus points out in the comments that Lemoine was actually more on the third level.)
2. Reductionism. “LLMs are just mathematical models (matrix multiplication, calculations on GPU, next token prediction, Chinese room, stochastic parrot), which don’t understand, don’t think, have no meaning, etc.” Unfortunately, this is the level at which many A(G)I researchers are.
3. Taking the emergence in LLMs seriously. “LLMs could be slightly conscious”, discovering language model behaviours (Perez et al. 2022), establishing AI psychology. Anthropic and OpenAI are definitely on this level. Google and DeepMind never seem to state their view publicly; I have a hard time believing that they are not on this level, though, so this could be a deliberate public (political) position. (Note: Shanahan is affiliated with DeepMind.)
Both Mahowald, Ivanova, et al. and Shanahan present their positions as reactions to takes at the first level of understanding, for example, by the media (especially Shanahan). And although they occasionally pay lip service to the third level, these occasions are greatly outnumbered by the categorical reductionist statements that they make throughout their papers, particularly those conditioned on the fact that LLMs are trained to “predict the next token in a sequence”.
LLMs use language affordances differently than people, even qualitatively differently (e. g., humans have more senses and hence qualitatively different grounding than LLMs could ever have), but this doesn’t mean LLMs don’t use language affordances at all. Both humans and LLMs use language to interact with the world, learn the world better, and align with agents that surround them (more on this below in the post).
The fact that LLMs use language affordances places them in the category of language users, which was previously occupied only by humans and parrots. Crucially, the user here is an active role, not a passive one.
More specific points
Linguistic capability circuits inside LLM-based AI could be sufficient for approximating general intelligence
Mahowald, Ivanova et al. write that “the language network[1] [in human brains] does not support non-linguistic cognition”. This is probably correct but is also irrelevant when considering whether LLM-based AI could reach the AGI level. For humans, too, it’s clear that creating large, hierarchical theories, simulations, and plans is impossible without language (more specifically, writing and editing), because humans’ working memory capacity is severely limited. It’s not clear why “thinking step-by-step on steroids” (à la language model cascades, Dohan et al. 2022), creating draft simulations (explanations, plans) and then iteratively refining them, e. g. using critique and debate methods, couldn’t generalise reasoning.
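To make this concrete, here is a minimal sketch of such a draft-critique-refine loop. The `llm` function below is a hypothetical stand-in for any text-completion call, not a real API; treat the whole thing as an illustration of the loop structure, not as the cascade architecture from Dohan et al.

```python
# A minimal sketch of a draft-critique-refine loop ("thinking step-by-step on
# steroids"). `llm` is a hypothetical stand-in for any text-completion call.

def llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real completion API."""
    raise NotImplementedError

def refine(task: str, n_rounds: int = 3) -> str:
    # Produce an initial draft, then alternate critique and revision.
    draft = llm(f"Draft a step-by-step plan or explanation for: {task}")
    for _ in range(n_rounds):
        critique = llm(
            f"Task: {task}\nDraft:\n{draft}\n"
            "List concrete errors, gaps, and inconsistencies in this draft."
        )
        draft = llm(
            f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\n"
            "Rewrite the draft, addressing every point in the critique."
        )
    return draft
```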
That would be a very inefficient architecture for general intelligence. But note that humans seem to be approximately this kind of general intelligence. Humans don’t have the disciplines of epistemology, rationality, and ethics hardwired or implemented symbolically in their heads. Rather, it seems to me that humans approximate thinking according to these theories by engaging in internal (or external, which is far more efficient) linguistic debate, and by being linguistically “primed” after reading a handbook on rationality (or epistemology, or ethics, or any specific sub-discipline of these, or a specialised version of these disciplines applied to a specific context, such as a textbook on business strategy as a specialisation of rationality to the business context).
Although humans have some of their epistemology, ethics, and rationality skills “implemented” non-linguistically, and this clearly covers not only intuitive (reflexive, habitual, “system one”) conditioning but also deliberative (“system two”) reasoning, it’s not clear that LLM-based AI couldn’t compensate for its relatively weak non-linguistic circuits implementing epistemology, ethics, and rationality with stronger linguistic skills, higher memory capacity, the ability to make many more iterations to tirelessly refine its inferences, and (potentially) a much better ability to source and use the relevant literature, such as textbooks on epistemology, rationality, and ethics.
And, of course, as LLMs continue to scale and their architectures continue to improve, they may improve their non-linguistic epistemology, ethics, and rationality skills, perhaps discontinuously, even if these are currently at a very low level. Especially if they are trained on action sequences rather than plain text (as Gato was).
See also: “The Limit of Language Models” by DragonGod.
On mathematical reasoning
Mahowald, Ivanova et al. (section 4.1, “LLMs are great at pretending to think”):
Large text corpora contain a wealth of non-linguistic information, from mathematical and scientific facts (e.g., “two plus seven is nine”) to factual knowledge (e.g., “the capital of Texas is Austin”) to harmful stereotypes (e.g., “women belong in the kitchen”). This is not particularly surprising since even simple patterns of co-occurrence between words capture rich conceptual knowledge, including object properties, abstract analogies, social biases, and expert knowledge in specialized domains. Moreover, statistical regularities extracted from language and from visual scenes exhibit a substantial degree of correspondence, indicating that linguistic information can capture at least some aspects of experiential input.
As a result, language models trained on gigantic text corpora acquire large amounts of factual knowledge, succeed at some types of mathematical reasoning [e.g., Lewkowycz et al., 2022, Rae and Razavi, 2020] and reproduce many stereotypes and social biases. All these behaviors — both positive and negative — become more prominent as models get larger, indicating that larger storage capacity allows LLMs to learn increasingly more fine-grained patterns in the input.
In this quote, the authors seem to imply that LLMs’ ability to “succeed at some types of mathematical reasoning” is due to “pattern memorisation” rather than “reasoning”. I don’t think this is proven. In fact, I’m almost sure that SoTA LLMs such as Minerva have some (even if so far weak and imprecise) “logical/mathematical reasoning circuits” rather than just a memory of a collection of inductive reasoning patterns. Cf. section 5 of (Lewkowycz et al., 2022), where the authors argue that Minerva didn’t memorise solutions but generalised some patterns. To this, Mahowald, Ivanova et al. could have responded that they were looking for generalisation beyond individual inductive patterns and towards a more coherent reasoning framework. However, as I already indicated above, trying to find a line here between “actual reasoning” and “memorising patterns” is a methodological dead-end, which is especially true for mathematical and logical reasoning, which sometimes, and in some sense, is just applying a collection of inductive rules (axioms).
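One way to probe the memorisation question empirically, in the spirit of the checks in section 5 of Lewkowycz et al. (2022), is to perturb problems the model has plausibly seen and compare accuracy. The sketch below is a toy illustration only: `llm`, the example problems, and the grading rule are hypothetical placeholders, not anyone’s actual evaluation code.

```python
# Toy sketch of a memorisation check: compare accuracy on problems the model
# may have seen with accuracy on slightly perturbed versions of the same
# problems. `llm`, `benchmark`, and `grade` are hypothetical placeholders.
import re

def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model call

def grade(answer: str, expected: int) -> bool:
    return answer.strip().endswith(str(expected))

# Problems the model has plausibly seen during training, in a fixed template.
benchmark = [("What is 37 * 24?", 888), ("What is 58 * 61?", 3538)]

def perturb(question: str):
    a, b = map(int, re.findall(r"\d+", question))
    a, b = a + 1, b + 3  # shift operands away from any memorised instance
    return f"What is {a} * {b}?", a * b

def accuracy(problems):
    return sum(grade(llm(q), ans) for q, ans in problems) / len(problems)

# A large drop from accuracy(benchmark) to accuracy on the perturbed set would
# suggest memorisation; comparable accuracy would suggest some generalisation.
perturbed = [perturb(q) for q, _ in benchmark]
```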
Functional linguistic competences
We focus on four key capacities that are not language-specific but are nevertheless crucial for language use in real-life settings: i) formal reasoning—a host of abilities including logical reasoning, mathematical reasoning, relational reasoning, computational thinking, and novel problem solving; ii) world knowledge—knowledge of objects and their properties, actions, events, social agents, facts, and ideas; iii) situation modeling—the dynamic tracking of protagonists, locations, and events as a narrative/conversation unfolds over time; and iv) social reasoning—understanding the social context of linguistic exchanges, including what knowledge is shared, or in ‘common ground’, what the mental states of conversation participants are, and pragmatic reasoning ability. A simple conversation typically requires the use of all four of these capacities, yet none of them are specific to language use. Below, we provide evidence that these skills rely on non-language-specific processing mechanisms in humans and highlight LLMs’ failures as relevant to each domain.
I think the proposed ontology of functional competencies (capacities) is mistaken. To demonstrate this, I need to first introduce my own view on the ontology of general intelligence.
The functional decomposition (ontology) of general intelligence
The first category, “formal reasoning”, should be expanded with other, “non-formal” or “semi-formal” disciplines (competencies, capabilities), because it isn’t sensible to draw a distinction between “formal” (or, better, symbolic) and “informal” (connectionist) disciplines: rather, most functional disciplines of general intelligence should optimally[2] be implemented by interacting symbolic and connectionist components. To the first, crudest approximation, as I already mentioned above, we can decompose general intelligence into three big functional disciplines: epistemology, ethics, and rationality.
I have a view on the ontology of general intelligence to the “second approximation”, too, but don’t want to reveal it in public because it could be infohazardous. A few extra disciplines in this ontology that I need to mention for the discussion below, apart from epistemology, ethics, and rationality, are semantics (together with closely related philosophy of language and linguistics) and communication theory (such as the speech act theory).
Belief alignment is necessary for effective language use
“Functional competencies” (ii)-(iv), namely world knowledge, situation modelling, and social reasoning, are not competencies (capacities, disciplines): rather, they all point to processes that intelligent agents should continuously engage in with the world and with each other for their linguistic communication (and behaviour more generally) to be successful. These processes are conducted using the general intelligence disciplines, as described above, but they are not disciplines themselves.
These processes, i. e., continuously grounding (updating) one’s world knowledge, situation modelling, and social reasoning, could be collectively called belief alignment between humans or AIs. (More on this in a forthcoming post.)
With their “functional linguistic competencies”, Mahowald, Ivanova et al. may have tried to point to the “applied” world theories that intelligent agents should learn in order to use language effectively: for example, “applied” physical theories such as classical mechanics, electromagnetism, and hydraulics, or psychology and sociology for building theories of mind and modelling social situations. An AGI couldn’t always learn everything from first principles; it must reuse past inferences.
However, methodologically, I’d still argue that it’s not the “applied” theories themselves that are important for effective language use, but exactly the process of aligning these theories between the interacting parties. For example, having a “broadly western” understanding of the world and a “folk psychology and sociology” (all members of a society have some theories of human psychology and sociology in their heads, even if they have never heard these two words) may not allow conversing with people who hold very different such theories. Cf. the Pirahã language and the story of Daniel Everett, who tried to learn it; see also “The anthropology of intentions” (Duranti 2015, chapter 11).
Misalignment breeds misalignment; training and belief alignment should be iterative
Mahowald, Ivanova et al.:
[LLMs] are trained to extract statistical information about words in text rather than a set of stable, consistent, and complete facts about the world. Any output generated by LLMs will be biased accordingly.
Creating a “set of stable, consistent, complete facts about the world” and imparting it to an AI in one way or another is a GOFAI-style utopia. Methodologically, AGI should be iteratively taught the general intelligence disciplines and inner-aligned with people on world knowledge, as well as on the psychological and social models that people adopt. This iteration should proceed slowly because (inner) alignment is brittle unless humans and the model are already almost aligned: misalignment erodes further attempts to align via linguistic communication (including training a transformer on text examples). It doesn’t matter for this process whether the model underlying the AGI is a Transformer, and thus resembles current LLMs (or, rather, multimodal transformers), or not.
Denying LLMs understanding and knowledge grounding is confused
Mahowald, Ivanova et al.:
And, of course, models trained exclusively on text strings are, by design, incapable of using this knowledge to refer to real-world entities, meaning that they are incapable of using language in a physical environment the way humans do. Thus, LLMs in their current form are challenged to perform an essential feat of language comprehension: integrating incoming language information into a general, multimodal, dynamically evolving situation model.
Similar to “thinking” and “reasoning”, covered above, I think the attempts to draw a bright line between AIs “understanding” and “not understanding” language are methodologically confused. There are no bright lines; there is more or less grounding, and, of course, models that process only text have as little grounding as possible, but they do understand a concept as soon as they have a feature (or features) for it, connected to other semantically related concepts in a sensible way. A multimodal Transformer will have more grounding than a pure LLM and be better at situational modelling (which, as I pointed out above, is a component of belief alignment).
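As a toy illustration of “a concept is a feature connected to semantically related concepts”, the sketch below builds crude distributional vectors from word co-occurrence in a five-sentence synthetic corpus and compares cosine similarities. It only illustrates the general point that relational structure falls out of distributional statistics; it says nothing about how any particular LLM actually represents concepts.

```python
# Toy illustration: concept vectors from word co-occurrence in a tiny synthetic
# corpus, compared by cosine similarity. Related words end up closer to each
# other than to unrelated ones, purely from distributional statistics.
import numpy as np

corpus = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the dog barked at the mailman",
    "paris is the capital of france",
    "berlin is the capital of germany",
]

vocab = sorted({w for s in corpus for w in s.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    words = s.split()
    for w in words:
        for v in words:
            if w != v:
                counts[index[w], index[v]] += 1  # within-sentence co-occurrence

def sim(a: str, b: str) -> float:
    x, y = counts[index[a]], counts[index[b]]
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-9))

print(sim("dog", "cat"), sim("paris", "berlin"), sim("dog", "france"))
# "dog"/"cat" and "paris"/"berlin" score noticeably higher than "dog"/"france"
```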
Browning and LeCun also succumb to this categorical denial of understanding in language models in their piece for Noēma:
LLMs have no stable body or abiding world to be sentient of—so their knowledge begins and ends with more words and their common-sense is always skin-deep. The goal is for AI systems to focus on the world being talked about, not the words themselves — but LLMs don’t grasp the distinction. There is no way to approximate this deep understanding solely through language; it’s just the wrong kind of thing. Dealing with LLMs at any length makes apparent just how little can be known from language alone.
The same applies to the attempts to draw a line between AIs “referring to real-world entities” and “not referring to real-world entities”. Furthermore, I think this attempt is ontologically and semantically confused: no amount of grounding makes one AI refer to the real world any more strongly than another, less grounded model does. Grounding can only change the accuracy and robustness of these references under changing conditions; references themselves don’t have strength[3]. An AI could fail to refer to real-world entities when needed (or attempt to refer to them when not needed), though, if it doesn’t possess a good theory (discipline) of semantics itself. Correctly dealing with “this is not a pipe”-type challenges in reasoning is a test of one’s skill in semantics, not one’s grounding. And humans make semantic mistakes in the face of such challenges, too.
The relationship between images and words in visual-language models is exactly the same as in humans
Shanahan:
[…] the relationship between a user-provided image and the words generated by the VLM [visual-language model] is fundamentally different from the relationship between the world shared by humans and the words we use when we talk about that world. Importantly, the former relationship is mere correlation, while the latter is causal. (Of course, there is causal structure to the computations carried out by the model during inference. But this is not the same as there being causal relations between words and the things those words are taken to be about.)
The consequences of the lack of causality are troubling. If the user presents the VLM with a picture of a dog, and the VLM says “This is a picture of a dog”, there is no guarantee that its words are connected with the dog in particular, rather than some other feature of the image that is spuriously correlated with dogs (such as the presence of a kennel). Conversely, if the VLM says there is a dog in an image, there is no guarantee that there actually is a dog, rather than just a kennel.
In the second part of this quote, the discussion of the “kennel and dog” example is really a discussion of reference robustness (see the section above), not of what VLMs refer to when they generate words. It’s not because of “the lack of causality” that a VLM can be confused. Humans can, in principle, be confused in exactly the same way (though humans are currently much more robust object detectors than VLMs, especially for such simple scenes as those with kennels and dogs).
Ontologically, I think Shanahan’s framing is confused (especially the invocation of “correlations” with respect to VLMs). Interaction with the “world” (i. e., the environment) amounts to performing measurements (and preparing the future state for the environment) using a set of quantum operators and then semantically interpreting the quantum state resulting from these measurements. Classical computation is well-defined when the quantum evolution (the propagation operator P) of the measured boundary state, the semantic interpretation function IF, and the classical computation operator F commute (Fields et al. 2022, sections 2.3 and 2.4).
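To make the commutativity condition concrete, here is one schematic way to write it down. The notation is mine, adapted from the prose above (b for the measured boundary state, P for its propagation, I_F for the semantic interpretation function written IF above, F for the classical computation); see Fields et al. (2022), sections 2.3 and 2.4, for the precise formulation.

```latex
% Schematic only; notation adapted from the paragraph above, not from Fields et al.
\[
  I_F\big(P(b)\big) \;=\; F\big(I_F(b)\big)
  \qquad\Longleftrightarrow\qquad
  I_F \circ P \;=\; F \circ I_F .
\]
```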
Regardless of whether we think of the semantic interpretation as inducing causal links or not (the problem here is that quantum states and semantic information occupy two separate “worlds”), humans don’t have a relationship with their environment that is any different from that of a robot with a visual-language model and a camera: both make semantic sense of the quantum state that they measure[4].
LLMs do have knowledge, encoded in connected features
Shanahan:
A bare-bones LLM doesn’t “really” know anything because all it does, at a fundamental level, is sequence prediction. Sometimes a predicted sequence takes the form of a proposition. But the special relationship propositional sequences have to truth is apparent only to the humans who are asking questions, or to those who provided the data the model was trained on. Sequences of words with a propositional form are not special to the model itself in the way they are to us. The model itself has no notion of truth or falsehood, properly speaking, because it lacks the means to exercise these concepts in anything like the way we do.
It could perhaps be argued that an LLM “knows” what words typically follow what other words, in a sense that does not rely on the intentional stance. But even if we allow this, knowing that the word “Burundi” is likely to succeed the words “The country to the south of Rwanda is” is not the same as knowing that Burundi is to the south of Rwanda. To confuse those two things is to make a profound category mistake.
So much for the bare-bones language model. What about the whole dialogue system of which the LLM is the core component? Does that have beliefs, properly speaking? At least the very idea of the whole system having beliefs makes sense. There is no category error here. However, for a simple dialogue agent like BOT, the answer is surely still “no”. A simple LLM-based question-answering system like BOT lacks the means to use the words “true” and “false” in all the ways, and in all the contexts, that we do. It cannot participate fully in the human language game of truth, because it does not inhabit the world we human language-users share.
First, Shanahan should not have put “really” in quotes, because in this particular case he does claim that bare-bones LLMs lack knowledge (and beliefs, which in Shanahan’s ontology appear to be slightly different, albeit related, things) categorically, not empirically at their current level of sophistication. Implicit in his position is that embodiment is a necessary condition for saying that a system has some knowledge. Note that earlier in the paper, he describes LLMs as “generative mathematical models”. By this somewhat unusual injection of the word “mathematical”, I think he wanted to highlight that LLMs are disembodied and therefore don’t qualify as something that could have knowledge (or beliefs).
As I pointed out above already, Shanahan is wrong here: LLMs are embodied agents; they are physical systems (collections of physical variables, i. e., model parameters, somewhere on some computers) that interact with their environments. It’s unprincipled that Shanahan holds dialogue systems to be embodied but “bare-bones LLMs” to be disembodied: both are cyber-physical systems. LLMs’ perception of time is very impoverished, though, and, indeed, dialogue systems are qualitatively different from bare-bones LLMs in the sense that we can talk about their planning during deployment[5]. However, this doesn’t categorically preclude LLMs from acquiring knowledge during training. When Shanahan says that LLMs “know that the word ‘Burundi’ likely succeeds the words ‘The country to the south of Rwanda is’”, he is cheating. LLMs don’t know this at the level of words; they know it at the level of concepts, i. e., features in their activations. The concepts of “Rwanda”, “Burundi”, “country”, and “south” are connected in the right ways. This is how knowledge is represented in LLMs.
Finally, I think that the statement “The model itself has no notion of truth or falsehood, properly speaking, because it lacks the means to exercise these concepts in anything like the way we do” is also wrong, at least categorically: see “How ‘Discovering Latent Knowledge in Language Models Without Supervision’ Fits Into a Broader Alignment Scheme” (Burns et al. 2022), and, in some sense, already practically: see “Language Models (Mostly) Know What They Know” (Kadavath et al. 2022).
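To illustrate what “knowledge as features in activations” could mean operationally, here is a toy, fully synthetic sketch of probing hidden states for a “truth direction”. This is not the unsupervised method of Burns et al. (2022), nor the calibration study of Kadavath et al. (2022); the “activations” are generated artificially, and the point is only that linearly encoded information can be read out with a simple probe.

```python
# Toy, synthetic sketch of probing activations for a linearly encoded
# "truth direction". NOT the Burns et al. (2022) method: the "activations"
# are generated artificially so the idea can be shown end to end.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64                                        # pretend hidden-state size
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)

def fake_activations(n: int, is_true: bool) -> np.ndarray:
    # Gaussian noise plus a signed component along the hidden truth direction.
    sign = 1.0 if is_true else -1.0
    return rng.normal(size=(n, d)) + sign * 2.0 * truth_dir

X = np.vstack([fake_activations(500, True), fake_activations(500, False)])
y = np.array([1] * 500 + [0] * 500)
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

probe = LogisticRegression(max_iter=1000).fit(X[:800], y[:800])
print("held-out probe accuracy:", probe.score(X[800:], y[800:]))
```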
Shanahan:
Only in the context of a capacity to distinguish truth from falsehood can we legitimately speak of “belief” in its fullest sense. But an LLM is not in the business of making judgements. It just models what words are likely to follow from what other words. The internal mechanisms it uses to do this, whatever they are, cannot in themselves be sensitive to the truth or otherwise of the word sequences it predicts.
This is mistaken because LLMs are “in the business” of improving their world models proactively. Fields and Levin (2022) demonstrate this:
The informational symmetry of the FEP suggests that both fully-passive training and fully-autonomous exploration are unrealistic as models of systems embedded in and physically interacting with real, as opposed to merely formal, environments. The objective of training is to produce predictable behavior by an initially unpredictable system. Training is, in other words, a method of reducing VFE.
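To make “training is a method of reducing VFE” slightly more concrete, here is a minimal numerical sketch of variational free energy for a two-state, two-observation generative model. The model and the numbers are invented for illustration; the decomposition used (complexity minus accuracy) is the standard one.

```python
# Minimal numerical sketch of variational free energy (VFE) for a discrete
# generative model: F(q) = KL(q(s) || p(s)) - E_q[ln p(o | s)]
# ("complexity minus accuracy"). Numbers are invented for illustration.
import numpy as np

p_s = np.array([0.5, 0.5])            # prior over two hidden states
p_o_given_s = np.array([[0.9, 0.1],   # likelihood p(o | s); rows index s
                        [0.2, 0.8]])
o = 0                                 # the observation actually received

def vfe(q_s: np.ndarray) -> float:
    complexity = float(np.sum(q_s * np.log(q_s / p_s)))
    accuracy = float(np.sum(q_s * np.log(p_o_given_s[:, o])))
    return complexity - accuracy

# The exact posterior p(s | o) minimises F, where F equals -ln p(o) (surprise);
# any other belief q(s) gives a strictly larger F.
posterior = p_s * p_o_given_s[:, o]
posterior /= posterior.sum()
print(vfe(np.array([0.5, 0.5])), vfe(posterior), -np.log(0.5 * 0.9 + 0.5 * 0.2))
```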
LLMs could be curious and intentional
Mahowald, Ivanova et al.:
Moreover, LLMs themselves lack communicative intent [Shanahan, 2022, Bender et al., 2021]. The closest they come to intentionality is modeling a document-specific distribution of language patterns, which can result in generated strings that are overall consistent with a particular person/agent [Andreas, 2022], but the intent behind these strings is still missing. Globally speaking, these models have nothing to say. Nor should we expect them to: LLMs’ training objective is maximizing next-/masked-word predictive accuracy, not generating utterances that allow them to achieve specific goals in the world. (Emphasis added by R. L.)
Shanahan:
[…] the basic function of a large language model, namely to generate statistically likely continuations of word sequences, is extraordinarily versatile. Second, notwithstanding this versatility, at the heart of every such application is a model doing just that one thing: generating statistically likely continuations of word sequences.
With this insight to the fore, let’s revisit the question of how LLMs compare to humans, and reconsider the propriety of the language we use to talk about them. In contrast to humans like Bob and Alice, a simple LLM-based question-answering system, such as BOT, has no communicative intent (Bender and Koller, 2020). In no meaningful sense, even under the licence of the intentional stance, does it know that the questions it is asked come from a person, or that a person is on the receiving end of its answers. By implication, it knows nothing about that person. It has no understanding of what they want to know nor of the effect its response will have on their beliefs.
While what Mahowald, Ivanova et al. and Shanahan say is probably true of current LLMs (except the italicised sentence in the quote from Mahowald, Ivanova et al.), the reductionist implication behind these philosophical positions, namely that LLMs couldn’t have communicative intent, is wrong. LLMs are Active Inference agents. I demonstrated how they could begin to exhibit experimentally noticeable curiosity (which, in this context, is close to intentionality) here. Also, taking a radically gradualistic stance, we cannot deny that trace levels of intentionality and communicative intent are already present in current LLMs.
In one place in his paper, Shanahan actually registers a completely reasonable, non-reductionist philosophical position:
To be clear, it is not the argument of this paper that a system based on a large language model could never, in principle, warrant description in terms of beliefs, intentions, reason, etc. Nor does the paper advocate any particular account of belief, of intention, or of any other philosophically contentious concept. (In particular, when I use the term “really”, as in the question ‘Does X “really” have Y?’, I am not assuming there is some metaphysical fact of the matter here. Rather, the question is whether, when more is revealed about the nature of X, we still want to use the word Y.) Rather, the point is that such systems are simultaneously so very different from humans in their construction, yet (often but not always) so human-like in their behaviour, that we need to pay careful attention to how they work before we speak of them in language suggestive of human capabilities and patterns of behaviour.
Unfortunately, Shanahan contradicts himself later in the paper:
This argument has considerable appeal. After all, the overriding lesson of recent progress in LLMs is that extraordinary and unexpected capabilities emerge when big enough models are trained on very large quantities of textual data. However, as long as our considerations are confined to a simple LLM-based question-answering system, this has little bearing on the issue of communicative intent. It doesn’t matter what internal mechanisms it uses, a sequence predictor is not, in itself, the kind of thing that could, even in principle, have communicative intent, and simply embedding it in a dialogue management system will not help.
Similar to the italicised sentence in the quote from Mahowald, Ivanova et al., this is just mistaken, as I demonstrated above.
LLMs communicate meaning: “Train me better!”
Mahowald, Ivanova et al.:
Although all of these sentences are grammatical (and actually obey a sensible scheme for a 4-paneled figure in an academic paper), GPT-3 has no intention, no broader meaning to communicate, and so, at some point sufficiently removed from the human-generated prompt, it will start becoming incoherent.
Further, even if given explicit instructions, LLMs can be easily distracted, as demonstrated by the example in Figure 3. Attempts to align the model’s output with the user’s intent often require adding an objective other than language modeling [e.g., Ouyang et al., 2022, InstructGPT], and even those are imperfect. Overall, LLMs’ inability to infer and maintain the goals of the interaction means that their outputs will often be meaningless and/or mis-specified despite high linguistic well-formedness.
Mahowald, Ivanova et al. here use the word “meaning” in a sense close to “purpose” and related to intent, i. e., only an intentional communicative act could bear meaning[6].
However, per Fields and Levin (2022), as quoted above, no interaction between agents (i. e., all trackable physical systems) is completely devoid of purpose: each minimises its VFE (and helps its environment minimise the environment’s VFE with respect to the agent itself). Therefore, the “meaningless” output by an LLM that Mahowald, Ivanova et al. provided could actually be seen as communicating to LLM engineers the weak sides of the LLM’s capabilities, with the “intention” that the engineers think about how to strengthen these poor skills in future systems in the evolutionary lineage (or tree) of LLMs.
I think there is a continuum, not a categorical distinction, between the kind of “conversational” meaning Mahowald, Ivanova et al. were referring to and the “physical” meaning that I described above. But even restricting ourselves to “conversational” meaning, it will appear once LLMs exhibit significant intentionality, which, as noted in the previous section, could in principle happen if they are trained on a huge number of small batches. This could also happen with LLM-based AI in more prosaic and realistic ways. For example, arguably, this has already happened in Cicero.
References
Browning, Jacob, and Yann LeCun. “AI And The Limits Of Language.” (2022).
Burns, Collin, Haotian Ye, Dan Klein, and Jacob Steinhardt. "Discovering Latent Knowledge in Language Models Without Supervision." arXiv preprint arXiv:2212.03827 (2022).
Constant, Axel, Andy Clark, and Karl J. Friston. "Representation wars: Enacting an armistice through active inference." Frontiers in Psychology 11 (2021): 598733.
Dohan, David, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber, Raphael Gontijo Lopes, Yuhuai Wu et al. "Language model cascades." arXiv preprint arXiv:2207.10342 (2022).
Duranti, Alessandro. The anthropology of intentions. Cambridge University Press, 2015.
Fields, Chris, James F. Glazebrook, and Michael Levin. "Minimal physicalism as a scale-free substrate for cognition and consciousness." Neuroscience of Consciousness 2021, no. 2 (2021): niab013.
Fields, Chris, Karl Friston, James F. Glazebrook, and Michael Levin. "A free energy principle for generic quantum systems." Progress in Biophysics and Molecular Biology (2022).
Fields, Chris, and Michael Levin. "Regulative development as a model for origin of life and artificial life studies." (2022).
Kadavath, Saurav, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer et al. "Language models (mostly) know what they know." arXiv preprint arXiv:2207.05221 (2022).
Leventov, Roman. "Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development.” (2022a).
Leventov, Roman. “How evolutionary lineages of LLMs can plan their own future and act on these plans.” (2022b).
Levin, Michael. "Technological approach to mind everywhere: an experimentally-grounded framework for understanding diverse bodies and minds." Frontiers in Systems Neuroscience (2022): 17.
Lewkowycz, Aitor, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone et al. "Solving quantitative reasoning problems with language models." arXiv preprint arXiv:2206.14858 (2022).
Mahowald, Kyle, Ivanova, Anna A., Blank, Idan A., Kanwisher, Nancy, Tenenbaum, Joshua B., and Evelina Fedorenko. "Dissociating language and thought in large language models: a cognitive perspective." ArXiv, (2023). Accessed January 20, 2023. https://doi.org/10.48550/arXiv.2301.06627.
Perez, Ethan, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit et al. "Discovering Language Model Behaviors with Model-Written Evaluations." arXiv preprint arXiv:2212.09251 (2022).
Shanahan, Murray. "Talking About Large Language Models." arXiv preprint arXiv:2212.03551 (2022).
Footnotes
[1] Mahowald, Ivanova et al. actively use the term “network” from neuroscience throughout the paper. The closest equivalents of “networks” in the domain of ANNs are “circuits”, and I use “circuits” in the context of ANNs in this post, because “network” is confusing: ANNs are also networks.
[2] Because of the “no free lunch” theorem.
[3] It’s unclear, though, whether these references exist at all, regardless of who produces language.
[4] There could be some more subtlety in the architecture of the human brain (connectome): language could somehow be produced “in parallel” or “together” with visual percept in humans, while a robot with a camera clearly first records the image and only then produces text based on that image. Therefore, human language utterances could be, in some sense, more “directly” caused by the measurements of the world than robot utterances. However, robot utterances are still caused by the measurements of the world in this case, transitively. And it doesn’t seem that this is the issue Shanahan was pointing towards.
[5] Lineages of bare-bones LLMs also plan, though, on evolutionary timescales (Leventov 2022b).
[6] This differs from the notion of meaning as “information that makes a difference”, as per (Fields et al. 2021).