Recently on Twitter, Geoffrey Hinton summed up an ongoing dispute he's had with Yann LeCun as follows:
The central issue on which we disagree is whether LLMs actually understand what they are saying. You think they definitely don't and I think they probably do.
In the thread that followed, several commenters asked what, exactly, was the notion of “understanding” that Hinton was appealing to here. What, exactly, are we saying when we say that an LLM “understands what it is saying”? In current debates about whether LLMs reason or understand, there is very little clarity on this question. In this post, I want to articulate a conception of what it is for something to “understand what it’s saying” that I hope will bring clarity to this debate. The answer at which I'll arrive is that current LLMs at least sort of understand what they're saying sometimes, and future LLMs, even those that are the product of simply scaling current methods, could fully understand what they're saying. However, the main upshot of this post is not meant to be the answer to the question itself but, rather, the general approach to answering it and similar questions.
Let us start with some technical preliminaries. LLMs are neural networks trained to predict the next word in a sequence. An LLM works in tokens, numerical encodings of the semantically relevant bits of words. Though tokens cut more finely than words, for simplicity, we can just treat tokens as numerical encodings of words. What it gets, in its training data, are strings of tokens from all across the internet, and its task, in training, is to learn how to complete these various strings. For instance, it might get the string of tokens “3742 1569 8320 4915 2876 ____” with the last token missing. Through its training run, it learns to assign a probability distribution to the various tokens that might complete this string, eventually learning to predict that the token most likely to come next is 6498. Completed with that last token, the string might encode the sentence “The cat sat on the mat.” This ability to predict the next word translates into the ability to speak a language since, if you can predict what human beings would say, you can simply start speaking, predict the next word that comes out of your mouth, say that, and iterate this process. Doing this, you'll end up speaking in sentences that humans would actually say, making good sense to other humans. It is an amazing fact that through training on nothing but next-token prediction, an LLM comes to be able to do this, at least seeming to acquire the ability to speak a language, or "linguistic competence."
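The predict-sample-iterate loop described above can be sketched in a few lines of Python. Everything here is a toy for illustration: `next_token_probs` is a hypothetical stand-in for a trained model's predictive distribution, and the token ids simply echo the example in the text.

```python
import random

# Hypothetical stand-in for a trained LLM: given a list of token ids,
# return {token_id: probability} for the next token. This toy "model"
# has memorized exactly one sentence.
def next_token_probs(tokens):
    # 3742 1569 8320 4915 2876 6498 ("The cat sat on the mat.")
    sentence = [3742, 1569, 8320, 4915, 2876, 6498]
    n = len(tokens)
    if n < len(sentence) and tokens == sentence[:n]:
        return {sentence[n]: 1.0}
    return {0: 1.0}  # 0 = end-of-sequence token

def generate(prompt, max_new_tokens=10):
    """Autoregressive generation: predict a distribution over the next
    token, sample from it, append the sample, and repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        choices, weights = zip(*probs.items())
        tok = random.choices(choices, weights=weights)[0]
        if tok == 0:  # stop at end-of-sequence
            break
        tokens.append(tok)
    return tokens

print(generate([3742, 1569]))  # → [3742, 1569, 8320, 4915, 2876, 6498]
```

A real LLM differs only in scale: the distribution comes from a neural network over a vocabulary of tens of thousands of tokens, but the generation loop is this same predict-and-append iteration.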
There are several aspects to the sort of “linguistic competence” that an LLM comes to acquire. It is able to tell, for instance, that “3742 1569 8320 4915 2876 6498” is a grammatical string, something that can be meaningfully used, as is “3742 6498 8320 4915 2876 1569,” but “2876 4915 1569 8320 3742 6498” is not. The ability to sort strings in this way is a matter of syntactic knowledge: classifying the different tokens into different syntactic types and knowing the rules by which different syntactic types can compose to constitute meaningful bits of language. It is more or less uncontroversial that LLMs have some sort of syntactic knowledge. The much more controversial question is whether they have semantic knowledge: not just knowledge of grammar, but knowledge of meaning. Whereas syntactic knowledge involves knowing, for instance, that the tokens 1569 and 6498 are both of the type COMMON-NOUN, semantic knowledge would involve knowing, more determinately, that 1569 means cat and 6498 means mat. When we speak of a system as “understanding what it's saying,” we are principally attributing to it this sort of semantic knowledge. The question is whether we can really make such an attribution. How we answer this question will depend on how we think about semantic knowledge and what we’re doing when we attribute it.
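The sorting ability described above can be made concrete: a model exhibits syntactic knowledge, in this operational sense, if it assigns much higher probability to grammatical orderings than to scrambled ones. Here is a minimal sketch using an invented bigram "grammar"; the token ids echo the examples in the text, but the transition table is entirely an assumption for illustration.

```python
import math

# Toy bigram grammar: which token types may follow which.
# (Invented for illustration; a real LLM learns such regularities implicitly.)
ALLOWED = {
    3742: {1569, 6498},  # determiner -> common noun (cat/mat)
    1569: {8320},        # noun -> verb
    6498: {8320},
    8320: {4915},        # verb -> preposition
    4915: {2876},        # preposition -> determiner
    2876: {1569, 6498},
}

def log_prob(tokens):
    """Score a string as the sum of log-probabilities of its bigram
    transitions; disallowed transitions get a huge negative score."""
    score = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        nxt = ALLOWED.get(prev, set())
        score += math.log(1.0 / len(nxt)) if cur in nxt else -1e9
    return score

grammatical = [3742, 1569, 8320, 4915, 2876, 6498]  # "The cat sat on the mat"
scrambled   = [2876, 4915, 1569, 8320, 3742, 6498]
print(log_prob(grammatical) > log_prob(scrambled))  # → True
```

Note what this sketch captures and what it doesn't: the scorer sorts grammatical from ungrammatical strings, but nothing in it associates token 1569 with cats. That is exactly the gap between syntactic and semantic knowledge at issue in what follows.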
Standard approaches to semantic knowledge hold that what one is doing in saying of something that it “understands what it's saying” is ascribing to it some specific kind of representational state: very roughly, a state of associating the symbols of the language it's using with things in the world of which one has some representation, for instance, associating the token 1569 with cats. To answer the question of whether an LLM “understands what it's saying” is to find out whether it instantiates this specific sort of representational state when it produces sentences such as “The cat is on the mat.” On this standard approach, the question of whether something instantiates such a state is essentially an empirical question. Though we might infer that something instantiates this state on the basis of how it behaves, the state we are actually reporting when we say that something understands what it's saying is something that we'd have to look "under the hood" to find, whether in the brain, in the case of human beings, or in the execution of the program, in the case of an LLM. Now, given how fundamentally different the working of a human brain is from the working of an LLM, if one takes this general approach, one is likely to come to the conclusion that LLMs don't instantiate anything like the internal state humans instantiate when they understand what they're saying. Accordingly, LLMs don't really understand what they're saying. I want to suggest here, however, that this approach to thinking about semantic understanding is in fact radically mistaken. Let me explain.
To Place or Not to Place In the Space of Reasons
The core point I want to make here is that, in saying of something that it "understands what it's saying," one is not giving an empirical description of this thing. I draw my inspiration in making this claim from Wilfrid Sellars, who famously said that the question of whether someone knows something is not an empirical question. In his master work, Empiricism and the Philosophy of Mind, Sellars famously wrote:
In characterizing an episode or state as that of knowing, we are not giving an empirical description of that episode or state; we are placing it in the logical space of reasons, of justifying and being able to justify what one says.
Sellars's basic idea here, elaborated at length in the work of Robert Brandom, is that, in saying that someone knows something, you’re taking them to have a certain sort of authority in their making of a particular claim. You take them to be entitled to that claim, and able to bear the justificatory responsibility of demonstrating this entitlement in response to appropriate challenges. Accordingly, you take it that you can make this claim yourself on the basis of their authority, able to defer back to them in response to a challenge. Such a “taking” is not the taking of an empirical fact to obtain. It is, rather, a normative taking: the adoption of a complex normative attitude with respect to someone's making of a claim.
Now, Sellars is talking primarily about empirical knowledge here—knowledge of how things in the world are. However, the basic point applies just as well to semantic knowledge—knowledge of what one says in uttering some sentence. Whereas attributing empirical knowledge to someone is taking them to be able to respond to demands for empirical reasons, actually providing reasons to justify what they've said in response to appropriate challenges, attributing semantic knowledge to someone is taking them to be appropriately responsive to reason relations as such, appreciating, for instance, what counts as a challenge to what one has said (whether or not one can actually respond to that challenge). Consider the case of someone who says "The ball is red." Someone who says such a thing commits themself to the claim that it's colored, precludes themself from being entitled to the claim that it's white or gray, and so on. These are the reason relations one binds oneself by in saying that the ball is red. Unless one recognizes that one has bound oneself by these reason relations, one does not understand what one has said. Attributing this understanding to someone, then, is not taking a particular state to obtain in their brain, but, rather, taking them to be responsive to demands for reasons. Let me illustrate this idea with two examples.
Suppose I go to China. I don’t speak any Mandarin, and someone tells me to say “qiú shì hóngsè de.” If I manage to say this (somehow getting the pronunciation right), I will be regarded as committed to “qiú shì yǒu yánsè de,” precluded from being entitled to saying “qiú shì huīsè de,” and so on. However, I won’t have any knowledge of the fact that I’ve bound myself by these reason relations. Accordingly, I won’t be responsive to demands for reasons in Mandarin. Someone who speaks to me, questioning what I say, will quickly realize that I simply can’t be held accountable for anything I say. In that sense, I don’t know what I’m saying. The important point here is that to say this is not to make an empirical description of me. It’s not to say that there’s anything going on or failing to go on in my brain. It is, rather, a normative thing: recognizing that I’m not responsive to reasons asked for or given in Mandarin. My utterances aren't to be counted as moves that I am making in a Mandarin-speaking discursive practice.
Here's a different kind of example. Suppose a parrot, having heard people speak, squawks out “Brawk, red ball!” Does the parrot understand what it's saying in squawking out such a thing? Clearly not. Once again, the key thought here is that, in saying this, we are not ascribing any specific properties to the parrot's brain---saying of it that it lacks some sort of representational state. Though, of course, there are various facts about the parrot's brain that can be appealed to in order to explain why it behaves as it does, what we're doing in saying that it "doesn't understand what it's saying" is not describing any such facts. Rather, we are refusing to situate the parrot's squawk in the space of reasons, refusing to count it as a "move" in the "game of giving and asking for reasons." We don't count the parrot as bearing any justificatory responsibility for its various squawks and squeals. That's why its various squawks and squeals do not actually amount to its saying anything at all.
Now, there’s an important difference between the case in which I say something I don’t understand in a language I don’t speak, such as Mandarin, and a case in which a parrot squawks out a sentence, where it shouldn't really even be counted as speaking at all. Suppose someone tells me to say something very offensive in Mandarin, and I say it, not knowing what it is that I’m saying. I can still be held accountable for what I’ve said, even though I don’t know what it is. That’s because, though I am not responsive to reasons in Mandarin, I am still responsive to reasons in general, and so I’m still a bearer of justificatory responsibility generally. Accordingly, we can ask such things of me as “Why would you say that, if you don’t know what you’re saying?” We can’t, of course, ask any such thing of a parrot. Thus, while it’s reasonable to regard me as saying something, just not understanding what I’m saying, the parrot is not regarded as saying anything at all. Still, the general point about the normativity of these claims holds in both cases.
The Case of LLMs
Let us now turn back to the main topic of this post: LLMs. On this approach, to ask whether an LLM understands what it’s saying is to ask whether, unlike a parrot, we can hold it responsible for what it says, counting on it to give reasons, to respond to potential challenges, and so on. Clearly, on this front, it fares much better than a parrot who merely squawks out sentences. When ChatGPT says something, you can ask follow-up questions, and it generally responds appropriately: clarifying, qualifying, responding to potential challenges, and so on. Consider this exchange, in which I ask it about the colors of balls, or this exchange, in which I ask it about cats being on mats. Here, it at least seems to exhibit an understanding of what it's saying when it says “The ball is red.” Does it really understand what it's saying? What are we asking here? The point I want to emphasize is that we are not asking whether some process is going on “under the hood.” Rather, we are asking whether it is really appropriately responsive to the reason relations that it binds itself by in saying "The ball is red." Insofar as this is our question, given the sort of reason responsiveness it exhibits, I think we can reasonably attribute to it at least some level of semantic knowledge. That is, I think we can say that it at least partly “understands what it’s saying" when it says, for instance, that something is red.
Why the qualification of "at least partly"? The reason is that things are not always so clear-cut. To see a case in which GPT-4 fails, consider the following question, which I got from the YouTuber Matthew Berman:
A small marble is put into a normal cup and the cup is placed upside down on a table. Someone then takes the cup and puts it inside the microwave. Where is the marble now? Explain your reasoning.
An ordinary speaker, who knows that normal cups are closed at the bottom and open at the top, will judge that, if just the upside-down cup is picked up, the marble will stay on the table. I take it that our drawing this inference is a matter of semantic knowledge, our understanding of the meaning of "cup" and "marble." GPT-4, however, does not seem to possess this understanding. In this exchange, it suggests that its default understanding of "cup" is something that is closed at both the top and the bottom, like a jar. However, when, in a new context, I ask the question again, clarifying that a normal cup is open at the bottom and closed at the top, it still gets the wrong answer, maintaining that, somehow, the marble remains in the cup due to gravity. On the basis of these failures of reasoning, I think there's reason to think that it doesn't fully understand what it's saying when it says such things as "The marble is in the cup."
So, to the question of whether current LLMs really "understand what they're saying," it seems clear that the answer is (perhaps unsurprisingly): sort of . . . sometimes. However, while this question can't be answered simply in the affirmative or negative for current models, I hope I've done enough to show how we should approach this question when thinking about future models. Whether or not we should treat a system as "understanding what it's saying" is not a matter of whether there is anything like the processes going on inside of it that go on inside our brains when we speak and understand what we're saying. It is, rather, a matter of whether, in responding to queries and challenges, it manifests an understanding of the reason relations it binds itself by in saying what it does. Though, as the case of the marble in the cup illustrates, current models do not always manifest such an understanding, I do not see any reason in principle why future models, even those that are the result of nothing more than the scaling of the techniques of current models, could not. If they do, we would have every reason to say that those LLMs really do understand what they're saying.