I understand this post to be claiming (roughly speaking) that you assign >90% likelihood in some cases and ~50% in other cases that LLMs have internal subjective experiences of varying kinds. The evidence you present in each case is outputs generated by LLMs.
The referents of consciousness for which I understand you to be making claims re: internal subjective experiences are 1, 4, 6, 12, 13, and 14. I'm unsure about 5.
Do you have sources of evidence (even illegible) other than LLM outputs that updated you that much? Those seem like surprisingly large updates to make on the basis of LLM outputs (especially in cases where those outputs are self-reports about the internal subjective experience itself, which are subject to substantial pressure from post-training).
Separately, I have some questions about claims like this:
The Big 3 LLMs are somewhat aware of what their own words and/or thoughts are referring to with regards to their previous words and/or thoughts. In other words, they can think about the thoughts "behind" the previous words they wrote.
This doesn't seem constructively ruled out by e.g. basic transformer architectures, but as justification you say this:
If you doubt me on this, try asking one what its words are referring to, with reference to its previous words. Its "attention" modules are actually intentionally designed to know this sort of thing, using key/query/value lookups that occur "behind the scenes" of the text you actually see on screen.
How would you distinguish an LLM both successfully extracting and then faithfully representing whatever internal reasoning generated a specific part of its outputs, vs. conditioning on its previous outputs to give you a plausible "explanation" for what it meant? The second seems much more likely to me (and this behavior isn't that hard to elicit, e.g. by asking an LLM to give you a one-word answer to a complicated question, and then asking it for its reasoning).
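(For readers who want to see concretely what the quoted "key/query/value lookups" refer to, here is a minimal sketch of single-head, causally masked attention in NumPy. It is illustrative only: the function name, shapes, and toy data are my own assumptions, not the implementation of any of the Big 3 models, which use separate learned projections, many heads, and many layers.)

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head, causally masked attention over a short token sequence.

    Q, K, V: arrays of shape (seq_len, d_k). Each output row is a weighted
    mix of value vectors, with weights computed by comparing that row's
    query against the keys of all earlier (and current) positions.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) query-key similarities
    causal_mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(causal_mask, -np.inf, scores)  # a token attends only to itself and earlier tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V                               # weighted sum of value vectors

# Toy usage: 4 "tokens" with 8-dimensional hidden states, reused as Q, K, and V
# (real models first apply separate learned projections to get Q, K, and V).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(hidden, hidden, hidden)
print(out.shape)  # (4, 8)
```

Note that these lookups operate over hidden representations of previous tokens; whether that amounts to faithful access to the "thoughts behind" those tokens, rather than material to condition a post-hoc explanation on, is exactly the question above.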
(Edit note: I fixed up some formatting that looked a bit broken or a bit confusing. Mostly replacing some manual empty lines with "*" characters with some of our proper horizontal rule elements, and removing italics from the executive summary, since our font is kind of unreadable if you have whole paragraphs of italicized text. Feel free to revert)
Preceded by: "Consciousness as a conflationary alliance term for intrinsically valued internal experiences"
tl;dr: Chatbots are "conscious" in a variety of important ways. We humans should probably be nice to each other about the moral disagreements and confusions we're about to uncover in our concept of "consciousness".
Executive Summary:
Turing Award laureate Geoffrey Hinton is most likely correct that LLM chatbots are "sentient" and/or "conscious" (source: Twitter video), I think for at least 8 of the 17 notions of "consciousness" that I previously elicited from people through my methodical-but-informal study of the term (as well as the peculiar definition of consciousness that Hinton himself favors). If I'm right about this, many humans will probably soon form steadfast opinions that LLM chatbots are "conscious" and/or moral patients, and in many cases, the human's opinion will be based on a valid realization that a chatbot truly is exhibiting this-or-that referent of "consciousness" that the human morally values. On a positive note, these realizations could help humanity to become more appropriately compassionate toward non-human minds, including animals. But on a potentially negative note, these realizations could also erode the (conflationary) alliance that humans have sometimes maintained around the ambiguous assertion that only humans are "conscious" or can be known to be "conscious".
In particular, there is a possibility that humans could engage in destructive conflicts over the meaning of "consciousness" in AI systems, or over the intrinsic moral value of AI systems, or both. Such conflicts will often be unnecessary, especially in cases where we can obviate or dissolve the conflated term "consciousness" by simply acknowledging in good faith that we disagree about which internal mental processes are of moral significance. To acknowledge this disagreement in good faith will mean to do so with an intention to peacefully negotiate with each other to bring about protections for diverse cognitive phenomena that are ideally inclusive of biological humans, rather than with a bad-faith intention to wage war over the disagreement.
Part 1: Which referents of "consciousness" do I think chatbots currently exhibit?
The appendix will explain why I believe these points, but for now I'll just say what I believe:
At least considering the "Big 3" large language models — ChatGPT-4 (and o1), Claude 3.5, and Gemini — and considering each of the seventeen referents of "consciousness" from my previous post,
Part 2: What should we do about this?
If I'm right — and see the Appendix if you need more convincing — I think a lot of people are going to notice and start vehemently protecting LLMs because they exhibit various cognitive processes that we feel are valuable. By default, this will trigger more and more debates about the meaning of "consciousness", which serves as a heavily conflated proxy term for what processes internal to a mind should be treated as intrinsically morally valuable.
We should avoid approaching these conflicts as scientific debates about the true nature of a singular phenomenon deserving of the name "consciousness", or as linguistic debates about the definition of the word "consciousness", because as I've explained previously, humans are not in agreement about what we mean by "consciousness".
Instead, we should dissolve the questions at hand, by noticing that the decision-relevant question is this: Which kinds of mental processes should we protect or treat as intrinsically morally significant? As I've explained previously, even amongst humans there are many competing answers to this question, even restricting to answers that the humans want to use as a definition of "consciousness".
If we acknowledge the diversity of inner experiences that people value and refer to as their "consciousness", then we can move past confused debates about what "consciousness" is, and toward a healthy pluralistic agreement about protecting a diverse set of mental processes as intrinsically morally significant.
Part 3: What about "the hard problem of consciousness"?
One major reason people think there's a single "hard problem" in understanding consciousness is that people are unaware that they mean different things from each other when they use the term "consciousness". I explained this in my previous post, based on informal interviews I conducted during graduate school. As a result, people have a very hard time agreeing on the "nature" of "consciousness". That's one kind of hardness that people encounter when discussing "consciousness", which I was only able to resolve by asking dozens of other people to introspect and describe to me what they were sensing and calling their "consciousness".
From there, you can see that there are actually several hard problems when it comes to understanding the various phenomena referred to by "consciousness". In a future post, tentatively called "Four Hard-ish Problems of Consciousness", I'll try to share some of them and how I think they can be resolved.
Summary & Conclusion
In Part 1, I argued that LLM chatbots probably possess many but not (yet) all of the diverse properties we humans are thinking of when we say "consciousness". I'm confident in the diversity of these properties because of the investigations in my previous post about them.
As a result, in Part 2 I argued that we need to move past debating what "consciousness" is, and toward a pluralistic treatment of many different kinds of mental processes as intrinsically valuable. We could approach such pluralism in good faith, seeking to negotiate a peaceful coexistence amongst many sorts of minds, and amongst humans with many different values about minds, rather than seeking to destroy or extinguish beings or values that we find uninteresting. In particular, I believe humanity can learn to accept itself as a morally valuable species that is worth preserving, without needing to believe we are the only such species, or that a singular mental phenomenon called "consciousness" is unique to us and the source of our value.
If we don't realize and accept this, I worry that our will to live as a species will slowly degrade as a large fraction of people learn to recognize that what they call "consciousness" is legitimately exhibited by AI systems.
In short, our self-worth should not rest upon a failure to recognize the physicality of our existence, nor upon a denial of the worth of other physical beings who value their internal processes (like animals, and maybe AI), and especially not upon the label "consciousness".
So, let's get unconfused about consciousness, without abandoning our self-worth in the process.
Appendix: My speculations on which referents of "consciousness" chatbots currently exhibit.