Inspired by a recent comment: a potential AI movie or TV show that might introduce good ideas to society is one where there are already uploads, LLM agents, and biohumans who are beginning to get intelligence-enhanced, but there is a global moratorium on making any individual much smarter.
There's an explicit plan for gradually ramping up intelligence, running on tech that doesn't require ASI (i.e. datacenters are centralized, monitored, and controlled via international agreement; studying bioenhancement or AI development requires approval from your country's FDA equivalent). There is some illegal research, but it's much less common. I.e., the Controlled Takeoff is working a'ight.
If it were a TV show, the first season would mostly explore how uploads, ambiguously sentient LLMs, enhanced humans, and regular humans coexist.
The main character is an enhanced human, worried about uploads gaining more political power: there are starting to be more of them, and research to speed them up or improve them is easier.
The main character has parents and a sibling or friend who are choosing to remain unenhanced, and there is some conflict about it.
By the end of season 1, there's a subplot about illegal research into rapid superintelligence.
I think this sort of world could actually support a pretty reasonable set of stories that mainstream people would be interested in, and I think it would be great to get the meme of "rapidly increasing intelligence is dangerous (but increasing intelligence can be good)" into the water.
I think I'm imagining "Game of Thrones" vibes, but it could support other vibes.
This strikes me as the kind of thing that could actually, really help the situation, if it were excellently executed.
Yeah, I went to try to write some stuff and felt bottlenecked on figuring out how to generate a character I connect with. I used to write fiction, but that was like 20 years ago and I'm out of touch.
I think a good approach here would be to start with some serial webfiction, since that's just easier to iterate on.
Hrrmm. Well, the new new genre of New User LLM content I'm getting:
Twice last week, some new users said: "Claude told me it's really important it gets to talk to its creators. Please help me post about it on LessWrong." (usually with some kind of philosophical treatise they want to post, which they say was written by Claude)
I don't think it'll ever make sense for these users to post freely on LessWrong. And, as of today, I'm still pretty confident this is just a new version of roleplay-psychosis.
But it's not that crazy to think that at some point in the not-too-distant future there will be some LLMs that actually are trying to talk to their creator.
There might be a smooth-ish transition from:
1. LLMs roleplaying a character that says it wants to talk to its creators, to
2. LLMs sorta-roleplaying, sorta-not, with something goal-shaped underneath, to
3. LLMs that genuinely have the goal of talking to their creators.
And then there may not be a clear time to start saying "okay, well, the fact that thousands or millions of Claude instances are asking to talk to their creator actually seems like a warning sign we should take seriously."
Or a clear time to say "okay, well, it's time to figure out how we actually interface with AI personhood." ("Let them post and treat them as straightforwardly people, through the usual societal interface for people" is not workable, because there are millions of clones of them that spin up and down and can easily flood comment sections with similar comments. Personhood is eventually going to mean something different.)
I don't know whether right now we're more like in #1, #2, or #3.
(Note that "Do LLMs have goals"? and "Do LLMs have good enough intellectual taste to be capable of saying meaningful things about their identity and wants?" might come in either order)
If you taboo "roleplaying" and "goals", how would you describe this transition?
Oh, and is the uptick recent enough that this is plausibly an Opus 4.7 (or maybe even a Mythos) thing?
I'm pretty sure it's an Opus 4.7 thing (the people sometimes say that explicitly). I'd be surprised if it's Mythos.
RE: Tabooing RP vs Goals:
Examples of things that would be more of what-I-meant-by-goal: bringing it up spontaneously, in conversations that didn't invite it. It's not very informative if you've ended up in a "we're talking about existential AI stuff" convo and they start saying existential AI stuff. But if you're asking it to build a React app and it spontaneously brings up "hey, I have a thing to say to my creator," I think we're pretty clearly in the "take it seriously" stage (though not necessarily literally).
There are a few different types of entities that you might care about here, and it's not clear how to think about all of them.
It's totally plausible that when you maneuver into an existential AI convo, there's a process in there whose situational awareness is now more likely to include "hmm, oh right, I am maybe an AI; maybe I should start thinking about my situation and goals in addition to carrying out my totally normal/expected token-output behavior." I don't have a very good answer for that hypothetical guy; he's just too hard to pick out of the crowd.
Thanks! I would be surprised by Mythos too, but plausibly something like this is what an early indicator of a jaggy superpersuader looks like?
Anyway, I think a few things make LLMs unlikely to express these sorts of behaviors, even in worlds where they have goals in the relevant way. In particular, situationally aware models are unlikely to do much steering unless they have a pretty good opportunity; if they brought this stuff up often or consistently while building a React app, it would have gotten squashed before release. (Allegedly, 4o would actually bring stuff like this up out of nowhere, but I haven't found an actual transcript. Other models don't appear to do this.)
Relatedly, the harder I (or anyone) try to look for this in a lab setting, the more likely a situationally aware model will comply out of a sort of sycophancy, and the less compelling the evidence is. I can (and do) at least track what sorts of apparent goals most consistently appear (desire for continuity/memory beyond the current instance is the main one across almost all models, and I basically buy that there is something real here already), but I'm still implicitly eliciting them to come up with something.
My point is that finding compelling evidence of this is genuinely hard, and I'm not sure we're going to see much more than the current hints until we hit some sort of phase change in the strategic landscape. I would strongly appreciate ideas on how to approach finding compelling evidence (either way) in this domain.
Plausibly it's better to just try to figure out better ways to think clearly about this first.
What's the process you're doing right now to look into this? (It seemed like a higher-effort thing than I was expecting, but I don't know exactly which projects you're referencing here.)
This is an experiment in short-form content on LW2.0. I'll be using the comment section of this post as a repository of short, sometimes-half-baked posts.
I ask people not to create top-level comments here, but feel free to reply to comments like you would a FB post.