Acknowledgements

The vision here was midwifed originally in the wild and gentle radiance that is Abram's company (though essentially none of the content is explicitly his).
The PIBBSS-spirit has been infused in this work from before it began (may it infuse us all), as have meetings with the Agent Foundations team at MIRI over the past ~2 years.
More recently, everyone who has been loving the High Actuation project into form (very often spontaneously and without being encumbered by self-consciousness of this fact):[1] individuals include Steve Petersen, Mateusz Baginski, Aditya Prasad, Harmony, TJ, Chris Lakin; the AISC 2024 team, Murray Buchanan, Matt Farr, Arpan Agrawal, Adam, Ryan, Quinn; various people from Topos, ALIFE, MAPLE, MATS, EA Bangalore. Published while at CEEALAR.
Disclaimers
Very occasionally there are small remarks/questions from a remarkable human named Steve, since this and the next two posts are an edited transcript of me giving him a talk. I left them in to retain the conversational tone. Steve has also consistently been a fantastic ground for this channeling.
I use the term “artefact” a fair amount in this sequence. Unfortunately for you and me, Anthropic also recently started using "artifact" in a different way. I'm using "artefact" in the common sense of the word. The British spelling should help keep the distinction in mind.
Taking Intelligence Seriously
Sahil:
I gave a talk recently, at an EA event just two days ago, where I made some quick slides (on the day of the talk, so not nearly as tidy as I’d like) and attempted to walk through this so-called “live theory”[2].
Maybe I can give you that talk. I'm not sure how much of what I was saying there will be present now, but I can try. What do you think? I think it'll take about 15 minutes. Yeah?
Steve:
Cool.
Sahil:
Okay, let me give you a version of this talk that's very abbreviated.
So, the title I’m sure already makes sense to you, Steve. I don't know if this is something that you know, but I prefer the word “adaptivity” over intelligence. I'm fine with using “intelligence” for this talk, but really, when I'm thinking of AI and LLMs and “live” (as you’ll see later), I'm thinking, in part, of adaptive. And I think that connotes much more of the relevant phenomena, and much less controversially.
It’s also less distractingly “foundational”, in the sense of endless questions on “what intelligence means”.
Failing to Take Intelligence Seriously
Right. So, I want to say there are two ways to fail to take intelligence, or adaptivity, seriously.
One is, you know, the classic case, of people ignoring existential risk from artificial intelligence. The old “well, it's just a computer, just software. What's the big deal? We can turn it off.” We all know the story there. In many ways, this particular failure-of-imagination is much less pronounced today.
But, I say, a dual failure-of-imagination is true today even among the “cognoscenti”, where we ignore intelligence by ignoring opportunities from moderately capable mindlike entities at scale. I'll go over this sentence slower in the next slide.
For now: there are two ways to not meet reality.
On the left of the slide is “nothing will change”. The same “classic” case of “yeah, what's the big deal? It's just software.”
On the right, it's the total singularity, of extreme unknowable super-intelligence. In fact, the phrase “technological singularity”, IIRC, was coined by Vernor Vinge to mark the point that we can't predict beyond. So, it's also a way to be mind-killed. Even with whatever in-the-limit proxies we have for this, we make various simplifications that are not “approximately” useful; they don’t decay gracefully. (Indeed, this is how the “high-actuation spaces” project document starts.)
All of the richness of reality: that’s in the middle.
Steve:
I think that makes sense. I like how both are ways to avoid looking at the kind of intelligence in between, given slow takeoff.
Sahil:
Yeah, cool. In the event of a “slow takeoff”, sure.[3]
Opportunities from Moderate Intelligence at Scale
Okay, so, going over the sentence slowly:
I mean opportunities, so the focus for today is less about risks. We're talking about reframing risks soon anyway, in the series of conversations we're having. However, conversation about risk will end up here as well, because it is an inextricable aspect of the opportunity frame. Indeed, “live theory” is about (research methodological) opportunities meeting risks (of infrastructural insensitivity towards beings).[4][3]
“Moderate”, as in: intelligence that is not super intelligent or singularity-level intelligent, but kind of intelligent, the way LLMs seem kind of intelligent, and somewhat beyond.
“Intelligence” meaning, really, adaptivity again.
And at scale, meaning: everywhere, cheap, abundant, etc. The analogy I tend to use: attempts to talk about video calls, or even, say, remote teaching, when all one has to extrapolate from is the telegraph/telegram. With the technology of the telegraph, people can tell that being able to send messages over large distances seems exciting. But they're not really thinking “oh, in the future, some jobs will entirely be dependent on being able to take video-calling tech for granted. Schoolkids might do the entirety of their schooling using this tech.” They just didn't think of that, for the most part. Even if some people had ideas, few were working towards orienting to that future in a way that would enable better equilibria. They failed to take it seriously.
Video calling is, in a way, the same type of technology as the telegraph: the ability to send messages. But with dramatically reduced cost and latency, plus increased adoption, bandwidth, and fidelity. This allowed for remote work and video calls and the whole shebang that’s allowing us to have this conversation right now.
And so the question is: what happens when we have much lower latency and moderately more intelligence, much lower costs, with adaptivity tightly integrated in our infrastructure?[5] Just like we have Wi-Fi in everything now.
And notice that this is not extrapolation that goes only moderately far. That is, just because I'm talking about “moderate” intelligence does not mean this extrapolation is not about the real crazy future up ahead. Only, this is AI-infrastructural extrapolation, not AI-singleton extrapolation (or even what’s connoted by “multipolar”, usually, IME). It’s neglected because it is usually harder to think about than a relatively centralized thing we can hold in attention one at a time.
This frame also naturally engages more with the warning carried in “attitudes to tech overestimate in the short run, underestimate in the long run.”
So to repeat, and this is important for making sense of what follows: I am doing extrapolation that will venture far. What follows is simultaneously very obvious and very weird.[6] In fact, that combination is what makes it work at all, as you’ll see.
But that’s also a double-edged sword. Instead of it being sensible (obvious) and exciting (weird), the perspective here might seem redundant or boring (too obvious) and irrelevant or scary (too weird).
Hopefully, I will:

a) avoid its obviousness being rounded off to “ah right, live theory is another word for autoformalization” and

b) bring its weirdness closer to digestibility. To quote Bostrom in Deep Utopia, “if it isn’t radical, it isn’t realistic.”
So even though some might classify this series as being about "automating[7] alignment research", it tastes nothing like the unfleshed mundane trendlines[8] or spectacular terror that are mixed together in, for example, Leopold Aschenbrenner’s “AGI from automated research by 2027”.
Again, this isn't to say that there aren't some serious risks, only that they might look very different (which view will be elaborated in an upcoming post).[9]
Live Interfaces
This slide was ‘Live UI’ (don't bother trying to read the image): what happens to infrastructure and interfaces, generally, when you can “do things that don’t scale”, at scale. People don’t seem to update hard enough, judging by their reactions to Sora etc., on what the future will take for granted.
What is possible, when all this is fast, cheap, abundant, reliable, sensitive? Live UI seeks to chart this out. The six pillars, without much explanation for now, are:
Tailormade media (eg. instead of writing a post and then distributing it, you can distribute a post-prompt that can take into account the user’s preferences and context and become the relevant post in interaction with the user; see the sketch just after this list.)
Live interoperation (eg.[10] tailormade movies do not imply loneliness of the viewer, because you’ll also have tailormade interoperation attentively “translating” insights you had watching your movie to remarks relevant to your friend’s movie.)
Live differential privacy (eg. automatically[7] and seamlessly replace your friends in a livestream who don’t want their info up, with randomly generated faces that capture the vibe without any possibility of inference to their details, handled by an AI that understands your circumstances and privacy preferences informally, with occasional oversight.)
Recording & representatives (eg. record your data to allow for better tailoring to you, and even stream it out, to the extent that your privacy is covered by previous pillar, and credit and relevance is covered by the next two pillars.)
Live storage (eg. personal knowledge management engines like Roam complemented with mental-context management that cycles your mental context with your own important notes and various data feeds, sensitively refreshed to match your current momentum/motivations.)
Telic reliability & live attribution [not pictured above] (eg. the data/prompts/wishes you livestream are associated with a cycling schedule that the reader likes, in the reader’s live storage, which helps automatically[5] sense the contextual reputation assigned to your creativity.)
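To make the first pillar slightly more concrete, here is a minimal sketch of distributing a post-prompt rather than a finished post. It assumes only a generic "generate" callable standing in for whatever text model you trust; the names (ReaderContext, render_post) are illustrative, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReaderContext:
    """Whatever a reader is willing to share: background, purpose, preferences."""
    background: str
    purpose: str


def render_post(post_prompt: str,
                reader: ReaderContext,
                generate: Callable[[str], str]) -> str:
    """Instantiate a distributed post-prompt into a post for this particular reader.

    The artefact that travels is `post_prompt`; the finished post is produced
    locally, in interaction with the reader's context.
    """
    request = (
        f"{post_prompt}\n\n"
        f"Reader background: {reader.background}\n"
        f"Reader purpose: {reader.purpose}\n"
        "Write the post so it lands for this reader while preserving the author's intent."
    )
    return generate(request)


if __name__ == "__main__":
    # Stub generator; swap in a real model call if you have one.
    stub = lambda prompt: f"[post generated from a {len(prompt)}-character request]"
    reader = ReaderContext(background="biologist, new to AI",
                           purpose="decide whether to read the sequence")
    print(render_post("Explain why 'adaptivity' is a better frame than 'intelligence'.",
                      reader, stub))
```

The point is only that the distributable artefact is the prompt; the post itself is produced in interaction with each reader.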
Steve:
Am I supposed to be following all of that?
Sahil:
Definitely not, it's just a trailer of sorts.[11] I've included only one relatively accessible example for each (it gets way weirder), but there are volumes to say about the pillars, especially to gesture at how it all works together. (Btw: reach out if you're interested in knowing more or working together!)
A bit more, before we move on though.
Nearly all of the above is about sensitivity and integration[12] as we gain increasing choice in adaptive constructability at scale.
The above could, instead, already sound nauseating, atomizing, and terrifying. A good time, then, to meet the key concern of the High Actuation agenda: the applied metaphysics of cultivating wise and lasting friendship in a reality full of constructs.

High Actuation is the research context (for both Live UI & Live Theory, among several other tracks), where the physical world becomes more and more mindlike through increased and ubiquitous adaptivity. In the process, it challenges a lot of (dualistic[13]) assumptions about mind and matter, and about how to go about reasoning about them.[14]
But yeah, don't worry about this interface-6-pillars stuff above. I'm going to talk about what I’ll be focusing on building tools (and hiring!) for, in the coming months: intelligent science infrastructure.
Live Theory (preface)
So the boring way to think about intelligent science infrastructure is to say “AI will do the science and math automatically.”
(What does it really mean, to say “automatically”? We’ll get to that.)
First, a succinct unrolling of the whole vision. A series of one-liners follows, with increasing resolution on the part of this dream that matters. The italics signify the emphasis in each stage of clarification.
Live theory is...
Adaptive theories
(Here "theories" are only one kind of distributable artefact in research to think about.)
But more importantly, the vision is...
Artefacts with adaptivity beyond formal variables/parameters

(Here "artefacts" includes papers, theories, explainers, code, interfaces, laws[15], norms etc.)

But more importantly, the vision is...

(Here "protocols" don’t need to be a fixed/formal protocol or platform either!)

But really, the vision is...

(Here “lending spirit” being the crucial activity that allows for spacious resolution towards better equilibria.)

IOW: The slower-but-more-significant pace layer of infrastructure, that supports the pace layer of commerce.

Navigation

Some navigation before the longer conversation around this: there are four gates that we'll have to pass through, as you see on the slide below, to come to terms with the proposal.
Each of the four gates has been invariably challenging to convey and invite people through (although it is satisfying to witness people's fluency after a few runs through them):
Possible (what Live Theory is and whether it is even conceivable/tractable)
Bearable (how, paradoxically, Live Theory has always been true, and whether we can stand it being true already)
Desirable (what new modes open up when we incorporate Live Theory-like adaptivity)

Necessary (how this bears on risks and threat models)
I'm going to walk through these gates, and conclude, and that will be the talk.
This decomposition into gates should hopefully make it easier to critique the whole thing—eg. “X aspect is undesirable because…” vs “X aspect is impossible because…”. The two obviously deserve very different kinds of responses.
I’m offering the talk this way for many reasons, but to say a bit more. Most of the work is in a) noticing the circumrational water of mathematics as it is today (which can be too obvious for mathematicians and too unbearable for math-groupies respectively) and b) connecting it to mathematics as it might become in the near future (which can seem too bizarre or undesirable, if you don’t notice its importance in mathematics as it is today). When new paradigms start being Steam’d, they often have to pull off a similar straddling of the familiar and the unfamiliar. Not too different from the ordeal of hitting the edges of one’s developmental stage… but at a sociotechnical/civilizational level.
If making it easy for you to respond and critique were the only goal of the gates, they would have been set out in a tidier decomposition. However, in tandem, I'm using the gates-structure to construct a “natural walk”, a narrative arc, through the details of Live Theory. This polytely has some tradeoffs (such as the Bearable and Desirable gates not quite disentangling), but I think it works! Let me know.
The next post will cover the first two gates. A teaser slide for the first one follows.
Live Theory (teaser)
Possibility
And a teaser of two questions I will start the next post with, but also include now to give you time to actually think:
Q. Mathematical (or theoretical) generalization seems to involve creating a parameter or variable, where "specialization" happens by substituting with a value later. Is there an alternative?
Q. What is generalization really for? What does it offer you?
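For concreteness, the standard move the first question points at, in miniature: generalize by introducing parameters, specialize by substituting values. A minimal illustration in code, not from the talk itself:

```python
# Generalization-as-parameterization: the move being questioned above.

def area_of_3_by_4() -> int:
    # A specific fact.
    return 3 * 4

def area(width: float, height: float) -> float:
    # The "generalization": width and height are parameters.
    return width * height

# "Specialization" is then just substitution of values.
assert area(3, 4) == area_of_3_by_4()
```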
Footnotes

Challenging and supporting, especially through the frustration of freezing this written artefact before I can avail myself of the spaciousness of the fluidic era.

Alternative terms include “adaptive theory”, “fluid theory”, “flexible theory”, where the theories themselves are imbued with some intelligence.

This will be elaborated in an upcoming post on risks. “Slow takeoff” is an approximation that will have to do for now, but really it's much more like having lazy or mentally ill AI. If you're curious, here's the abstract:
Many AI safety folk have rightly raised issues of corrigibility: that once the target (/values) of an optimization process gets locked in, it becomes extremely sticky. Attempts to change it are up against the full optimization power behind fulfilment of the target. If this power goes beyond our abilities, any clever strategies to undermine it are just an enactment of our weaker optimization powers pitted against a stronger one, and will, tautologically, fail.
We aim here to bring to the fore a subtler version of this issue: corrigibility of relevance, where stickiness of value comes about not via strong monomaniacal attraction to the target, but by indifference to anything outside of what is considered meaningful. This indifference is not a side effect of strong optimization (such as in The AI knows but does not care), but a matter of not entering attention. Indeed, this applies even in the absence of an overall optimizing process. We also point out how, contrary to intuitions that this makes AI more dangerous, it could also imply less conflict. We point out that optimization processes that occur within a zone of relevance are less likely to be rivalrous with others, contra broad instrumental convergence. We carefully argue that difficulty of corrigibility of relevance implies, further, that zones of relevance are unlikely (though not impossible) to expand, since there is no motivation to.
We also couch zones of relevance in naturalistic terms. This involves a new concept, of integrity, which is an embedded extension of coherence. The importance of the specific, physical modalities of "skin-in-the-game" to relevance is explored, via connections between physical integrity and mental integrity. We take up very concrete questions (such as "are hardware level architectural changes needed in order for shutdown to be a real problem?") from this lens, and argue for a fractal consequentialist model. This makes the safety-vs-capability dichotomy untenable (in a new way), giving rise, for example, to "fractal sharp left turn". We list many predictions of this model that are opposite to those of the original sharp left turn threat model, and note that it gives rise to novel dangers that resemble S-risks from partial alignment.
Owing to issues of integrity, "simply" hooking up an intelligent, "unagentic" model with an agentic wrapper (eg. "GPT-7 may not be dangerous, but GPT-7 plus two thousand lines of python can destroy the world") is shown to be a lot more difficult. We explore, with examples, how simply "plugging in" does not lead to integration, and this lack of integrity creates lacunae of relevance that can be exploited to cause shutdown, no matter the cognitive powers of the machine. We cautiously make analogies to lack of integrity in humans (often manifesting as mental disorders, dissociation, and savantism). We note that disanalogies are more important because of differences in physical make-up of biological life vs machines, and how that gives us more time to deal with dangerous versions of the problems of shutdown / risks of autonomy. We end by attempting to point at the human/animal zone of relevance, which is quite philosophically challenging to do from the vantage of our own minds, but extremely important as we create minds very different from ours.
Generally, it is a bit suspect to me, for an upcoming AI safety org (in this time of AI safety org explosion) to have, say, a 10 year timeline premise without anticipating and incorporating possibilities of AI transforming your (research) methodology 3-5 years from now. If you expect things to move quickly, why are you ignoring that things will move quickly? If you expect more wish-fulfilling devices to populate the world (in the meanwhile, before catastrophe), why aren't you wishing more, and prudently? An “opportunity model” is as indispensable as a threat model.
(In fact, "research methodological IDA" is not a bad summary of live theory, if you brush under the rug all the ontological shifts involved.)
TJ comments:

I'd guess you'd want to be more explicit here that what you mean is that "human cognition, thought, and intellectual enterprise is itself going to go through radical transitions, and will reshape the human experience." It feels like you're implying this here, but it is left as a silent gesture for the reader.
An example of weirdness if you're hungry, in raw form. Inspired by a submission at a hackathon on live machinery. (See more here: The Logistics of Distribution of Meaning.)
Time structures are pretty centralized right now.
Do you know what a "Thurs" is? Would you collect solar days together into 7-length segments? Why is some old drama between Julius and Augustus deciding the day you go out and have fun wearing spooky costumes?
You could imagine throwing it all away and redoing it in a way that makes sense to you. Or better yet, to your local community. Maybe you would divide the day into powers of two and label them. Perhaps you like to orient to the lunar calendar, or to your menstrual cycle. You might find seasons to be only vaguely relevant if you're in the Maldives or elsewhere near the equator, and might orient via tides and monsoon or the migration patterns of whales.
But this is all a fantasy. If you need to feed your kids, you need to believe in a Thursday. You have your weekly standup on Thursdays, at the eleventh hour of the ante-meridian. If we want to coordinate, we need protocol. You need to catch your flight, and the pilot needs to show up on time. We need shared structure, shared reality, universal languages. Right?
The punchline, of course, is that this is the old way to scale. By replicating fixed structure. But when intelligence is cheap, you don't need static shared structure. Ubiquitous attentive infrastructure can create peer-to-peer structure as needed. So how would that work?
Instead of marking lines on a shared spatialization of time, your community can live by your local temporal rituals. You might use the rings of trees to commemorate new beginnings, or the births of children. Perhaps you will have a season of meeting others that lasts several weeks, based on your bipolar rhythms.
This is your calendly. You broadcast prayers to meet others during this time. Ambient intelligent infrastructure (ie. AI) can help you identify, broadcast and match this prayer with mine, since I want about 26 minutes of meeting time with you, as expressed in my wishes, to consult about durian-eating or whatever.
The flexibility of prayers allows for you to synchronously run into me during your meeting season. You (and I) get more lucky, rather than getting more controlling-power.
You can still have precision, in this. Your prayer might specify that. Control and precision might seem hard to decouple. But so do scale and fixed structure.
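A minimal sketch of the matching step just described, with an "interpret" callable standing in for the ambient intelligence; Prayer and match_prayers are illustrative names, not an existing system.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Prayer:
    """An informally expressed wish to meet, in the author's own temporal idiom."""
    author: str
    wish: str          # e.g. "about 26 minutes, any time during my meeting season"
    season_hint: str   # e.g. "my meeting season runs while the monsoon holds"


def match_prayers(a: Prayer, b: Prayer,
                  interpret: Callable[[str], Optional[str]]) -> Optional[str]:
    """Ask ambient intelligence (stubbed as `interpret`) to propose a shared moment.

    No shared calendar grid is assumed; the interpreter reads both idioms and
    returns a concrete proposal both parties can recognize, or None if they
    genuinely don't overlap.
    """
    question = (
        f"{a.author} wishes: {a.wish} ({a.season_hint}). "
        f"{b.author} wishes: {b.wish} ({b.season_hint}). "
        "Propose one concrete moment, phrased separately in each person's own idiom, "
        "or answer 'no overlap'."
    )
    answer = interpret(question)
    return None if answer is None or "no overlap" in answer.lower() else answer


if __name__ == "__main__":
    # Stand-in interpreter; a real one would be a model with context on both people.
    stub = lambda q: "Third low tide after the next full moon / 26 minutes, mid meeting-season"
    print(match_prayers(
        Prayer("you", "about 26 minutes to consult on durian-eating",
               "season of meeting others, several weeks"),
        Prayer("me", "a short call, mornings by my tide chart", "between monsoons"),
        stub,
    ))
```

Nothing here presumes a shared grid of time; the only shared thing is the interpreter's ability to read both idioms.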
AI, if done right, can help with more interpersonalization, not hyperpersonalization. Far from hallucination and spam, it can lead to more unconforming to embodied truth, rather than conforming to epistemic bureaucracy. Why should our cycles and seasons of living and meeting be filtered through rigid structure that is numb to our needs and infiltrates our meanings?
Just picking up a preferred grain of abstraction or redoing some grids creatively (as in xkcd: 28-Hour Day) is nice, but it doesn't free you to co-create the background view of your reality.
The word “automation” does not distinguish numb-scripted-dissociated from responsive-intelligent-flowing. It usually brings to mind the former. So I avoid using it. More on this in the next post, but for now, a word might be “competence”. When you’re really in flow while dancing, you're not thinking. But it is the opposite of “automation”. If anything, you're more attuned, more sensitive.
This isn't “high-level” abstraction either. The high-low dichotomy is the good-old-fashioned way of looking at generalization, and does not apply with expanded sensitivity, as we'll see in this sequence.
The relevance of the choice "live" will hopefully also become clearer as we go. Meant to connote biological rather than machine metaphors, dialectics rather than deliverables, intentional stance, though not necessarily patienthood.
It's not that people predict "no fundamentally new ways of looking at the world will occur" and therefore decide to exclude them in their stories. I think people exclude ontological shifts because it's very hard to anticipate an ontological shift.
(If you disagree, I'd ask the fundamental question of implied invisible work: what have you rejected? A la book recommendations.)
Matt adds:

It might be worth stating more about these [risks] from the outset. Just to prevent anyone thinking you're some kind of e/acc full-steamer techbro who isn't interested in safety.
Indeed. The Necessary gate will cover much relevance to threat models. Apart from that, expect much much more on "infrastructural insensitivity" to articulate further vectors of risk.
Abram helpfully adds:

I find the meeting example [see next quote block] much more compelling, where people get dynamically crafted versions of the meeting which can catch them up on the most important things they missed as a result of coming in late -- someone uses the phrase "lightbulb moment" in a context-dependent way, so the live interop unpacks the context for those who missed it.
I think the reason I like this better is that meetings invoke a more utilitarian mindset where we evaluate the usefulness of tools. Movies invoke a warm friendly mindset where we are more prone to horror over the seeming non-togetherness of the described scenario. If you want to shock people with the idea of such severely mediated reality, fine. But I think the meeting example will provoke less feeling of shock and undesirability, if you prefer that.
And so here is my meeting example pasted raw, which might be more palatable or more triggering:
So imagine you join a meeting on Zoom, a little late. AI summarizers can "catch you up". Instead of immediately chatting, you peruse the transcript/read the summary.
Now that's great, but a little boring. Ideally you'd have "immersive summary", a mini-meeting that you get to see live, a generated video summary. You watch that, and then join the real meeting.
Then you seamlessly transfer to the "real" meeting... except when Alice references a detail that wasn't given enough attention in the summary-virtual-meeting you initially watched to catch up. So before you get confused as Alice starts using the details assuming you were in the meeting, the AI interop again pulls you back into a simulation where generated Alice says "so, just to recall, some details I want to add" before dropping you back into reality.
And in fact, your replica will already have joined the meeting even though you were 10 minutes late. And you watch it, and might say "oh, no, that's not what I would have said"... and so, once you say that, everyone else will be put in a simulated meeting, where your replica says "I know I said earlier that X, but no, [I take that back/actually meant/wanted to say] ..." so everyone is again caught up.
Rather than being blocky or jarring, imagine a perfect stabilizer, like in a drone or in video-correction. It's uncanny, that despite all the noises, rapid responses can correct destabilizations quite magically and beautifully into a coherence.
This again blurs "construction" and "reality", doing such a fantastically better job that you will only occasionally need to rebel against this matrix. The real-constructed dichotomy makes less and less sense.
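A minimal sketch of the "unpack the missing context" step in this example; "ask" stands in for whatever model mediates the meeting, and the prompt format is illustrative rather than any existing interface.

```python
from typing import Callable, Optional


def unpack_if_needed(utterance: str,
                     catchup_summary: str,
                     ask: Callable[[str], str]) -> Optional[str]:
    """When a live utterance leans on context the late joiner's catch-up summary
    skipped, return a brief 'Just to recall, ...' aside to splice in; otherwise None.
    """
    probe = (
        f"Summary shown to a late joiner:\n{catchup_summary}\n\n"
        f"New utterance in the live meeting:\n{utterance}\n\n"
        "If the utterance relies on context missing from the summary, write a "
        "one-sentence aside in the speaker's voice beginning 'Just to recall,'. "
        "Otherwise reply exactly: COVERED."
    )
    reply = ask(probe).strip()
    return None if reply == "COVERED" else reply
```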
More here: Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
NB. this is not brute, ham-fisted merging, for those worried. See also: Unity and diversity.
also: centralization-heavy, preformationist, foundationalist, control-oriented. See The Logistics of Distribution of Meaning: Against Epistemic Bureaucratization
This is a very expensive delivery mechanism for a koan/pointing-out instruction, to let go of the static-machine mythos, but there you go.
Live Governance post coming up soon.
See also this proposal for the commonalities in the two posts and limitations of existing approaches.