All of Eric Drexler's Comments + Replies

I agree that using the forms of human language does not ensure interpretability by humans, and I also see strong advantages to communication modalities that would discard words in favor of more expressive embeddings. It is reasonable to expect that systems with strong learning capacity could interpret and explain messages between other systems, whether those messages are encoded in words or in vectors. However, although this kind of interpretability seems worth pursuing, it seems unwise to rely on it.

The open-agency perspective suggests that while inte... (read more)

This makes sense, but LLMs can be described in multiple ways. From one perspective, as you say,

ChatGPT is not running a simulation; it's answering a question in the style that it's seen thousands - or millions - of times before.

From a different perspective (as you very nearly say), ChatGPT is simulating a skewed sample of people-and-situations in which the people actually do have answers to the question.

The contents of hallucinations are hard to understand as anything but token prediction by a system that (seemingly?) has no introspective knowledge of it... (read more)

Are you arguing that increasing the number of (AI) actors cannot make collusive cooperation more difficult? Even in the human case, defectors make large conspiracies more difficult, and in the non-human case, intentional diversity can almost guarantee failures of AI-to-AI alignment.

What you describe is a form of spiral causation (only seeded by claims of impossibility), which surely contributes strongly to what we’ve seen in the evolution of ideas. Fortunately, interest in the composite-system solution space (in particular, the open agency model) is now growing, so I think we’re past the worst of the era of unitary-agent social proof.

Nate [replying to Eric Drexler]: I expect that, if you try to split these systems into services, then you either fail to capture the heart of intelligence and your siloed AIs are irrelevant, or you wind up with enough AGI in one of your siloes that you have a whole alignment problem (hard parts and all) in there….

GPT-Nate is confusing the features of the AI services model with the argument that “Collusion among superintelligent oracles can readily be avoided”. As it says on the tin, there’s no assumption that intelligence must be limited. It is, inste... (read more)

The question then becomes, as always, what it is you plan to do with these weak AGI systems that will flip the tables strongly enough to prevent the world from being destroyed by stronger AGI systems six months later.

Yes, this is the key question, and I think there’s a clear answer, at least in outline:

What you call “weak” systems can nonetheless excel at time- and resource-bounded tasks in engineering design, strategic planning, and red-team/blue-team testing. I would recommend that we apply systems with focused capabilities along these lines to help us d... (read more)

So I think that building nanotech good enough to flip the tables - which, I think, if you do the most alignable pivotal task, involves a simpler and less fraught task than "disassemble all GPUs", which I choose not to name explicitly - is an engineering challenge where you get better survival chances (albeit still not good chances) by building one attemptedly-corrigible AGI that only thinks about nanotech and the single application of that nanotech, and is not supposed to think about AGI design, or the programmers, or other minds at all; so far as the best... (read more)

Mind space is very wide

Yes, and the space of (what I would call) intelligent systems is far wider than the space of (what I would call) minds. To speak of “superintelligences” suggests that intelligence is a thing, like a mind, rather than a property, like prediction or problem-solving capacity. This is why I instead speak of the broader class of systems that perform tasks “at a superintelligent level”. We have different ontologies, and I suggest that a mind-centric ontology is too narrow.

The most AGI-like systems we have today are LLMs, optimized... (read more)

So I think useful arrangements of diverse/competing/flawed systems is a hope in many contexts. It often doesn't work, so looks neglected, but not for want of trying.

What does and doesn’t work will depend greatly on capabilities, and the problem-context here assumes potentially superintelligent-level AI.

These are good points, and I agree with pretty much all of them. 

Instances in a bureaucracy can be very different and play different roles or pursue different purposes. They might be defined by different prompts and behave as differently as text continuations of different prompts in GPT-3

I think that this is an important idea. Through simulators analogous to GPT-3, it may be possible to develop strong, almost-provably-non-agentic intelligent resources, then prompt them to simulate diverse, transient agents on the fly. From the perspective of building m... (read more)
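As a loose illustration of that direction (a minimal sketch only, assuming a generic text-continuation interface; `Simulator`, `run_episode`, the role prompts, and the stub model are all hypothetical names, not anything from the original comments), one non-agentic base simulator could instantiate several transient "agents" within a single episode, with each agent's identity carried entirely by its prompt:

```python
# Minimal sketch: one non-agentic simulator, many prompt-defined transient agents.
from typing import Callable

# `Simulator` stands in for any text-continuation model (e.g., a GPT-3-style LM);
# here it is simply a function from prompt text to continuation text.
Simulator = Callable[[str], str]

def run_episode(simulator: Simulator, task: str, role_prompts: list[str]) -> list[str]:
    """Instantiate one transient agent per role prompt and collect their outputs.

    The prompt serves as the agent's "identity"; no agent persists beyond the episode.
    """
    transcripts = []
    for role in role_prompts:
        prompt = f"{role}\n\nTask: {task}\n\nResponse:"
        transcripts.append(simulator(prompt))
    return transcripts

if __name__ == "__main__":
    # A stub simulator; a real system would call a language model here.
    stub = lambda prompt: f"[continuation of: {prompt[:40]}...]"
    outputs = run_episode(
        stub,
        task="Propose a design and critique it.",
        role_prompts=[
            "You are a cautious red-team reviewer.",
            "You are a proposal-drafting engineer.",
        ],
    )
    for text in outputs:
        print(text)
```

The only point of the sketch is structural: the persistent object is the simulator, while the "agents" are defined by prompts and exist only for the duration of a call.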

I don’t see that as an argument [to narrow this a bit: not an argument relevant to what I propose]. As I noted above, Paul Christiano asks for explicit assumptions.

To quote Paul again:

I think that concerns about collusion are relatively widespread amongst the minority of people most interested in AI control. And these concerns have in fact led to people dismissing many otherwise-promising approaches to AI control, so it is de facto an important question

Dismissing promising approaches calls for something like a theorem, not handwaving about generic “smart ... (read more)

I don't think you're going to see a formal proof, here; of course there exists some possible set of 20 superintelligences where one will defect against the others (though having that accomplish anything constructive for humanity is a whole different set of problems).  It's also true that there exists some possible set of 20 superintelligences all of which implement CEV and are cooperating with each other and with humanity, and some single superintelligence that implements CEV, and a possible superintelligence that firmly believes 222+222=555 without t... (read more)

HCH-like amplification seems related, multiple unreliable agent instances assembled into a bureaucracy that as a whole improves on some of their qualities

This approach to amplification involves multiple instances, but not diverse systems, competing systems, different roles, adversarial relationships, or a concern with collusion. It is, as you say, a training story for a single AI. Am I missing a stronger connection?

Vladimir Nesov:
Instances in a bureaucracy can be very different and play different roles or pursue different purposes. They might be defined by different prompts and behave as differently as text continuations of different prompts in GPT-3 (the prompt is "identity" of the agent instance, distinguishes the model as a simulator from agent instances as simulacra). Decision transformers with more free-form task/identity prompts illustrate this point, except a bureaucracy should have multiple agents with different task prompts in a single episode. MCTS and GAN are adversarial and could be reframed like this. One of the tentative premises of ELK is that a model trained for some purpose might additionally allow instantiating an agent that reports what the model knows, even if that activity is not clearly related to the model's main purpose. Colluding instances inside a bureaucracy make it less effective in achieving its goals of producing a better dataset (accurately evaluating outcomes of episodes).

So I think useful arrangements of diverse/competing/flawed systems is a hope in many contexts. It often doesn't work, so looks neglected, but not for want of trying.

The concern with collusion in AI risk seems to be more about deceptive alignment, observed behavior of a system becoming so well-optimized that it ceases to be informative about how it would behave in different situations. Very capable AIs can lack any tells even from the perspective of extremely capable observers, their behavior can change in a completely unexpected way with changing circumstances. Hence the focus on interpretability, it's not just useful for human operators, a diverse system of many AIs also needs it to notice misalignment or collusion in its parts. It might even be enough for corrigibility.

Yes, we need to weigh the considerations you mention, and many others.

I think copy-collusion is reasonably likely, and we don't currently know of a way to prevent it.

Copy-collusion must be considered carefully, and its implications will depend on both potential contexts for using copies (sometimes problematic, sometimes OK) and on potential alternatives to using copies.

Copies will always “collude” in the trivial sense of giving identical, potentially deceptive responses to identical inputs, but might also offer potentially deceptive, mutually-supporting re... (read more)

From the Afterword: “Note that I have presented a there-exists argument against a for-all proposition. Responses of the form “But I can think of something similar that won’t work” don’t count as counterarguments.” [This has been edited in the main text to sharpen the phrasing.]

B2: Copies might collude with one another, provided they have a broad decision theory: ∃(some sets of models) among which some mechanisms for impeding collusion might not work.

B4: A lot of models may be deployed over the internet: ∃(some communication channels) for which one of 6 mec... (read more)
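To make the logical shape of this exchange explicit (a rough formalization; the set and predicate names below are chosen here for illustration and do not appear in the post): the original argument claims the existence of an arrangement that impedes collusion, so exhibiting other arrangements where a given mechanism fails does not touch it; only a universal claim would.

```latex
% Illustrative formalization (symbols chosen here, not taken from the post):
% \mathcal{M} = space of multi-agent arrangements; Collude(S) = "arrangement S ends up colluding".
\[
  \text{claim made:}\quad \exists\, S \in \mathcal{M}:\ \neg\,\mathrm{Collude}(S)
\]
\[
  \text{claim needed to rebut it:}\quad \forall\, S \in \mathcal{M}:\ \mathrm{Collude}(S)
\]
% Exhibiting some S' with Collude(S') (e.g., a set of exact copies) establishes only
% that not every arrangement avoids collusion, which is compatible with the claim made.
```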

Adam Jermyn:
I saw this, but I think it sets a somewhat unhelpful standard. In practice we need to make choices about which approaches are most promising, which to pursue, etc., and evidence that there is more probability mass on success in one area does feel useful.  So, for instance, my point in B2 is (as you correctly interpreted) not that there literally can't exist models that don't exhibit copy-collusion. Rather, I think copy-collusion is reasonably likely, and we don't currently know of a way to prevent it.

I’d like to promote a norm for proposals for alignment techniques to be very explicit about where the hard work is done, i.e. which part is surprising or insightful or novel enough to make us think that it could solve alignment even in worlds where that’s quite difficult.

Alignment is, by nature, an engineering task, not a scientific task: It is an attempt to make something, not to understand some existing thing. It may be that, as you suggest, “solving hard scientific problems usually requires compelling insights”, but this is beside the point. Spacef... (read more)

Although I don’t understand what you mean by “conservation of computation”, the distribution of computation, information sources, learning, and representation capacity is important in shaping how and where knowledge is represented.

The idea that general AI capabilities can best be implemented or modeled as “an agent” (an “it” that uses “the search algorithm”) is, I think, both traditional and misguided. A host of tasks require agentic action-in-the-world, but those tasks are diverse and will be performed and learned in parallel (see the CAIS report, www.fhi... (read more)
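As a very rough sketch of that contrast (illustrative only; none of the names below come from the CAIS report), "capabilities as services" can be pictured as stateless, task-bounded components composed per request, where dispatch is plain routing rather than a single persistent agent pursuing goals across tasks:

```python
# Sketch: task-focused services composed per request, rather than one persistent agent.
# All names here are illustrative, not drawn from the CAIS report.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class Task:
    kind: str      # e.g., "design", "red-team"
    payload: str   # task description
    budget_s: int  # explicit time/resource bound

# Each service handles one kind of task and keeps no state across calls.
Service = Callable[[Task], str]

def design_service(task: Task) -> str:
    return f"design proposal for: {task.payload} (within {task.budget_s}s)"

def redteam_service(task: Task) -> str:
    return f"critique of: {task.payload} (within {task.budget_s}s)"

SERVICES: Dict[str, Service] = {
    "design": design_service,
    "red-team": redteam_service,
}

def run(task: Task) -> str:
    # Dispatch is routing between bounded services; there is no single "it"
    # that carries goals or memory from one task to the next.
    return SERVICES[task.kind](task)

if __name__ == "__main__":
    print(run(Task("design", "a bounded nanoscale actuator", budget_s=60)))
    print(run(Task("red-team", "the proposed actuator design", budget_s=60)))
```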

How exactly does the terminal goal of benefiting shareholders disappear[…]

But does this terminal goal exist today? The proper (and to some extent actual) goal of firms is widely considered to be maximizing share value, but this is manifestly not the same as maximizing shareholder value — or even benefiting shareholders. For example:

  • I hold shares in Company A, which maximizes its share value through actions that poison me or the society I live in. My shares gain value, but I suffer net harm.
  • Company A increases its value by locking its customers into a depen
... (read more)