This makes sense, but LLMs can be described in multiple ways. From one perspective, as you say,
ChatGPT is not running a simulation; it's answering a question in the style that it's seen thousands - or millions - of times before.
From different perspective (as you very nearly say) ChatGPT is simulating a skewed sample of people-and-situations in which the people actually do have answers to the question.
The contents of hallucinations are hard to understand as anything but token prediction, and by a system that (seemingly?) has no introspective knowledge of it...
What you describe is a form of spiral causation (only seeded by claims of impossibility), which surely contributes strongly to what we’ve seen in the evolution of ideas. Fortunately, interest in the composite-system solution space (in particular, the open agency model) is now growing, so I think we’re past the worst of the era of unitary-agent social proof.
Nate [replying to Eric Drexler]: I expect that, if you try to split these systems into services, then you either fail to capture the heart of intelligence and your siloed AIs are irrelevant, or you wind up with enough AGI in one of your siloes that you have a whole alignment problem (hard parts and all) in there….
GTP-Nate is confusing the features of the AI services model with the argument that “Collusion among superintelligent oracles can readily be avoided”. As it says on the tin, there’s no assumption that intelligence must be limited. It is, inste...
The question then becomes, as always, what it is you plan to do with these weak AGI systems that will flip the tables strongly enough to prevent the world from being destroyed by stronger AGI systems six months later.
Yes, this is the key question, and I think there’s a clear answer, at least in outline:
What you call “weak” systems can nonetheless excel at time- and resource-bounded tasks in engineering design, strategic planning, and red-team/blue-team testing. I would recommend that we apply systems with focused capabilities along these lines to help us d...
So I think that building nanotech good enough to flip the tables - which, I think, if you do the most alignable pivotal task, involves a simpler and less fraught task than "disassemble all GPUs", which I choose not to name explicitly - is an engineering challenge where you get better survival chances (albeit still not good chances) by building one attemptedly-corrigible AGI that only thinks about nanotech and the single application of that nanotech, and is not supposed to think about AGI design, or the programmers, or other minds at all; so far as the best...
Mind space is very wide
Yes, and the space of (what I would call) intelligent systems is far wider than the space of (what I would call) minds. To speak of “superintelligences” suggests that intelligence is a thing, like a mind, rather than a property, like prediction or problem-solving capacity. This is which is why I instead speak of the broader class of systems that perform tasks “at a superintelligent level”. We have different ontologies, and I suggest that a mind-centric ontology is too narrow.
The most AGI-like systems we have today are LLMs, optimized...
So I think useful arrangements of diverse/competing/flawed systems is a hope in many contexts. It often doesn't work, so looks neglected, but not for want of trying.
What does and doesn’t work will depend greatly on capabilities, and the problem-context here assumes potentially superintelligent-level AI.
These are good points, and I agree with pretty much all of them.
Instances in a bureaucracy can be very different and play different roles or pursue different purposes. They might be defined by different prompts and behave as differently as text continuations of different prompts in GPT-3
I think that this is an important idea. Though simulators analogous to GPT-3, it may be possible to develop strong, almost-provably-non-agentic intelligent resources, then prompt them to simulate diverse, transient agents on the fly. From the perspective of building m...
I don’t see that as an argument [to narrow this a bit: not an argument relevant to what I propose]. As I noted above, Paul Christiano asks for explicit assumptions.
To quote Paul again:.
I think that concerns about collusion are relatively widespread amongst the minority of people most interested in AI control. And these concerns have in fact led to people dismissing many otherwise-promising approaches to AI control, so it is de facto an important question
Dismissing promising approaches calls for something like a theorem, not handwaving about generic “smart ...
I don't think you're going to see a formal proof, here; of course there exists some possible set of 20 superintelligences where one will defect against the others (though having that accomplish anything constructive for humanity is a whole different set of problems). It's also true that there exists some possible set of 20 superintelligences all of which implement CEV and are cooperating with each other and with humanity, and some single superintelligence that implements CEV, and a possible superintelligence that firmly believes 222+222=555 without t...
HCH-like amplification seems related, multiple unreliable agent instances assembled into a bureaucracy that as a whole improves on some of their qualities
This approach to amplification involves multiple instances, but not diverse systems, competing systems, different roles, adversarial relationships, or a concern with collusion. It is, as you say a training story for a single AI. Am I missing a stronger connection?
Yes, we need to weigh the considerations you mention, and many others.
I think copy-collusion is reasonably likely, and we don't currently know of a way to prevent it.
Copy-collusion must be considered carefully, and its implications will depend on both potential contexts for using copies (sometimes problematic, sometimes OK) and on potential alternatives to using copies.
Copies will always “collude” in the trivial sense of giving identical, potentially deceptive responses to identical inputs, but might also offer potentially deceptive, mutually-supporting re...
From the Afterword: “Note that I have presented a there-exists argument against a for-all proposition. Responses of the form “But I can think of something similar that won’t work” don’t count as counterarguments.” [This has been edited in the main text to sharpen the phrasing.]
B2: Copies might collude with one another, provided they have a broad decision theory: ∃(some sets of models) among which some mechanisms for impeding collusion might not work.
B4: A lot of models may be deployed over the internet: ∃(some communication channels) for which one of 6 mec...
I’d like to promote a norm for proposals for alignment techniques to be very explicit about where the hard work is done, i.e. which part is surprising or insightful or novel enough to make us think that it could solve alignment even in worlds where that’s quite difficult.
Alignment is, by nature, an engineering task, not a scientific task: It is an attempt to make something, not to understand some existing thing. It may be that, as you suggest, “solving hard scientific problems usually requires compelling insights”, but this is beside the point. Spacef...
Although I don’t understand what you mean by “conservation of computation”, the distribution of computation, information sources, learning, and representation capacity is important in shaping how and where knowledge is represented.
The idea that general AI capabilities can best be implemented or modeled as “an agent” (an “it” that uses “the search algorithm”) is, I think, both traditional and misguided. A host of tasks require agentic action-in-the-world, but those tasks are diverse and will be performed and learned in parallel (see the CAIS report, www.fhi...
How exactly does the terminal goal of benefiting shareholders disappear[…]
But does this terminal goal exist today? The proper (and to some extent actual) goal of firms is widely considered to be maximizing share value, but this is manifestly not the same as maximizing shareholder value — or even benefiting shareholders. For example:
I agree that using the forms of human language does not ensure interpretability by humans, and I also see strong advantages to communication modalities that would discard words in favor of more expressive embeddings. It is reasonable to expect that systems with strong learning capacity could to interprete and explain messages between other systems, whether those messages are encoded in words or in vectors. However, although this kind of interpretability seems worth pursuing, it seems unwise to rely on it.
The open-agency perspective suggests that while inte... (read more)