Agreed that this (or something near it) appears to be a relatively central difference between people's models, and probably at the root of a lot of our disagreement. I think this disagreement is quite old; you can see bits of it crop up in Hanson's posts on the "AI foom" concept way back when. I would put myself in the camp of "there is no such binary intelligence property left for us to unlock". What would you expect to observe, if a binary/sharp threshold of generality did not exist?
A possibly-relevant consideration in the analogy to computation is that ...
Consider a training environment that's complex/diverse enough to make it impossible to fit a suite of heuristics meeting all its needs into an agent's (very bounded) memory. The agent would need to derive new heuristics on the fly, at runtime, in order to deal with basically-OOD situations it frequently encounters, and to be able to move freely in the environment, instead of being confined to some subset of that environment.
In other words, the agent would need to be autonomous.
Agreed. Generally, whenever I talk about the agent being smart/competent, I am a...
... By figuring out what R is and deciding to act as an R-pursuing wrapper-mind, therefore essentially becoming an R-pursuing wrapper-mind. The only differences being that 1) it self-modified into one at runtime, instead of being like this from the start, and 2) it'd decide to "stop pretending" in some hypothetical set of situations/OOD, but that set will shrink the more diverse our training environment is (the fewer OOD situations there are). No?
It is not essentially an R-pursuing wrapper-mind. It is essentially an X-pursuing wrapper-mind that ...
Yeah, I disagree pretty strongly with this, though I am also somewhat confused about what the points under contention are.
I think that there are two questions that are separated in my mind but not in this post:
Broadly on board with many of your points.
We need to apply extremely strong selection to get the kind of agent we want, and the agent we want will itself need to be making decisions that are extremely optimized in order to achieve powerfully good outcomes. The question is about in what way that decision-making algorithm should be structured, not whether it should be optimized/optimizing at all. As a fairly close analogy, IMO a point in the Death With Dignity post was something like "for most people, the actually consequentialist-correct choice is NOT to tr...
Certainly possible. Though we seem to be continually marching down the list of tasks we once thought "can only be done with systems that are really general/agentic/intelligent" (think: spatial planning, playing games, proving theorems, understanding language, competitive programming...) and finding that, nope, actually we can engineer systems that have the distilled essence of that capability.
That makes for a deflationary account of cognition, where we never see the promised reduction into "one big insight", but rather chunks of the AI field continue to break ...
It's quite hard to find a system with short-term terminal goals, as opposed to a short-term planning horizon imposed by computational limits. To put it another way, taskiness is an unsolved problem in AI alignment. We don't know how to tell a superintelligent AGI "do this, don't do anything else, especially please don't disassemble everyone in the process of doing this, stop after you've done this".
I dunno. The current state of traditional and neural AI looks very much like "we only know how to build tasky systems", not like "we don't know how to build tasky systems". They ...
Object-level comments below.
Clearing up some likely misunderstandings:
Assumption 1. A sufficiently advanced agent will do at least human-level hypothesis generation regarding the dynamics of the unknown environment.
I am fairly confident that this is not the part TurnTrout/Quintin were disagreeing with you on. Such an agent plausibly will be doing at least human-level hypothesis generation. The question is what goals will be driving the agent. A monk may be able to generate the hypothesis that narcotics would feel intensely rewarding, more rewarding than...
I really wish this post took a different rhetorical tack. Claims like, for example, the one that the reader should engage with your argument because "it has been certified as valid by professional computer scientists" do the post a real disservice. And they definitely made me disinclined to continue reading.
Note: "ask them for the faciest possible thing" seems confused.
How I would've interpreted this if I were talking with another ML researcher is "Sample the face at the point of highest probability density in the generative model's latent space". For GANs and diffusion models (the models we in fact generate faces with), you can do exactly this by setting the Gaussian latents to zeros, and you will see that the result is a perfectly normal, non-Eldritch human face.
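To make the "point of highest probability density" reading concrete, here is a minimal sketch, assuming the latent prior is a standard Gaussian N(0, I). No actual generator is involved; `dim` and the number of samples are arbitrary values picked for illustration. It only checks that the all-zeros latent has higher prior density than any randomly drawn latent.

```python
import math
import random

def log_density(z):
    """Log-density of a standard Gaussian N(0, I) at the latent vector z."""
    d = len(z)
    return -0.5 * sum(x * x for x in z) - 0.5 * d * math.log(2 * math.pi)

random.seed(0)
dim = 512  # hypothetical latent size, chosen only for illustration
zero_latent = [0.0] * dim
samples = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(100)]

zero_logp = log_density(zero_latent)
best_sample_logp = max(log_density(z) for z in samples)

# The zero vector is the mode of the prior, so it beats every sampled latent.
print(zero_logp > best_sample_logp)
```

Decoding `zero_latent` through a trained generator is the step the comment describes; that part is not reproduced here.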
I'm guessing what he has in mind is more like "take a GAN discriminator / image classifier &...
I took Nate to be saying that we'd compute the image with highest faceness according to the discriminator, not the generator. The generator would tend to create "thing that is a face that has the highest probability of occurring in the environment", while the discriminator, whose job is to determine whether or not something is actually a face, has a much better claim to be the thing that judges faceness. I predict that this would look at least as weird and nonhuman as those deep dream images if not more so, though I haven't actually tried it. I also predic...
It's the relevant operationalization because in the context of an AI system optimizing for X-ness of states S, the thing that matters is not what the max-likelihood sample of some prior distribution over S is, but rather what the maximum X-ness sample looks like. In other words, if you're trying to write a really good essay, you don't care what the highest likelihood essay from the distribution of human essays looks like, you care about what the essay that maxes out your essay-quality function is.
(also, the maximum likelihood essay looks like a single word, or if you normalize for length, the same word repeated over and over again up to the context length)
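A toy sketch of the distinction between the maximum-likelihood sample and the maximum-X-ness sample. Everything here is an assumption made for illustration: the "language model" is a unigram model over four made-up tokens, and `quality` is an invented stand-in for an essay-quality function. Real language models are autoregressive, so this only shows the shape of the argument, not the actual behaviour of any model.

```python
import math
from itertools import product

# Hypothetical unigram model: each token is drawn independently.
token_probs = {"the": 0.5, "cat": 0.2, "sat": 0.2, "down": 0.1}

def log_likelihood(seq):
    """Log-probability of a token sequence under the unigram model."""
    return sum(math.log(token_probs[t]) for t in seq)

def quality(seq):
    """Invented quality score: the number of distinct tokens used."""
    return len(set(seq))

length = 3  # fixed length, so no length normalization is needed
candidates = list(product(token_probs, repeat=length))

most_likely = max(candidates, key=log_likelihood)
best_quality = max(candidates, key=quality)

print(most_likely)   # the single most probable token, repeated
print(best_quality)  # a sequence with no repeated tokens
```

At a fixed length, the likelihood maximizer is the most probable token repeated, while the quality maximizer is a different sequence entirely.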
Not the OP but this jumped out at me:
If the labels are not perfect, then the major failure mode is that the AI ends up learning the actual labelling process rather than the intended natural abstraction. Once the AI has an internal representation of the actual labelling process, that proto-shard will be reinforced more than the proto-diamond shard, because it will match the label in cases where the diamond-concept doesn't (and the reverse will not happen, or at least will happen less often and only due to random noise).
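A small illustration of the asymmetry described above, on hand-written hypothetical data. The six examples and the use of "agreement with the labels" as a crude proxy for how strongly a representation gets reinforced are assumptions of this sketch, not claims about any real training setup.

```python
# Each example: (is_actually_diamond, looks_like_diamond_to_the_labeller)
examples = [
    (True, True),
    (True, True),
    (False, False),
    (False, False),
    (False, True),   # convincing fake: the labeller says "diamond"
    (True, False),   # obscured real diamond: the labeller says "not diamond"
]

# Labels are produced by the labelling process, not by ground truth.
labels = [looks for _, looks in examples]

def agreement(predictions):
    """Fraction of examples where the predictions match the given labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

diamond_concept = [is_diamond for is_diamond, _ in examples]
labelling_model = [looks for _, looks in examples]

print(agreement(diamond_concept))
print(agreement(labelling_model))
```

The model of the labelling process matches every label, while the intended concept loses exactly on the mislabelled cases, which is the direction of pressure the comment points at.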
This failure mode seems plausible ...
At the risk of reading too much into wording, I think the phrasing of the above two comments contains an interesting difference.
The first comment (TurnTrout) talks about reward as the thing providing updates to the agent's cognition, i.e. "reward schedules produce ... cognitive updates", and expresses confusion about a prior quote that mentioned implementing our wishes through reward functions.
The second comment (paulfchristiano) talks about picking "rewards that would implement human wishes" and strategies for doing so.
These seem quite different. If I try...
If you have checkpoints from different points in training of the same models, you could do a comparison between different-size models at the same loss value (performance). That way, you're actually measuring the effect of scale alone, rather than scale confounded by performance.
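A rough sketch of the matched-loss comparison, assuming each model's checkpoints are available as `(step, loss)` pairs. The model names, the loss values, and the helper `checkpoint_at_loss` are all hypothetical; in practice you would probably also want a tolerance on how close a checkpoint's loss has to be to the target.

```python
def checkpoint_at_loss(checkpoints, target_loss):
    """Return the (step, loss) checkpoint whose loss is closest to target_loss."""
    return min(checkpoints, key=lambda c: abs(c[1] - target_loss))

# Hypothetical training curves: model name -> list of (step, loss).
curves = {
    "small": [(1000, 3.9), (2000, 3.5), (4000, 3.2), (8000, 3.0)],
    "large": [(1000, 3.4), (2000, 3.0), (4000, 2.7), (8000, 2.5)],
}

# Compare only at a loss that every model actually reaches:
# the worst (highest) of the per-model best losses.
target = max(min(loss for _, loss in cps) for cps in curves.values())

matched = {name: checkpoint_at_loss(cps, target) for name, cps in curves.items()}
print(target)
print(matched)
```

Whatever metric you care about is then evaluated on the matched checkpoints, so that differences between them reflect scale rather than loss.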