Epistemic status: trying to feel out the shape of a concept and give it an appropriate name. Trying to make explicit some things that I think exist implicitly in many people's minds. This post makes truth claims, but its main goal is not to convince you that they are true.

Here are some things I would expect any AGI to be able to do:

- Act coherently over long time scales, choosing actions in the service of its own longer-term goals.
- Learn new information and skills "on the job," one thing after another, as the need arises.
- Retain or forget information and skills over long time scales, in a way that serves its goals. E.g. if it does forget some things, these should be things that are unusually unlikely to come in handy later.
I'm not sure how related these properties are, though they feel like a cluster in my mind. In any case, a unifying theme of this list is that current ML models generally do not do these things -- and we do not ask them to do these things.
We don't train models in a way that encourages these properties, and in some cases we design models whose structures rule them out. Benchmarks for these properties are either nonexistent, or much less mature than more familiar benchmarks.
Is there an existing name for this cluster? If there isn't one, I propose the name "autonomy." This may not be an ideal name, but it's what I came up with.
I think this topic is worthy of more explicit discussion than it receives. In debates about the capabilities of modern ML, I usually see autonomy brought up in a tangential way, if at all.
ML detractors sometimes cite the lack of autonomy in current models as a flaw, but they rarely note that ML models are not directly trained to do any of this stuff, and are often deployed in ways that render it impossible. (The lack of autonomy in modern ML is more like a category mistake than a flaw.)
ML enthusiasts sometimes refer to autonomy dismissively, as something that will be solved incidentally by scaling up current models -- which seems like a very inefficient approach compared to the largely untried alternative of training for these properties directly.
Alternatively, ML enthusiasts may cite the very impressive strides made by recent generative models as evidence of generally fast "ML progress," and then gesture in the direction of RL when autonomy is brought up.
However, current generative models are much closer to human-level perception and synthesis (of text, pictures, etc.) than current RL models are to human-level autonomy.
State-of-the-art RL can achieve some of the properties above in toy worlds like video games; it can also perform at human level at some tasks orthogonal to autonomy, as when the (frozen) deployed AlphaZero plays a board game. Meanwhile, generative models are producing illustrative art at a professional level of technical skill across numerous styles -- to pick one example. There's a vast gap here.
Also, as touched upon below, even RL is usually framed in a way that rules out, or does not explicitly encourage, some parts of autonomy.
If not autonomy, what is the thing that current ML excels at? You might call it something like "modeling static distributions":

- There is some fixed distribution of data (text, images, input-output pairs) that we want to capture.
- We gather a finite sample from that distribution and train a model on it.
- When training is done, we freeze the model and deploy it. From then on it just maps inputs to outputs, without learning anything new or pursuing anything over time.
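To make the shape of that workflow concrete, here is a deliberately minimal toy sketch of it -- the task, numbers, and names are all just illustrative placeholders, nothing about them is load-bearing:

```python
# A minimal caricature of the "modeling static distributions" workflow.
# The toy regression task and all names here are illustrative placeholders.

import numpy as np

rng = np.random.default_rng(0)

# 1. There is some fixed "true" distribution; we only ever see a finite sample.
X = rng.uniform(-1.0, 1.0, size=(1000, 1))
y = 3.0 * X[:, 0] + rng.normal(0.0, 0.1, size=1000)   # noisy linear relationship

# 2. "Training": fit the model to that fixed sample (here, ordinary least squares).
X_design = np.hstack([X, np.ones((len(X), 1))])        # add a bias column
weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# 3. "Deployment": the model is frozen -- a pure function from inputs to outputs.
#    It never gathers new data, never updates itself, and has no ongoing goals.
def frozen_model(x: float) -> float:
    return weights[0] * x + weights[1]

print(frozen_model(0.5))   # just maps an input to an output, forever
```

The important feature is the last step: once the weights are fixed, there is no further learning, no memory of past inputs, and nothing the model is "trying" to do across calls.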
This is a very different setting than the one an AGI would operate in. Asking whether a model of this kind displays autonomy doesn't really make sense. At most, we can wonder whether it has an inner optimizer with autonomy. But that is an inefficient (and uncontrollable!) way to develop an ML model with autonomy.
What's noteworthy to me is that we've done extremely well in this setting, at the goals laid out by this setting. Meanwhile, we have not really tried to define a setting that allows for, and encourages, autonomy. (I'm not sure what that would entail, but I know existing setups don't do it.)
Even RL is usually not set up to allow and encourage autonomy, though it is closer than the generative or classification settings. There is still a distinction between "training" and "inference," though we may care about learning speed in training in a way we don't in other contexts. We generally only let the model learn long enough to reach its performance peak; we generally don't ask the model to learn things in a temporal sequence, one after the other, while retaining the earlier ones -- and certainly not while retaining the earlier ones insofar as this serves some longer-term goal. (The model is not encouraged to have any long-term goals.)
I realize this doesn't describe the entirety of RL. But a version of the RL field that was focused on autonomy would look very different.
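As a concrete caricature of the standard setup described above -- using a trivial bandit task only because it fits in a few lines; nothing about the specific algorithm matters -- notice where the learning stops:

```python
# A stylized sketch of the usual RL framing: learn on one fixed task until
# performance plateaus, then freeze the policy. Everything here is illustrative.

import random

random.seed(0)

TRUE_MEANS = [0.1, 0.5, 0.9]   # one fixed, stationary task (a 3-armed bandit)

def pull(arm: int) -> float:
    """The environment: the same reward distribution forever."""
    return random.gauss(TRUE_MEANS[arm], 0.1)

# --- "Training": learn just long enough to reach the performance peak --------
q_values = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
for _ in range(5000):
    if random.random() < 0.1:                                  # explore
        arm = random.randrange(3)
    else:                                                      # exploit
        arm = max(range(3), key=lambda a: q_values[a])
    reward = pull(arm)
    counts[arm] += 1
    q_values[arm] += (reward - q_values[arm]) / counts[arm]    # running average

# --- "Deployment": the policy is frozen --------------------------------------
# No further learning, no second task arriving later, nothing that rewards
# retaining this skill while acquiring another, no goals beyond this one task.
best_arm = max(range(3), key=lambda a: q_values[a])
print("frozen policy always pulls arm", best_arm)
```

A version of RL focused on autonomy would have to change this loop itself: tasks arriving in sequence, learning continuing during deployment, and retention (or forgetting) evaluated against longer-term goals rather than a single per-task score.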
The above could affect AGI timelines in multiple ways, with divergent effects.