I don't think this has much direct application to alignment, because although you can build safe AI with it, it doesn't differentially get us towards the endgame of AI that's trying to do good things and not bad things. But it's still an interesting question.
It seems like the way you're thinking about this, there are some directed relations you care about (the main one being "this is like that, but with some extra details") between concepts, and something is "real"/"applied" if it's near the edge of this network: if it doesn't have many relations directed towards even-more-applied concepts. It seems like this is the sort of thing you could only ever learn by learning about the real world first - you can't start from a blank slate and only learn "the abstract stuff", because you only know which stuff is abstract by learning about its relationships to less abstract stuff.
It seems like this is the sort of thing you could only ever learn by learning about the real world first
Yep. The idea is to try and get a system that develops all practically useful "theoretical" abstractions, including those we haven't discovered yet, without developing desires about the real world. So we train some component of it on the real-world data, then somehow filter out "real-world" stuff, leaving only a purified superhuman abstract reasoning engine.
One of the nice-to-have properties here would be if we don't need to be able to interpret its world-model to filter out the concepts – if, in place of human understanding and judgement calls, we can blindly use some ground-truth-correct definition of what is and isn't a real-world concept.
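Purely as an illustration of the intended shape (nothing here is implemented or implementable today; the concept set is hand-written, and is_real_world_concept is a stand-in for exactly the ground-truth definition we're looking for):

```python
# Toy stand-in for the "train, then filter" pipeline described above.
# In reality the concepts would be extracted from a trained world-model, and
# is_real_world_concept would be the ground-truth-correct definition;
# here both are hand-written placeholders, which of course defeats the point.

WORLD_MODEL_CONCEPTS = {"vector", "game-theoretic agent", "market",
                        "tree", "my friend Alice", "human governments"}

HAND_LABELED_REAL_WORLD = {"tree", "my friend Alice", "human governments"}

def is_real_world_concept(concept: str) -> bool:
    """Placeholder for the ground-truth definition discussed above."""
    return concept in HAND_LABELED_REAL_WORLD

# The "purified" abstract reasoning engine would keep only these:
purified = {c for c in WORLD_MODEL_CONCEPTS if not is_real_world_concept(c)}
print(purified)  # the three "theoretical" concepts survive the filter
```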
Consider concepts such as "a vector", "a game-theoretic agent", or "a market". Intuitively, those are "purely theoretical" abstractions: they don't refer to any specific real-world system. Those abstractions would be useful even in universes very different from ours, and reasoning about them doesn't necessarily involve reasoning about our world.
Consider concepts such as "a tree", "my friend Alice", or "human governments". Intuitively, those are "real-world" abstractions. While "a tree" bundles together lots of different trees, and so doesn't refer to any specific tree, it still refers to a specific type of structure found on Earth, and shaped by Earth-in-particular's specific conditions. While tree-like structures can exist in other places in the multiverse, there's an intuitive sense that any such "tree" abstraction would "belong" to the region of the multiverse in which the corresponding trees grow.
Is there a way to formalize this, perhaps in the natural-abstraction framework? To separate the two categories, to find the True Name of "purely theoretical concepts"?
Motivation
Consider a superintelligent agent/optimization process. For it to have disastrous real-world consequences, some component of it would need to reason about the real world. It would need to track where in the world it's embedded, what input-output pathways there are, and how it can exploit these pathways in order to hack out of the proverbial box/cause other undesirable consequences.
If we could remove its ability to think about "unapproved" real-world concepts, and make it model itself as not part of the world, then we'd have something plausibly controllable. We'd be able to pose it well-defined problems (in math and engineering, up to whatever level of detail we can specify without exposing it to the real world – which is plenty) and it'd spit out solutions to them, without ever even thinking about causing real-world consequences. The idea of doing this would be literally outside its hypothesis space!
There are tons of loopholes and open problems here, but I think there's promise too.
Ideas
(I encourage you to think about the topic on your own before reading my attempts.)
Take 1: Perhaps this is about "referential closure". For concepts such as "vectors" or "agents", we can easily specify the list of formal axioms that would define the frameworks within which these concepts make sense. For things like "trees", however, we would have to refer to the real world directly: to the network of causes and effects entangled with our senses.
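For concreteness, one such finite axiom list for "a vector" is just the vector-space axioms: for all $u, v, w \in V$ and all scalars $a, b \in F$,

$$
\begin{aligned}
&(u+v)+w = u+(v+w), \qquad u+v = v+u, \\
&\exists\, 0 \in V : v + 0 = v, \qquad \forall v \ \exists\, (-v) : v + (-v) = 0, \\
&a(u+v) = au+av, \qquad (a+b)v = av+bv, \\
&a(bv) = (ab)v, \qquad 1v = v.
\end{aligned}
$$

Nothing in that list points at anything outside $(V, F)$: the definition is closed under its own references. There's no analogous finite list for "a tree" that doesn't eventually bottom out in "the thing growing over there".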
... Except that we more or less can, nowadays, specify the mathematical axioms underlying the processes generating our universe (something something Poincaré group). To a sufficiently advanced superintelligence, there'd be no real difference.
Take 2: Perhaps the intuitions are false, and the difference is quantitative, not qualitative.
"Vectors" are concepts such that there's a simple list of axioms under which they're simple to describe/locate: they have low Kolmogorov complexity. By comparison, "trees" have a simple generator, but locating them within that generator's output (the quantum multiverse) takes very many bits.
I guess this is kind of plausible – indeed, it's probably the null hypothesis – but it doesn't feel satisfying.
Especially the pessimistic case: the implied "continuum" (a smooth gradation of abstractions running from the purely theoretical to the fully real-world ones) doesn't make sense to me. I think there's a big jump between "a human" and "an agent", and I don't see what abstractions could sit between them. (An abstraction over {humans, human governments, human corporations}, which is nevertheless more specific than "an agent in general"? Empirically, humanity hasn't been making use of this abstraction – we don't have a term for it – so it's evidently not convergently useful.)
Take 3: Causality-based definitions. Perhaps "theoretical abstractions" are convergently useful abstractions which can't be changed by any process within our universe (i.e., within the net of causes and effects entangled with our senses)? "Trees" can be wiped out or modified; "vectors" can't be.
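One way to try to cash this out (just a sketch): let $\mathcal{G}$ be the causal network entangled with our senses. Call a convergently useful abstraction $c$ "theoretical" iff no intervention available within $\mathcal{G}$ can change or destroy what $c$ picks out:

$$
c \text{ is theoretical} \iff \text{for every intervention } \mathrm{do}(X{=}x) \text{ with } X \in \mathcal{G}, \ c\text{'s content is unchanged}.
$$

Wiping out every tree on Earth changes what "an Earthly tree" refers to; no physically available intervention touches what "a vector" is.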
This doesn't really work, I think. There are two approaches:
Intuitively, it feels like there's something to the "causality" angle, but I haven't been able to find a useful approach here.
Take 4: Perhaps this is about reoccurrence.
Consider the "global ontology" of convergently useful concepts defined over our universe. A concept such as "an Earthly tree" appears in it exactly once: as an abstraction over all of Earth's trees (which are abstractions over their corresponding bundles-of-atoms which have specific well-defined places, etc.). "An Earthly tree", specifically, doesn't reoccur anywhere else, at higher or lower or sideways abstraction levels.
Conversely, consider "vectors" or "markets". They never show up directly. Rather, they serve as "ingredients" in the makeup of many different "real-world" abstractions. "Markets" can model human behavior in a specific shop, or in the context of a country, and in relation to many different types of "goods" – or even the behavior of biological and purely physical systems.
Similarly for "agents" (animals, humans, corporations, governments), and even more obviously for "vectors".
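A toy version of the reoccurrence criterion, with a made-up miniature "global ontology" (the edge relation, the example entries, and the threshold are all invented for illustration):

```python
# Toy sketch of Take 4. Keys are concepts; values are the distinct higher-level
# "real-world" abstractions they show up in as ingredients. All entries invented.
ingredient_of = {
    "vector":          ["velocity fields", "price spaces", "neural activations"],
    "market":          ["a village shop", "a national economy", "ant-colony foraging"],
    "agent":           ["animals", "humans", "corporations", "governments"],
    "an Earthly tree": ["Earth's biosphere"],         # appears exactly once
    "my friend Alice": ["Alice's corner of Earth"],   # appears exactly once
}

def looks_theoretical(concept: str, min_reoccurrences: int = 2) -> bool:
    """Tentative criterion: 'theoretical' = reoccurs as an ingredient across
    many distinct real-world abstractions, rather than appearing once."""
    return len(set(ingredient_of.get(concept, []))) >= min_reoccurrences

for concept in ingredient_of:
    label = "theoretical" if looks_theoretical(concept) else "real-world"
    print(f"{concept}: {label}")
```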
Potential counterarguments:
Take 4 seems fairly promising to me, overall. Can you spot any major issues with it? Alternatively, a way to more properly flesh it out/formalize it?