Question/Remark 2: AFAICT, your theory has a major missing piece, which is, proving that "abstraction" (formalized according to your way of formalizing it) of is actually a crucial ingredient of learning/cognition. The way I see it, such a proof should be by demonstrating that hypothesis classes defined in terms of probabilistic graph models / abstraction hierarchies can be learned with good sample complexity (and better yet if you can tell something about the computational complexity), in a manner that cannot be achieved if you discard any of the important-according-to-you pieces. You might have some different approach to this, but I'm not sure what it is.

Reply

[-]johnswentworth3y51

Question 1: What's the minimal set of articles one should read to understand this 80%?

Telephone Theorem, Redundancy/Resampling, and Maxent for the math, Chaos for the concepts.

Question/Remark 2: AFAICT, your theory has a major missing piece, which is, proving that "abstraction" (formalized according to your way of formalizing it) of is actually a crucial ingredient of learning/cognition. The way I see it, such a proof should be by demonstrating that hypothesis classes defined in terms of probabilistic graph models / abstraction hierarchies can be learned with good sample complexity (and better yet if you can tell something about the computational complexity), in a manner that cannot be achieved if you discard any of the important-according-to-you pieces. You might have some different approach to this, but I'm not sure what it is.

If we want to show that abstraction is a crucial ingredient of learning/cognition, then "Can we efficiently learn hypothesis classes defined in terms of abstraction hierarchies, as captured by John's formalism?" is entirely the wrong question. Just because something can be learned efficiently doesn't mean it's convergent for a wide variety of cognitive systems. And even if such hypothesis classes couldn't be learned efficiently in full generality, it would still be possible for a subset of that hypothesis class to be convergent for a wide variety of cognitive systems, in which case general properties of the hypothesis class would still apply to those systems' cognition.

The question we actually want here is "Is abstraction, as captured by John's formalism, instrumentally convergent for a wide variety of cognitive systems?". And that question is indeed not yet definitively answered. The pragmascope itself would largely allow us to answer that question empirically, and I expect the ability to answer it empirically will quickly lead to proofs as well.

Reply

[-]Vanessa Kosoy3y30

Telephone Theorem, Redundancy/Resampling, and Maxent for the math, Chaos for the concepts.

Thank you!

Just because something can be learned efficiently doesn't mean it's convergent for a wide variety of cognitive systems.

I believe that the relevant cognitive systems all look like learning algorithms for a prior of certain fairly specific type. I don't know how this prior looks like, but it's something very rich on the one hand and efficiently learnable on the other hand. So, if you showed that your formalism naturally produces priors that seem closer to that "holy grail prior", in terms of richness/efficiency, compared to priors that we already know (e.g. MDPs with small number of states which are not rich enough, or the Solomonoff prior which is both statistically and computationally intractable), that would at least be evidence that you're going in the right direction.

And even if such hypothesis classes couldn't be learned efficiently in full generality, it would still be possible for a subset of that hypothesis class to be convergent for a wide variety of cognitive systems, in which case general properties of the hypothesis class would still apply to those systems' cognition.

Hmm, I'm not sure what would it mean for a subset of a hypothesis class to be "convergent".

The question we actually want here is "Is abstraction, as captured by John's formalism, instrumentally convergent for a wide variety of cognitive systems?".

That's interesting, but I'm still not sure what it means exactly. Let's say we take a reinforcement learner which a specific hypothesis class, such all MDPs of certain size, or some family of MDPs with low eluder dimension, or the actual AIXI. How would you determine whether your formalism is "instrumentally convergent" for each of those? Is there a rigorous way to state the question?

Reply

Moderation Log

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

32

The Pragmascope Idea

32

Background: A Measurement Device Requires An Empirical Invariant

Why The Thermometer?

The Role Of The Natural Abstraction Hypothesis

What would a pragmascope look like, concretely?

What’s The Output Data Structure?