Director of AI research at ALTER, where I lead a group working on the learning-theoretic agenda for AI alignment. I'm also supported by the LTFF. See also LinkedIn.
E-mail: {first name}@alter.org.il
This remains the best overview of the learning-theoretic agenda to date. As a complementary pedagogic resource, there is now also a series of video lectures.
Since the article was written, there have been several new publications:
In addition, some new developments were briefly summarized in short-forms:
Meanwhile, active research proceeds along several parallel directions:
It remains true that there are more shovel-ready open problems than researchers, and hence the number of (competent) researchers is still the bottleneck.
Seems right, but is there a categorical derivation of the Wentworth-Lorell rules? Maybe they can be represented as theorems of the form: given an arbitrary Markov category C, such-and-such identities between string diagrams in C imply (more) identities between string diagrams in C.
This article studies a potentially very important question: is improving connectomics technology net harmful or net beneficial from the perspective of existential risk from AI? The author argues that it is net beneficial. Connectomics seems like it would help with understanding the brain's reward/motivation system, but not so much with understanding the brain's learning algorithms. Hence it arguably helps more with AI alignment than AI capability. Moreover, it might also lead to accelerating whole brain emulation (WBE) which is also helpful.
The author mentions 3 reasons why WBE is helpful:
I think there is another reason: some alignment protocols might rely on letting the AI study a WBE and use it for e.g. inferring human values. The latter might be viable even if actually running the WBE is too slow to be useful with contemporary technology.
I think that performing this kind of differential benefit analysis for various technologies might be extremely important, and I would be glad to see more of it on LW/AF (or anywhere).
This article studies a natural and interesting mathematical question: which algebraic relations hold between Bayes nets? In other words, if a collection of random variables is consistent with several Bayes nets, what other Bayes nets does it also have to be consistent with? The question is studied both for exact consistency and for approximate consistency: in the latter case, the joint distribution is KL-close to a distribution that's consistent with the net. The article proves several rules of this type, some of them quite non-obvious. The rules have concrete applications in the authors' research agenda.
Some further questions that I think would be interesting to study:
Tbf, you can fit a quadratic polynomial to any 3 points. But triangular numbers are certainly an aesthetically pleasing choice. (Maybe call it "triangular voting"?)
I feel that this post would benefit from having the math spelled out. How is inserting a trader a way to do feedback? Can you phrase classical RL like this?
Two thoughts about the role of quining in IBP:
I believe that all or most of the claims here are true, but I haven't written all the proofs in detail, so take it with a grain of salt.
Ambidistributions are a mathematical object that simultaneously generalizes infradistributions and ultradistributions. They are useful for representing how much power an agent has over a particular system: which degrees of freedom it can control, which degrees of freedom obey a known probability distribution, and which are completely unpredictable.
Definition 1: Let $X$ be a compact Polish space. A (crisp) ambidistribution on $X$ is a function $Q\colon C(X,\mathbb{R})\to\mathbb{R}$ s.t.
1. (Monotonicity) For any $f,g\in C(X,\mathbb{R})$, if $f\leq g$ then $Q(f)\leq Q(g)$.
2. (Homogeneity) For any $f\in C(X,\mathbb{R})$ and $\lambda\geq 0$, $Q(\lambda f)=\lambda Q(f)$.
3. (Constant additivity) For any $f\in C(X,\mathbb{R})$ and $c\in\mathbb{R}$, $Q(f+c)=Q(f)+c$.
Conditions 1+3 imply that $Q$ is 1-Lipschitz. We could introduce non-crisp ambidistributions by dropping conditions 2 and/or 3 (and e.g. requiring 1-Lipschitz instead), but we will stick to crisp ambidistributions in this post.
The space of all ambidistributions on $X$ will be denoted $\boxdot X$.[1] Obviously, $\square X\subseteq\boxdot X$ (where $\square X$ stands for (crisp) infradistributions), and likewise for ultradistributions.
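As a quick recap of how the crisp infra and ultra cases sit inside this definition (using the standard credal-set representations; here $\Phi\subseteq\Delta X$ is a credal set):

```latex
% Crisp infradistribution / ultradistribution associated with a credal set \Phi:
Q^{\min}_{\Phi}(f) := \min_{\mu\in\Phi} \mathbb{E}_{\mu}[f]
\qquad\qquad
Q^{\max}_{\Phi}(f) := \max_{\mu\in\Phi} \mathbb{E}_{\mu}[f]
% Both satisfy conditions 1-3 of Definition 1; Q^min is additionally concave in f
% and Q^max convex, whereas a general ambidistribution need be neither.
```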
Example 1: Consider compact Polish spaces $X,Y,Z$ and a continuous mapping $g\colon X\times Y\to Z$. We can then define $Q_g\in\boxdot Z$ by

$$Q_g(f) := \max_{\mu\in\Delta X}\min_{\nu\in\Delta Y} \mathbb{E}_{x\sim\mu,\,y\sim\nu}[f(g(x,y))]$$

That is, $Q_g(f)$ is the value of the zero-sum two-player game with strategy spaces $X$ and $Y$ and utility function $f\circ g$.
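As a sanity check, here is a minimal finite toy version of this construction (a sketch: it uses pure-strategy maximin rather than the mixed-strategy game value, and the sets X, Y, Z and the map g are arbitrary choices), which also numerically spot-checks the three conditions of Definition 1:

```python
import itertools
import random

# Toy finite version of Example 1, using pure-strategy maximin for simplicity.
X = ["x0", "x1"]          # degrees of freedom the agent controls
Y = ["y0", "y1", "y2"]    # degrees of freedom the adversary controls
Z = list(itertools.product(X, Y))

def g(x, y):              # the "Cartesian frame" map g : X x Y -> Z
    return (x, y)

def Q(f):
    """Maximin ambidistribution induced by g: max over X, min over Y."""
    return max(min(f[g(x, y)] for y in Y) for x in X)

# Spot-check monotonicity, homogeneity and constant additivity on random payoffs.
random.seed(0)
for _ in range(1000):
    f = {z: random.uniform(-1, 1) for z in Z}
    h = {z: f[z] + random.uniform(0, 1) for z in Z}   # h >= f pointwise
    lam, c = random.uniform(0, 2), random.uniform(-1, 1)
    assert Q(f) <= Q(h) + 1e-9                                       # monotonicity
    assert abs(Q({z: lam * f[z] for z in Z}) - lam * Q(f)) < 1e-9    # homogeneity
    assert abs(Q({z: f[z] + c for z in Z}) - (Q(f) + c)) < 1e-9      # constant additivity
print("all checks passed")
```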
Notice that $g$ in Example 1 can be regarded as a Cartesian frame: this seems like a natural connection to explore further.
Example 2: Let $A$ and $O$ be finite sets representing actions and observations respectively, and let $\Lambda\colon\Pi\to\square(A\times O)^\omega$ be an infra-Bayesian law (where $\Pi$ is the space of policies). Then, we can define $Q_\Lambda\in\boxdot(A\times O)^\omega$ by

$$Q_\Lambda(f) := \max_{\pi\in\Pi}\mathbb{E}_{\Lambda(\pi)}[f]$$

In fact, this is a faithful representation: $\Lambda$ can be recovered from $Q_\Lambda$.
Example 3: Consider an infra-MDP with finite state set $S$, finite action set $A$, initial state $s_0\in S$ and transition infrakernel $T\colon S\times A\to\square S$. We can then define the "ambikernel" $\tilde{T}\colon S\to\boxdot S$ by

$$\tilde{T}(s)(f) := \max_{a\in A}\mathbb{E}_{T(s,a)}[f]$$
Thus, every infra-MDP induces an "ambichain". Moreover:
Claim 1: $\boxdot$ is a monad. In particular, ambikernels can be composed.
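Here is a small finite-state sketch of this composition (the state set, action set and credal sets below are arbitrary toy choices; each credal set is represented by a finite list of its extreme points):

```python
# Toy illustration of composing ambikernels on a finite state set.
S = [0, 1, 2]
A = ["stay", "move"]

# Crisp transition infrakernel: each (state, action) maps to a credal set over S.
T = {
    (0, "stay"): [{0: 1.0}],
    (0, "move"): [{1: 0.5, 2: 0.5}, {1: 1.0}],
    (1, "stay"): [{1: 1.0}],
    (1, "move"): [{2: 1.0}, {0: 0.5, 2: 0.5}],
    (2, "stay"): [{2: 1.0}],
    (2, "move"): [{0: 1.0}],
}

def expect(mu, f):
    return sum(p * f[s] for s, p in mu.items())

def ambikernel(s):
    """Induced ambikernel: the agent picks the action (max), nature picks a
    distribution inside the credal set (min)."""
    def Q(f):  # f: dict state -> payoff
        return max(min(expect(mu, f) for mu in T[(s, a)]) for a in A)
    return Q

def compose(K1, K2):
    """Monadic composition of ambikernels: (K1 ; K2)(s)(f) = K1(s)(s' -> K2(s')(f))."""
    def K(s):
        return lambda f: K1(s)({s2: K2(s2)(f) for s2 in S})
    return K

two_step = compose(ambikernel, ambikernel)
f = {0: 0.0, 1: 0.3, 2: 1.0}
for s in S:
    print(s, ambikernel(s)(f), two_step(s)(f))
```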
This allows us to define, for every $s\in S$ and $\gamma\in[0,1)$, the ambidistribution

$$\theta^s_\gamma := (1-\gamma)\sum_{n=0}^{\infty}\gamma^n\tilde{T}^n(s) \in \boxdot S$$

This object is the infra-Bayesian analogue of the convex polytope of accessible state occupancy measures in an MDP.

Claim 2: The following limit always exists:

$$\theta^s_* := \lim_{\gamma\to 1}\theta^s_\gamma$$
Definition 3: Let $V$ be a convex space, $\mathcal{A}$ a set of subsets of $V$ and $\Psi\subseteq V$. We say that $\mathcal{A}$ occludes $\Psi$ when for any selection $\{v_\Phi\in\Phi\}_{\Phi\in\mathcal{A}}$, we have

$$\operatorname{ch}\left(\{v_\Phi\}_{\Phi\in\mathcal{A}}\right)\cap\Psi \neq \emptyset$$

Here, $\operatorname{ch}$ stands for convex hull.

We denote this relation $\mathcal{A}\vdash\Psi$. The reason we call this "occlusion" is apparent for the case where $\mathcal{A}$ is a pair of points in the plane.
Here are some properties of occlusion:
Notice that occlusion has similar algebraic properties to logical entailment, if we think of $\mathcal{A}\vdash\Psi$ as "$\Psi$ is a weaker proposition than $\mathcal{A}$".
Definition 4: Let $X$ be a compact Polish space. A cramble set[2] over $X$ is a set $\mathcal{F}\subseteq\square X$ of credal sets s.t.
1. $\mathcal{F}$ is non-empty.
2. $\mathcal{F}$ is topologically closed.
3. $\mathcal{F}$ is closed under occlusion: for any $\mathcal{A}\subseteq\mathcal{F}$ and $\Psi\in\square X$, if $\mathcal{A}\vdash\Psi$ then $\Psi\in\mathcal{F}$.
Question: If instead of condition 3, we only consider binary occlusion (i.e. require only that $\Phi_1,\Phi_2\in\mathcal{F}$ and $\{\Phi_1,\Phi_2\}\vdash\Psi$ imply $\Psi\in\mathcal{F}$), do we get the same concept?
Given a cramble set $\mathcal{F}$, its Legendre-Fenchel dual ambidistribution is

$$Q_{\mathcal{F}}(f) := \max_{\Phi\in\mathcal{F}}\min_{\mu\in\Phi}\mathbb{E}_{\mu}[f]$$
Claim 3: Legendre-Fenchel duality is a bijection between cramble sets and ambidistributions.
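Two sanity-check special cases of this duality, spelled out in the notation above (writing $\operatorname{occl}$ for closure of a family of credal sets under occlusion):

```latex
% 1. A single credal set \Phi gives back the crisp infradistribution on \Phi:
\mathcal{F} = \operatorname{occl}\{\Phi\}
\;\Longrightarrow\;
Q_{\mathcal{F}}(f) = \min_{\mu\in\Phi}\mathbb{E}_{\mu}[f]
% 2. The family of singletons of \Phi gives back the crisp ultradistribution on \Phi:
\mathcal{F} = \operatorname{occl}\{\{\mu\} : \mu\in\Phi\}
\;\Longrightarrow\;
Q_{\mathcal{F}}(f) = \max_{\mu\in\Phi}\mathbb{E}_{\mu}[f]
```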
The space $\boxdot X$ is equipped with the obvious partial order: $Q_1\preceq Q_2$ when $Q_1(f)\leq Q_2(f)$ for all $f\in C(X,\mathbb{R})$. This makes $\boxdot X$ into a distributive lattice, with

$$(Q_1\wedge Q_2)(f) = \min(Q_1(f),Q_2(f))$$

$$(Q_1\vee Q_2)(f) = \max(Q_1(f),Q_2(f))$$
This is in contrast to $\square X$, which is a non-distributive lattice.
The bottom and top elements are given by

$$\bot(f) = \min_{x\in X}f(x)$$

$$\top(f) = \max_{x\in X}f(x)$$
Ambidistributions are closed under pointwise suprema and infima, and hence $\boxdot X$ is complete and satisfies both infinite distributive laws, making it a complete Heyting and co-Heyting algebra.
$\boxdot X$ is also a De Morgan algebra with the involution

$$(\neg Q)(f) := -Q(-f)$$
For $|X|\geq 2$, $\boxdot X$ is not a Boolean algebra: the involution $\neg$ is not a complementation, and for any $\mu\in\Delta X$ (regarded as an ambidistribution) we have $\neg\mu=\mu$, hence $\mu\wedge\neg\mu=\mu\neq\bot$.
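Here is a quick numerical spot-check of the lattice and De Morgan structure for a small finite $X$ (a sketch; the representation of ambidistributions as raw functionals and the specific examples are arbitrary toy choices):

```python
import random

# Spot-check of the lattice / De Morgan structure over a small finite X.
random.seed(1)
X = ["a", "b", "c"]

def delta_mix(weights):
    """The ambidistribution of an ordinary distribution: plain expectation."""
    return lambda f: sum(w * f[x] for x, w in weights.items())

def maximin(partition):
    """Max over blocks of the partition, min inside each block."""
    return lambda f: max(min(f[x] for x in block) for block in partition)

def meet(Q1, Q2): return lambda f: min(Q1(f), Q2(f))
def join(Q1, Q2): return lambda f: max(Q1(f), Q2(f))
def neg(Q):       return lambda f: -Q({x: -f[x] for x in X})

Qs = [delta_mix({"a": 0.2, "b": 0.8}),
      maximin([["a"], ["b", "c"]]),
      maximin([["a", "b", "c"]]),      # bottom: min over X
      maximin([["a"], ["b"], ["c"]])]  # top: max over X

for _ in range(2000):
    f = {x: random.uniform(-1, 1) for x in X}
    Q1, Q2, Q3 = random.sample(Qs, 3)
    # distributivity (the operations are pointwise min/max)
    assert abs(meet(Q1, join(Q2, Q3))(f) - join(meet(Q1, Q2), meet(Q1, Q3))(f)) < 1e-9
    # De Morgan law and involutivity
    assert abs(neg(meet(Q1, Q2))(f) - join(neg(Q1), neg(Q2))(f)) < 1e-9
    assert abs(neg(neg(Q1))(f) - Q1(f)) < 1e-9
    # ordinary distributions are fixed points of the involution
    assert abs(neg(Qs[0])(f) - Qs[0](f)) < 1e-9
print("all checks passed")
```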
One application of this partial order is formalizing the "no traps" condition for infra-MDPs:
Definition 2: A finite infra-MDP is quasicommunicating when for any $s\in S$

$$\theta^{s_0}_* \preceq \theta^{s}_*$$
Claim 4: The set of quasicommunicating finite infra-MDPs (or even infra-RDPs) is learnable.
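Here, learnability is meant in the usual regret sense; schematically (a sketch of the formalization, not the precise statement):

```latex
% A class \mathcal{H} of hypotheses is learnable when there is a single family of
% policies (\pi_\gamma) whose regret vanishes in the slow discount limit,
% uniformly over the class (r_n is the reward at time n, and expectations w.r.t.
% an infra-hypothesis are the usual infra-expectations):
\lim_{\gamma\to 1}\;\sup_{\Lambda\in\mathcal{H}}
\Big(\max_{\pi}\mathbb{E}_{\Lambda,\pi}[R_\gamma] - \mathbb{E}_{\Lambda,\pi_\gamma}[R_\gamma]\Big) = 0,
\qquad
R_\gamma := (1-\gamma)\sum_{n=0}^{\infty}\gamma^n r_n
```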
Going to the cramble set representation, $Q_1\preceq Q_2$ iff $\mathcal{F}_1\subseteq\mathcal{F}_2$.
$Q_1\wedge Q_2$ is just $\mathcal{F}_1\cap\mathcal{F}_2$, whereas $Q_1\vee Q_2$ is the "occlusion hull" of $\mathcal{F}_1$ and $\mathcal{F}_2$, i.e. the closure of $\mathcal{F}_1\cup\mathcal{F}_2$ under occlusion.
The bottom and the top cramble sets are $\{\top_{\square X}\}$ and $\square X$ itself, respectively.
Here, $\top_{\square X}$ is the top element of $\square X$ (corresponding to the credal set $\Delta X$).
The De Morgan involution is

$$\neg\mathcal{F} = \{\Psi\in\square X \mid \forall\Phi\in\mathcal{F}:\Psi\cap\Phi\neq\emptyset\}$$

That is, $\neg\mathcal{F}$ consists of the credal sets that intersect every member of $\mathcal{F}$.
Definition 5: Given compact Polish spaces $X,Y$ and a continuous mapping $h\colon X\to Y$, we define the pushforward $h_*\colon\boxdot X\to\boxdot Y$ by

$$(h_*Q)(f) := Q(f\circ h)$$
When $h$ is surjective, there are both a left adjoint and a right adjoint to $h_*$, yielding two pullback operators $h^*_{\min},h^*_{\max}\colon\boxdot Y\to\boxdot X$:

$$(h^*_{\min}R)(f) := R\left(y\mapsto\min_{x\in h^{-1}(y)}f(x)\right)$$

$$(h^*_{\max}R)(f) := R\left(y\mapsto\max_{x\in h^{-1}(y)}f(x)\right)$$
Given $Q\in\boxdot X$ and a continuous ambikernel $K\colon X\to\boxdot Y$, we can define the semidirect product $Q\ltimes K\in\boxdot(X\times Y)$ by

$$(Q\ltimes K)(f) := Q\big(x\mapsto K(x)(y\mapsto f(x,y))\big)$$
There are probably more natural products, but I'll stop here for now.
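A finite-set sketch of the pushforward and semidirect product above (the sets, the map h and the ambikernel K are all arbitrary toy choices):

```python
# Finite-set sketch of the pushforward and the semidirect product.
# An ambidistribution is again represented as a plain functional on payoff dicts.
X = [0, 1, 2, 3]
Y = ["even", "odd"]

def h(x):                                        # a surjective map X -> Y
    return "even" if x % 2 == 0 else "odd"

def maximin(partition):
    return lambda f: max(min(f[x] for x in block) for block in partition)

Q = maximin([[0, 1], [2, 3]])                    # some ambidistribution on X

def pushforward(h, Q, domain):
    """(h_* Q)(f) = Q(f o h)."""
    return lambda f: Q({x: f[h(x)] for x in domain})

def semidirect(Q, K, domain_x, domain_y):
    """(Q ⋉ K)(f) = Q(x -> K(x)(y -> f(x, y)))."""
    return lambda f: Q({x: K(x)({y: f[(x, y)] for y in domain_y}) for x in domain_x})

def K(x):                                        # a toy ambikernel X -> ambidistributions on Y
    if x < 2:
        return lambda f: min(f[y] for y in Y)    # total Knightian uncertainty
    return lambda f: f["even"]                   # deterministic observation

hQ = pushforward(h, Q, X)
QK = semidirect(Q, K, X, Y)
print(hQ({"even": 1.0, "odd": 0.0}))
print(QK({(x, y): x + (1.0 if y == "even" else 0.0) for x in X for y in Y}))
```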
Definition 6: The polytopic ambidistributions are the (incomplete) sublattice of $\boxdot X$ generated by $\Delta X$.
Some conjectures about this:
Here's a sketch of an AIT toy-model theorem saying that, in complex environments without traps, applying selection pressure reliably produces learning agents. I view it as an example of Wentworth's "selection theorem" concept.
Consider any environment $\mu$ of infinite Kolmogorov complexity (i.e. uncomputable). Fix a computable reward function $r\colon(A\times O)^*\to[0,1]$.
Suppose that there exists a policy $\pi^*$ of finite Kolmogorov complexity (i.e. computable) that's optimal for $\mu$ in the slow discount limit. That is,

$$\lim_{\gamma\to 1}\left(\max_{\pi}\mathbb{E}_{\mu,\pi}[R_\gamma] - \mathbb{E}_{\mu,\pi^*}[R_\gamma]\right) = 0, \qquad R_\gamma := (1-\gamma)\sum_{n=0}^{\infty}\gamma^n r_n$$

Here, $r_n$ is the reward received at time $n$.
Then, $\mu$ cannot be the only environment with this property. Otherwise, this property could be used to define $\mu$ using a finite number of bits, which is impossible[1]. Since $\mu$ requires infinitely many more bits to specify than $\pi^*$ and $r$, there have to be infinitely many environments with the same property[2]. Therefore, $\pi^*$ is a reinforcement learning algorithm for some infinite class of hypotheses.
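To spell out the counting step (modulo the effectiveness caveat in footnote [1]): if only $N<\infty$ environments had the property, then $\mu$ could be specified by a program for $\pi^*$, a program for $r$, the property itself, and an index $i\leq N$, so that

```latex
K(\mu) \;\le\; K(\pi^*) + K(r) + \log N + O(1) \;<\; \infty
```

contradicting $K(\mu)=\infty$; hence infinitely many environments must share the property.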
Moreover, there are natural examples of $\mu$ as above. For instance, let's construct $\mu$ as an infinite sequence of finite communicating infra-RDP refinements that converges to an unambiguous (i.e. "not infra") environment. Since each refinement involves some arbitrary choice, "most" such $\mu$ have infinite Kolmogorov complexity. In this case, $\pi^*$ exists: it can be any learning algorithm for finite communicating infra-RDPs with an arbitrary number of states.
Besides making this a rigorous theorem, there are many additional questions for further investigation:
Probably, making this argument rigorous requires replacing the limit with a particular regret bound. I ignore this for the sake of simplifying the core idea.
There probably is something more precise that can be said about how "large" this family of environments is. For example, maybe it must be uncountable.
This post tries to push back against the role of expected utility theory in AI safety by arguing against various ways to derive it axiomatically. I have heard many such arguments before, and IMO they are never especially useful. This post is no exception.
The OP presents the position it argues against as follows (in my paraphrasing): "Sufficiently advanced agents don't play dominated strategies, therefore, because of [theorem], they have to be expected utility maximizers, therefore they have to be goal-directed and [other conclusions]". They then proceed to argue that there is no theorem that can make this argument go through.
I think that this entire framing is attacking a weak man. The real argument for expected utility theory is:
In conclusion, there are coherence theorems. But, more important than individual theorems are the "coherence theories".