Abstract: We propose a new variant of online learning that we call "ambiguous online learning". In this setting, the learner is allowed to produce multiple predicted labels. Such an "ambiguous prediction" is considered correct when at least one of the labels is correct, and none of the labels are "predictably wrong". The definition of "predictably wrong" comes from a hypothesis class in which hypotheses are also multi-valued. Thus, a prediction is "predictably wrong" if it's not allowed by the (unknown) true hypothesis. In particular, this setting is natural in the context of multivalued dynamical systems, recommendation algorithms and lossless compression. It is also strongly related to so-called "apple tasting". We show that in this setting, there is a trichotomy of mistake bounds: up to logarithmic factors, any hypothesis class has an optimal mistake bound of either $\Theta(1)$, $\Theta(\sqrt{N})$ or $N$, where $N$ is the number of rounds.
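To make the setting concrete, here is a minimal sketch (in Python) of how I read the mistake-counting rule from the abstract. The names (`Hypothesis`, `count_mistakes`, etc.) and the exact protocol details are my own illustration, not taken from the paper:

```python
# Illustrative sketch of ambiguous online learning, based only on the abstract.
# All names here are hypothetical, not from the paper.
from typing import Callable, Hashable, Iterable, Set, Tuple

# A multi-valued hypothesis maps an instance to the set of labels it allows.
Hypothesis = Callable[[Hashable], Set[Hashable]]


def count_mistakes(
    rounds: Iterable[Tuple[Hashable, Hashable]],   # (instance, revealed label)
    predict: Callable[[Hashable], Set[Hashable]],  # learner's ambiguous prediction
    true_hypothesis: Hypothesis,                   # unknown to the learner
) -> int:
    """Count mistakes under the rule from the abstract: a prediction is correct
    iff the revealed label is among the predicted labels AND every predicted
    label is allowed by the true hypothesis (none is "predictably wrong")."""
    mistakes = 0
    for x, y in rounds:
        prediction = predict(x)
        at_least_one_correct = y in prediction
        none_predictably_wrong = prediction <= true_hypothesis(x)  # subset check
        if not (at_least_one_correct and none_predictably_wrong):
            mistakes += 1
    return mistakes
```

(The learner's update step, i.e. how `predict` changes in response to feedback, is elided; this snippet is only meant to pin down the two ways a single ambiguous prediction can fail.)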
This work is my first rigorous foray into compositional learning theory. Intuitively, one approach to learning compositionally is to construct a model incrementally, adding one part at a time. This raises the question of what it means to have a "partial" model. One notion of partial model is provided by infra-Bayesianism / robust RL. However, there the treatment of ambiguities in the model via the maximin decision rule is strongly tied to the choice of reward function. Intuitively, we might expect the process of incremental learning not to depend on the reward function that strongly.
In this work, I propose a notion of "partial" model in the online learning setting. I develop a theory of learning for this notion, analogous to (but more involved than) classical realizable Littlestone theory.
Does it truly lead to compositional learning? In results that don't appear in the paper, I showed that the "ambiguous Littlestone dimension" (an invariant I discovered that characterizes the complexity of ambiguous online learning) indeed behaves nicely w.r.t. some ways to decompose hypothesis classes. However, my attempts to use this framework to get computationally efficient compositional learning algorithms haven't produced much so far.
Moreover, our results on robust RL (in particular, the notion of "imprecise estimator" we have there) suggest that it's more natural to study a different (easier) version of online learning with ambiguity, in which the learner is only allowed to predict one label rather than a set. So, the ambiguous online learning framework is a potentially useful tool, but I'm currently more optimistic about other approaches to compositional learning theory (stay tuned!).