This is a linkpost for https://arxiv.org/abs/2504.06820

Diffractor is the first author of this paper.
Official title: "Regret Bounds for Robust Online Decision Making"

Abstract: We propose a framework which generalizes "decision making with structured observations" by allowing robust (i.e. multivalued) models. In this framework, each model associates each decision with a convex set of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner that can be non-oblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework. Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory in two special cases: robust linear bandits and tabular robust online reinforcement learning. In both cases, we derive regret bounds that improve on the state of the art (except that we do not address computational efficiency).
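As a rough illustration of the setup (in our own notation, which may differ from the paper's): a robust model $M$ assigns to each decision $\pi \in \Pi$ a convex set $M(\pi) \subseteq \Delta(\mathcal{O})$ of distributions over outcomes. If $M^\star$ is the true model, then at each round the learner picks $\pi_t$, nature picks some $\mu_t \in M^\star(\pi_t)$ (possibly as a function of the whole history), and an outcome $o_t \sim \mu_t$ is observed. Regret is then naturally measured against the maximin benchmark, e.g. something like

$$\mathrm{Reg}_T \;=\; T\,\max_{\pi\in\Pi}\ \min_{\mu\in M^\star(\pi)} \mathbb{E}_{o\sim\mu}\big[r(o)\big] \;-\; \sum_{t=1}^{T} \mathbb{E}_{o_t\sim\mu_t}\big[r(o_t)\big],$$

where $r$ is the reward function (we are eliding details here). A classical (non-robust) model is the special case in which every $M(\pi)$ is a singleton.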

In our new paper, we generalize Foster et al.'s theory of "decision-estimation coefficients" to the "robust" (infra-Bayesian) setting. The former is the most general known theory of regret bounds for multi-armed bandits and reinforcement learning, and it comes close to giving tight bounds for all "reasonable" hypothesis classes. In our work, we obtain an analogous theory, although our bounds are not quite as tight.
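For readers who have not seen it, the classical (non-robust) decision-estimation coefficient of Foster et al. is, roughly,

$$\mathrm{dec}_\gamma(\mathcal{M}, \bar M) \;=\; \inf_{p \in \Delta(\Pi)}\ \sup_{M \in \mathcal{M}}\ \mathbb{E}_{\pi \sim p}\Big[f^M(\pi_M) - f^M(\pi) \;-\; \gamma\, D^2_{\mathrm H}\big(M(\pi), \bar M(\pi)\big)\Big],$$

where $f^M(\pi)$ is the expected reward of decision $\pi$ under model $M$, $\pi_M$ is the optimal decision for $M$, $\bar M$ is a reference ("estimated") model, and $D_{\mathrm H}$ is the Hellinger distance; regret bounds then follow from how this coefficient trades off against estimation error. This is our paraphrase rather than the paper's statement; the robust analogue replaces the single distributions $M(\pi)$ by convex sets.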

Remarkably, the result also establishes a tight connection between infra-Bayesianism and Garrabrant induction. Specifically, the algorithm that achieves the upper bound works by computing beliefs in a Garrabrant-induction-like manner[1] and then acting on these beliefs via an appropriate trade-off between infra-Bayesian exploitation and exploration (defined using the "decision-estimation" approach).

It seems quite encouraging that the two theories that came out of thinking about "logical uncertainty" (infra-Bayesianism and Garrabrant induction) can be unified in this manner[2], boosting our confidence that we are on the right path.

The paper assumes no prior familiarity with either infra-Bayesianism or Garrabrant induction.

  1. ^

    At the start of each episode.

  2. ^

    Although we think that a fuller treatment of logical uncertainty requires introducing metacognition.
