Problem of Old Evidence

Suppose a new scientific hypothesis, such as general relativity, explains a well-known observation, such as the perihelion precession of Mercury, better than any existing theory. Intuitively, this is a point in favor of the new theory. However, the probability of the well-known observation was already at 100%. How can a previously known statement provide new support for the hypothesis, as if we were re-updating on evidence we updated on long ago? This is known as the problem of old evidence, and is usually leveled as a charge against Bayesian epistemology.
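The worry can be written out with Bayes' rule. The following is a minimal sketch, where H stands for the new hypothesis and E for the old evidence (both symbols are introduced here for illustration), and where H is assumed to have nonzero prior probability:

```latex
\begin{align*}
P(H \mid E) &= \frac{P(E \mid H)\, P(H)}{P(E)} \\
P(E) = 1 \;&\Rightarrow\; P(E \mid H) = 1 \quad \text{(given } P(H) > 0\text{)} \\
&\Rightarrow\; P(H \mid E) = \frac{1 \cdot P(H)}{1} = P(H)
\end{align*}
```

Under these assumptions, conditioning on E leaves the probability of H unchanged, which is exactly why old evidence appears unable to confirm anything.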

Bayesian Solutions vs Scientific Method

Simplicity Priors

Proponents of simplicity-based priors will instead say that the problem with Dr. Bad's theory can be identified by looking at its description length in contrast to Einstein's. We can tell that Einstein didn't cheat by gerrymandering his theory to specially predict Mercury's orbit correctly, because the theory is mathematically succinct! There is no room to cheat; no empty closet in which to smuggle information about Mercury. Marcus Hutter argues for this resolution to the problem of old evidence in On Universal Prediction and Bayesian Confirmation.

Logical Uncertainty

Even in cases where we can measure simplicity perfectly, such as in Solomonoff Induction, is it really a perfect correction for the problem of old evidence? This becomes implausible in cases of logical uncertainty.

Simplicity priors seem like a very plausible solution to the problem of old evidence in the case of empirical uncertainty. If I try to "cheat" by baking in some known information into my hypothesis, without having real explanatory insight, then the description length of my hypothesis will be expanded by the number of bits I sneak in. This will penalize the prior probability by exactly the amount I stand to benefit by predicting those bits! In other words, the penalty is balanced so that it does not matter whether I try to "cheat" or not.
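The balance described here can be checked with a little arithmetic. The sketch below assumes a prior of 2^-L for a hypothesis of description length L bits, and compares an "honest" hypothesis that treats n known bits as coin flips with a "cheating" one that hard-codes them. The function name `joint_weight` and the values of `k` and `n` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def joint_weight(description_bits: int, likelihood: Fraction) -> Fraction:
    """Prior 2^-description_bits multiplied by the likelihood of the data."""
    return Fraction(1, 2 ** description_bits) * likelihood


k = 100  # hypothetical description length (bits) of the honest hypothesis
n = 20   # hypothetical number of already-known data bits

# Honest hypothesis: knows nothing about the n bits, so each is a coin flip.
honest = joint_weight(k, Fraction(1, 2 ** n))

# Cheating hypothesis: hard-codes the n bits, which costs n extra bits of
# description but lets it predict those bits with certainty.
cheating = joint_weight(k + n, Fraction(1))

print(honest == cheating)
```

Under this simplified model the two weights coincide: the n bits gained in likelihood are paid for by n bits of prior, so sneaking in known data yields no net advantage.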

The same argument does not hold if I am predicting mathematical facts rather than empirical facts, however. Mathematicians are often in a situation where they already know how to calculate a sequence of numbers, but they are looking for some deeper understanding, such as a closed-form expression for the sequence, or a statistical model of the sequence (e.g., the prime number theorem describes the statistics of the prime numbers). It is common to compute long sequences in order to check conjectures against more of the sequence, and in doing so, treat the computed numbers as evidence for a conjecture.

If I claimed to have some way to predict the prime numbers, but it turned out that my method actually had one of the standard ways to calculate prime numbers hidden within the source code, I would be accused of "cheating" in much the same way...


It is typical for a Bayesian analysis to resolve the problem by pretending that all hypotheses are around "from the very beginning" so that all hypotheses are judged on all evidence. The perihelion precession of Mercury is very difficult to explain from Newton's theory of gravitation, and therefore quite improbable; but it fits quite well with Einstein's theory of gravitation. Therefore, Newton gets "ruled out" by the evidence, and Einstein wins.


A drawback of this approach is that it allows scientists to formulate a hypothesis in light of the evidence, and then use that very same evidence in their favor. Imagine a physicist competing with Einstein, Dr. Bad, who publishes a "theory of gravity" which is just a list of all the observations we have made about the orbits of celestial bodies. Dr. Bad has "cheated" by providing the correct answers without any deep explanations; but "deep explanation" is not an objectively verifiable quality of a hypothesis, so it should not factor into the calculation of scientific merit, if we are to use simple update rules like Bayes' Law. Dr. Bad's theory will predict the evidence as well as or better than Einstein's. So the new picture is that Newton's theory gets eliminated by the evidence, but Einstein's and Dr. Bad's theories remain as contenders.
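A toy Bayesian update makes this picture concrete. The sketch below puts all three theories on an equal prior, as if they had been around "from the very beginning", and updates on the precession data. The likelihood numbers are invented purely for illustration and the helper `posterior` is hypothetical; only the qualitative ordering matters.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule over a finite set of mutually exclusive hypotheses."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}


# Equal priors: no theory is favored before looking at the evidence.
priors = {"Newton": 1 / 3, "Einstein": 1 / 3, "Dr. Bad": 1 / 3}

# Illustrative (made-up) probabilities each theory assigns to the observed
# precession. Dr. Bad's list of observations reproduces the data exactly.
likelihoods = {"Newton": 0.001, "Einstein": 0.9, "Dr. Bad": 1.0}

post = posterior(priors, likelihoods)
for name, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.4f}")
```

With these assumed numbers, Newton's posterior collapses while Einstein and Dr. Bad split nearly all of the probability mass, and nothing in the update itself distinguishes a deep explanation from a lookup table.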

The scientific method emphasizes predictions made in advance to avoid this type of cheating. To test Einstein's hypothesis, Sir Arthur Eddington measured the bending of starlight by the Sun during the 1919 solar eclipse, an effect that had not been reliably measured before. This test would have ruled out Dr. Bad's theory of gravity, since (unless Dr. Bad possessed a time machine) there would be no way for Dr. Bad to know what to predict.


In contrast,...


Created by Abram Demski at