October 29, 2006

Why False Positives Mean So Much

Why do some people believe that Floyd Landis is innocent, while others believe that he is guilty? Why do the same people reach the opposite conclusions about Barry Bonds, or Marion Jones?

Let's talk about Bayesian inference. Bayesian inference is a mathematical approach that can be used to estimate how evidence influences belief in a hypothesis.

The basis of Bayesian inference is an equation known as Bayes' Rule, which is widely applied in the study of probability. Bayesian inference is the somewhat controversial application of Bayes' Rule to discussions of belief. In Bayesian inference a person's degree of belief in a hypothesis is treated mathematically as if it were an estimate of the probability of the hypothesis.

Rather than go through the general background, I'll leave you to the Wikipedia article and get right to our specific example. In this case, the hypothesis in question (which I'll call H) is that an athlete is guilty of doping. The hypothesis is either true or false, but the truth is usually not known with one hundred percent confidence. The evidence in the case (E) will be the results of anti-doping tests. The quantity that I want to calculate is a person's degree of belief in H in the face of the collected evidence E.

To do this we need to know three things. First, we need to quantify the person's prior belief in the athlete doping hypothesis. That is, what probability would you assign to the hypothesis before the presentation of the anti-doping test results? I'll denote the prior belief by P(H).

The other two quantities we need to know concern the conditional probabilities of the evidence. We need to know the probability that the evidence would arise if the athlete were guilty, which we might call the probability of detection (PD). We also need to know the probability that the evidence would arise if the athlete were innocent, which we might call the probability of false positive (PFP).

The posterior belief in H, in other words the probability of H being true after the evidence has been taken into account, is written as P(H | E), and can be calculated as:

   P(H | E) = PD × P(H) / [PD × P(H) + PFP × (1 – P(H))]

The following calculator allows you to enter these three inputs as percentages, and calculates your revised belief in the doping athlete hypothesis.

The Doping Athlete Bayesian Inference Calculator
P(H): Prior belief, before you saw evidence E, that Athlete was doping: %
PD: Probability that evidence E would be found against a doping athlete: %
PFP: Probability that evidence E would be found against an athlete who is not doping: %
Revised Belief that Athlete is Doping: %
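
For readers who would rather run the numbers themselves, here is a minimal sketch of the same calculation in Python. The function name revised_belief and the sample inputs are my own illustrative choices, not part of the original calculator; the arithmetic is just the formula for P(H | E) given above, with inputs and output expressed as percentages.

```python
def revised_belief(prior_pct, pd_pct, pfp_pct):
    """Return the posterior belief in H, as a percentage.

    All three inputs are percentages (0 to 100), mirroring the calculator fields:
    prior_pct is P(H), pd_pct is PD, pfp_pct is PFP.
    """
    prior = prior_pct / 100.0
    pd = pd_pct / 100.0
    pfp = pfp_pct / 100.0
    # Total probability of seeing evidence E, whether or not the athlete is doping.
    evidence = pd * prior + pfp * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence E has zero probability under these inputs")
    return 100.0 * pd * prior / evidence


# Arbitrary illustrative inputs: 50% prior, 90% detection, 5% false positive.
print(f"{revised_belief(50, 90, 5):.1f}%")  # prints 94.7%
```

The zero-evidence check is a design choice of this sketch: if the evidence could never arise under either hypothesis, the formula divides by zero and there is no meaningful answer to report.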

Let's look at a hypothetical example. Let's say there's a positive test in a sport where you have reason to believe that 10% of athletes are doping. And let's say that the test that the athlete has failed is known to be 60% effective in detecting cheaters. And finally, let's say that the probability of a false positive is 0.1%. If we put these three numbers into the calculator we find that our revised belief in the athlete's guilt has risen to 98.5%. That's probably more than enough to meet the burden of proof in an anti-doping case.
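
For anyone who wants to check that figure by hand, the arithmetic can be written out as follows, with the three percentages converted to probabilities (0.10, 0.60 and 0.001):

```latex
\[
P(H \mid E)
  = \frac{0.60 \times 0.10}{0.60 \times 0.10 + 0.001 \times (1 - 0.10)}
  = \frac{0.06}{0.06 + 0.0009}
  = \frac{0.06}{0.0609}
  \approx 0.985
\]
```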

In fact, as long as the probability of false positives is low enough, the other parameters make surprisingly little difference to the outcome. For example, consider the case where PFP = 0.01%, equivalent to one false positive test result out of every 10,000 tests on clean athletes. Then even if the prior belief of guilt is only 1%, and the probability of detection is only 20%, the posterior belief of guilt is greater than 95%. If we take this to the extreme case where there is zero probability of a false positive, then the posterior belief of guilt will always be 100%, provided the prior belief and the probability of detection are both greater than zero.
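
To get a feel for how quickly the picture changes as the false positive rate grows, here is a small Python sketch that holds the prior at 1% and the probability of detection at 20%, and sweeps PFP. The function name posterior and the particular list of PFP values are my own assumptions for illustration; only the 0% and 0.01% cases come from the discussion above.

```python
def posterior(prior, pd, pfp):
    """Bayes' Rule for the doping hypothesis; all arguments are probabilities in [0, 1]."""
    return pd * prior / (pd * prior + pfp * (1.0 - prior))


prior, pd = 0.01, 0.20  # 1% prior belief, 20% probability of detection

# Illustrative false positive rates, from zero up to 5%.
pfp_values = [0.0, 0.0001, 0.001, 0.01, 0.05]
results = {pfp: posterior(prior, pd, pfp) for pfp in pfp_values}

for pfp, post in results.items():
    print(f"PFP = {pfp:7.2%}  ->  posterior = {post:6.1%}")
```

With these inputs the posterior is above 95% at PFP = 0.01%, but it falls to roughly two in three at PFP = 0.1% and to well under 20% at PFP = 1%, which is why the false positive rate carries so much of the weight.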

You can try some examples for yourself, and you will see that the weight of the evidence in the case depends very strongly on the probability of false positives. This number must be kept small if we want to treat positive tests as conclusive.

Now, what happens if you really would rather not modify your belief based on the new evidence? Is there any way, in the face of a positive anti-doping test, that you can go on believing in your favourite athlete's innocence?

Bayesian inference does allow for this possibility, if the evidence is poor enough; and in fact the mathematical treatment defines for us what "poor enough" means. If the test's probability of false positive rises, or the probability of detection sinks, to the point where both are equal, then the posterior belief in the athlete doping hypothesis will be exactly equal to the prior belief.
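
To see why, set both conditional probabilities to the same value q (a symbol introduced here only for this check, with q greater than zero) and substitute into the formula for P(H | E):

```latex
\[
P(H \mid E)
  = \frac{q \, P(H)}{q \, P(H) + q \, \bigl(1 - P(H)\bigr)}
  = \frac{q \, P(H)}{q}
  = P(H)
\]
```

A test that flags guilty and innocent athletes equally often tells you nothing about which kind of athlete you are looking at, so the evidence leaves the prior untouched.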

In most doping cases, the public receives very little hard information about the reliability of the evidence; neither PFP nor PD is well known. In the Landis case, for example, anti-doping authorities were quick to state that the chances of a false positive on the IRMS test are very small. Many of Floyd's fans still doubt that claim.

If anybody out there tries this calculator for the Landis case or any other, let me know in the comments. I'd like to hear your assessments of your prior belief and the conditional probabilities, and to see if the Bayesian inference of posterior belief at all matches with your actual belief.
