Sunday 18 September 2011

Bayes' theorem - examples

The Cancer Test Problem

Remember Bayes' theorem:


P(A|B) = P(B|A) * P(A)/P(B)


The following problem is a very famous case of how to use the theorem.

We have a cancer-detecting test which gives a positive result for 90% of people who do have the cancer, but also gives a positive result for 10% of people who don’t actually have the cancer. A patient comes in and gets a positive result. How worried should they be? (In other words, what is the chance that they do actually have cancer?)

Well in fact, Bayes' theorem tells us that we don’t actually have all the information necessary to answer this question.

Indeed, let’s set:
Event A = person has cancer
Event B = test is positive

We are looking for P(A|B). We know P(B|A) (it’s 90%) and P(B|nonA) (it’s 10%). Knowing P(B|nonA) is actually as much information as knowing P(B), because we can calculate P(B) from this and P(A) using the total probability law – we’ll see this later. So we still need to know P(A).

Suppose the question now is: 2% of people have this cancer. We have a cancer-detecting test which gives a positive result for 90% of people who do have the cancer, but also gives a positive result for 10% of people who don’t actually have the cancer. A patient comes in and gets a positive result. How worried should they be? (In other words, what is the chance that they do actually have cancer?)

A lot of doctors were asked this question. Only 15% of them got it right (this article cites a few studies for this result – it’s also a very interesting and entertaining read). They generally estimated the chance that the person did indeed have the cancer to be very high, close to 90%.

But what is the correct number?

A more intuitive way of thinking about this problem is the following:

Take a pool of 1,000 people.
  
20 of them have cancer
    18 of these will have a positive reading on the test
980 of them do not have cancer.
    98 of these will have a positive reading on the test

So in total, 116 people will have a positive reading on the test, and 18 of these will actually have cancer. So the probability that a person with a positive reading does actually have cancer is 18/116 = 15.5% which is still relatively low. So it would be a far better approach to understand the maths involved in this, and not freak your patient out without good reason.

Let’s now calculate the probability using Bayes theorem – hopefully, we will get the same result.

Our first step is to calculate P(B). We do this using the law of total probability:

P(B) = P(B|A)P(A) + P(B|nonA)P(nonA)
        = 0.9*0.02 + 0.1*0.98
        = 0.116

Now Bayes’ theorem: P(A|B) = P(B|A)P(A)/P(B) = 0.9*0.02/0.116 = 15.5% (yes!!)
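Both routes to the answer can be checked in a few lines. Here is a minimal Python sketch, using only the three numbers from the problem statement (prevalence, true positive rate, false positive rate):

```python
p_a = 0.02             # P(A): prior probability of having the cancer (2%)
p_b_given_a = 0.9      # P(B|A): positive test given cancer
p_b_given_not_a = 0.1  # P(B|nonA): positive test given no cancer

# Law of total probability: P(B) = P(B|A)P(A) + P(B|nonA)P(nonA)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_b, 3))          # 0.116
print(round(p_a_given_b, 3))  # 0.155 -- the same 15.5% as the 18/116 count
```

Note that the counting argument with a pool of 1,000 people is exactly this calculation with every probability multiplied by 1,000.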

This example question is detailed in this video, which also presents some other very interesting counterintuitive statistical issues. The part we’re interested in is at 11 minutes, but the entire thing is worth a watch.


The Prosecutor's Fallacy

Bayes’ theorem often applies when considering the probability that a person is guilty of a crime. Indeed, misunderstanding it leads to what is called the prosecutor’s fallacy – when you interpret the small probability of someone fitting the evidence as a small probability that an accused who does fit the evidence is in fact innocent.

Let’s consider the following case:

In a murder case you have found a sample of the murderer’s DNA, and there is a 0.1% chance of a random person’s DNA matching this sample. You have found a man whose DNA does match.

Then the correct interpretation is NOT that there is a 0.1% chance that this man is not the murderer, ie a 99.9% chance that he is the murderer.

Bayes’ theorem tells us that in order to calculate this last probability – the probability that the man is guilty, given that he matches the DNA, one also needs to take into account the probability of a random person being a murderer, which is extremely low, say it is 0.01%.

Let’s use the following notation:

Event A = The man is guilty
Event B = The man’s DNA matches the one found

Then we have
P(B) = 0.1%
P(A) = 0.01%

The probability we are interested in is P(A|B): the probability that the man is guilty, given that his DNA matches the killer’s.

Bayes’ theorem gives us that P(A|B) = P(B|A)*P(A)/P(B)

Now P(B|A) is the probability that the man’s DNA would match the killer’s, if he is indeed the killer. Which should be pretty close to 1, if your DNA testing is any good! P(A) and P(B) are given above, so in the end:

P(A|B) = 10%

Most definitely not a cause for putting someone in prison!
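The arithmetic behind that 10% can be sketched in Python, taking P(B|A) = 1 as suggested above (a guilty man’s DNA should always match):

```python
p_b = 0.001        # P(B): chance a random person's DNA matches the sample (0.1%)
p_a = 0.0001       # P(A): chance a random person is the murderer (0.01%)
p_b_given_a = 1.0  # P(B|A): match probability if he really is the killer

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 3))  # 0.1 -- a 10% chance of guilt, not 99.9%
```

The fallacy, in these terms, is confusing P(B|nonA), which is tiny, with P(nonA|B), which here is 90%.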

Obviously, in actual cases, this is not the only thing to take into account. If the man’s DNA matches the killer, and he also matches a description of the killer, and has no alibi, and some shoes were found in his house covered in blood, the odds would change somewhat.

Wednesday 7 September 2011

Bayes' Theorem - Introduction

Lawyers often talk about Bayesian analysis. It is, in fact, one of the major ways in which maths plays a role in law: it is used for estimating the probative value of certain pieces of evidence in the presence of other evidence.

We will want to refer to Bayes' theorem quite often, so this post is devoted to a simple and complete explanation.

First of all, let’s introduce some notation. We write:


P(A) = probability of A = probability that the event A happens

P(A|B) = probability of A given B = probability that the event A happens, given that the event B has happened.

Bayes’ theorem relates P(A|B) to P(B|A). And it turns out that the relation involves what is called the prior probabilities – the probabilities P(A) (probability that A happens, without considering B at all) and P(B) (probability that B happens, without considering A at all).

The exact relationship is the following:

P(A|B) = P(B|A) * P(A) / P(B)

Now, this may not mean anything to you right now, but it is actually very counter-intuitive. People generally feel like they have a good idea of what P(B|A) is if they know P(A|B) and just one of the prior probabilities, like P(B). But depending on what the prior probability P(A) is, that good idea may in fact be completely off.

For example, imagine you are a teacher, and you give your students a very difficult test, which every year only 5% of your students get an A on. Obviously an effective (if bad) way for a student to increase his chances is cheating – say that cheating gives him a 40% chance of getting an A. Now, imagine that you have a student who has got an A. As the teacher, you might be tempted to think that the student has cheated. Well, Bayes’ theorem tells you that you cannot know how likely it is that this is the case if you don’t consider the prior probability of cheating.

First, let’s figure out the information we do have:

Call A the event getting an A on the test, and B the event that the student has cheated.
Then we know P(A|B) = 0.4 and P(A) = 0.05.

We want to know P(B|A), the probability that the student has cheated, given that he got an A.

Since we need to know the probability of cheating, P(B), let's consider two different situations.

First situation:
suppose this was a fairly relaxed test setting, where students were not supervised very carefully, and you estimate that 10% of them cheated, ie P(B) = 0.1.

Then Bayes’ theorem tells us that P(B|A) = P(A|B)*P(B)/P(A) = 0.4*0.1/0.05 = 0.8. In other words, if the student got an A, then there's an 80% chance that he did it by cheating.

Second situation: suppose this was a very controlled exam, where students were in individual rooms with individual supervisors. The chance of cheating is still not 0, but it’s definitely a lot smaller, say just 1%, or 0.01. Then Bayes’ theorem tells us that P(B|A) = 0.4*0.01/0.05 = 0.08, or 8%. So if you have a student who got an A, it's not that likely that he cheated, and you need not be too concerned.
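The two situations differ only in the prior P(B), so the whole comparison fits in a short Python sketch:

```python
def p_cheated_given_a(p_a_given_b, p_b, p_a):
    """Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# P(A|B) = 0.4 (chance of an A when cheating), P(A) = 0.05 (overall A rate)
relaxed = p_cheated_given_a(0.4, 0.10, 0.05)     # relaxed setting: P(B) = 10%
controlled = p_cheated_given_a(0.4, 0.01, 0.05)  # supervised exam: P(B) = 1%

print(round(relaxed, 2))     # 0.8  -- 80% chance the A came from cheating
print(round(controlled, 2))  # 0.08 -- only 8% in the controlled exam
```

Same evidence (an A), same test (P(A|B) = 0.4), but the conclusion swings from 80% to 8% purely because of the prior.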

Of course, this is a rather simple problem – it is a lot more probable that the student cheated if he was in an environment where it was easy to cheat, than if he wasn’t – not exactly surprising. But the problems can get a lot more confusing than that, as we will see next time!

Saturday 3 September 2011

Mathematical Model for Jury Decisions


Kaplan and Cullison: Jury Decision Model

The concept of “reasonable doubt” is one of the most difficult to quantify in the whole of legal theory. Suppose you’re on a jury and you’ve seen all the evidence, and you’ve come to some conclusion about the probability of the defendant’s guilt: 75%, say, or 90%. Should you vote for a conviction?
 
In this post we are going to discuss a mathematical model proposed by John Kaplan and Alan Cullison, which is supposed to help jurors, who have already assessed the probability of guilt of the defendant, to decide what verdict to return. The novelty and importance of their idea is that in order to reach a decision, the juror must make some kind of measurement of his own personal degree of repugnance at the idea of acquitting a guilty person, a degree which can obviously vary greatly according to the crime being judged, and the danger of its being repeated if the culprit is acquitted.
 
Our explanation of the model comes from a seminal 1971 article by Laurence Tribe: Trial by Mathematics: Precision and Ritual in the Legal Process (84 Harvard Law Rev. 1329, 1971). Tribe’s rebuttal of the model, and his discussion of the use of mathematics in trials in general, is complex and fascinating, and will be the subject of a series of future posts.

Kaplan and Cullison’s model for jury decision-making.

Let's say that the trial is over, all the evidence has been seen, and the trier (i.e. the jury member) assesses the probability of guilt of the accused as some probability value P between 0 and 1.
Now the trier must decide on a verdict by choosing between two acts: convict or acquit.

There are four possibilities for the outcome of a trial:
C_G (conviction of a guilty person)
C_I (conviction of an innocent person)
A_G (acquittal of a guilty person)
A_I (acquittal of an innocent person).
The trier will assign numerical values between 0 and 1 to each of these possibilities. He begins by taking C_G=1 (most desirable) and C_I=0 (least desirable).
Next, the trier must assign values to A_G and A_I according to the following procedure. Start with A_G.

The trier asks himself the following question: "Would I rather have a result that I know to be A_G, or take a 1/2 - 1/2 chance between C_G and C_I?”


If he realizes that he would prefer A_G, then he knows that the value of A_G will be greater than 1/2. Now he will try to see if it's greater than 3/4 by asking himself: "Would I rather have a result that I know to be A_G, or take a 3/4 - 1/4 chance between C_G and C_I?”
And so on, until he closes in on an actual value for A_G. He then uses the same procedure for A_I.
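This question-and-answer procedure is just a bisection search on the interval [0, 1]. A minimal Python sketch, where the hypothetical function `prefers_certain_outcome` stands in for the juror answering each question (it is not part of Kaplan and Cullison’s paper, only a way to simulate the dialogue):

```python
def elicit_utility(prefers_certain_outcome, iterations=10):
    """Close in on the value of an outcome (e.g. A_G) by repeatedly
    comparing it with a q/(1-q) gamble between C_G (value 1) and
    C_I (value 0); the gamble's expected value is exactly q."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        q = (lo + hi) / 2
        if prefers_certain_outcome(q):
            lo = q  # the certain outcome is worth more than q
        else:
            hi = q  # the gamble is preferred, so the value is below q
    return (lo + hi) / 2

# Example: a juror whose (unstated) value for A_G happens to be 0.7
juror = lambda q: 0.7 > q
print(round(elicit_utility(juror), 2))  # 0.7
```

Each question halves the interval of possible values, so ten questions pin the value down to within about 0.001.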

There are many things to keep in mind when making the decisions of what one would prefer. The choice of a value for A_G is particularly delicate, because if the outcome of the trial is the acquittal of the culprit, then the jury may feel some responsibility if, for example, the crime was a type of brutal murder which then occurs again after the criminal's acquittal. In such a situation, the trier may well feel that A_G is not preferable to a 50-50 chance between C_G and C_I. The value for A_G is never likely to be less than 1/2, since few people if any would take a chance between C_G and C_I if the likelihood of C_I is actually perceived as greater than C_G. But 0.5 may be an acceptable value, or, if the crime is not likely to re-occur, A_G may end up being a high value such as 0.9, in order to minimize the chance of convicting an innocent.


The same procedure is used to determine a value for A_I, but the meaning is different. On the whole, A_I is a better outcome than A_G, because at least the trier has not erred in his work. Therefore the trier is unlikely to take much of a risk of convicting an innocent, in comparison to acquitting an innocent, and the value of A_I will tend to be significantly higher than 0.5. On the other hand, the trier may deeply dislike the outcome A_I because if the real culprit is in fact free, then there is a danger that he will continue his crimes, so he may not want to simply set A_I=1; he may prefer to have a good chance of having convicted the guilty party than to be certain of having acquitted someone innocent.
Still, on the whole, A_I is likely to be quite high, higher than A_G.

Now that the numbers P, A_G and A_I have been fixed subjectively by the trier, the Kaplan-Cullison model suggests the following calculation to decide between conviction and acquittal.
If the trier chooses to convict, he will get C_G with a probability of P, and C_I with a probability of 1-P. Define the "expected utility" UC of the choice “convict” by the standard weighted formula

UC = P C_G + (1-P) C_I, which, since C_G = 1 and C_I = 0, is always just equal to UC = P.


Similarly, if the trier chooses to acquit, there's a probability of P that he'll actually get A_G and (1-P) that he'll actually get A_I, so we can define the "expected utility" UA of the choice “acquit” by


UA = P A_G + (1-P) A_I.


Both UC and UA are numbers, and the model says all we have to do is compare them.


If UC > UA, then choose to convict. If UA > UC, then choose to acquit.


Example: Suppose the trier is only 75% convinced of the guilt of the accused. The trier has done all of the above calculations and has fixed A_G at 0.5 and A_I at 0.9.

Then according to the formulas, UC = .75 and UA = .75 x .5 + .25 x .9 = .375+.225=.6.


Thus UC > UA so in this situation, the trier should vote to convict.


This result may seem really bizarre in view of the injunction to vote for a conviction only if convinced of guilt "beyond a reasonable doubt". Clearly a 75% probability of guilt is not beyond a reasonable doubt. Yet the Kaplan-Cullison model leads to a recommendation to convict.

On the other hand, the choices for A_G and A_I above are not necessarily the most normal choices. There are many other possible attitudes. In the case of an unimportant crime, the trier may set A_G=1 and A_I=1, meaning that he would rather acquit the accused, whether innocent or guilty, than accept even the smallest chance of convicting an innocent person. If so, then he will find that UA=1, meaning no matter what the value P of his conviction that the accused is guilty, even if P=99%, he should vote to acquit.
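The whole decision rule is a one-line comparison, so both attitudes above can be checked in a short Python sketch (with C_G = 1 and C_I = 0 fixed as in the model):

```python
def verdict(p, a_g, a_i):
    """Kaplan-Cullison rule: compare the expected utilities of the two acts."""
    uc = p * 1 + (1 - p) * 0    # utility of convicting; equals p since C_G=1, C_I=0
    ua = p * a_g + (1 - p) * a_i  # utility of acquitting
    return "convict" if uc > ua else "acquit"

print(verdict(0.75, 0.5, 0.9))  # convict -- UC = 0.75 beats UA = 0.6
print(verdict(0.99, 1.0, 1.0))  # acquit  -- with A_G = A_I = 1, UA = 1 always wins
```

The second call shows the extreme attitude described above: setting A_G = A_I = 1 makes acquittal the recommendation at any probability of guilt below certainty.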

In essence, what the model is suggesting is that the concept of "beyond a reasonable doubt" be replaced by the concept of "utility", with the trier taking into account the negative aspects of acquitting a guilty person or convicting an innocent.
This marks a deep rift with respect to the tradition underlying the way trials are conducted and jury decisions are made. It is an important question and one well worth considering.

In a series of upcoming posts, I will introduce the work of Laurence Tribe, and in particular explain his reasons for rejecting this model.
These reasons are deep and fascinating, and go far beyond the realm of mathematics or even legal theory, into the domain of psychology.