Thursday 26 January 2012

Laurence Tribe: Maths on Trial (5)

Tribe’s summary of Bayes’ Theorem in law


We’ve been describing the contents of Laurence Tribe’s seminal article on mathematics at trial. The first two parts we discussed were:


Firstly, a description of the kind of use of mathematics at trial that he is specifically going to discuss, with examples;


Secondly, a review of the traditional arguments that judges have used against mathematics at trial, together with Tribe’s reaction to these arguments.


Today we’re going to summarize the next part:


Thirdly, a sketch of the use of Bayes’ theorem at trial, as proposed by Finkelstein and Fairley.


Tribe starts by giving an informal introduction to the relevance of Bayes’ theorem: “In deciding a disputed proposition, a rational factfinder probably begins with some initial, a priori estimate of the likelihood of the proposition’s truth, then updates his prior estimate in light of discoverable evidence bearing on that proposition, and arrives finally at a modified assessment of the proposition’s likely truth in light of whatever evidence he has considered. When many items of evidence are involved, each has the effect of adjusting, in greater or lesser degree, the factfinder’s evaluation of the probability that the proposition before him is true. If this incremental process of cumulating evidence could be given quantitative expression, the factfinder might then be able to combine mathematical and non-mathematical evidence in a perfectly natural way, giving each neither more nor less weight than it logically deserves.”


He then explains that such a mathematical expression for updating probabilities in the light of new evidence exists: in simple cases, it is precisely Bayes’ theorem, and in more complicated situations with many different factors, this theorem can be expanded into the theory of Bayesian networks. Tribe gives a mathematical explanation of Bayes’ theorem (which we’ve already explained in previous posts; as for Bayesian networks, we’ll be dealing with them in a series of future posts). Tribe goes on to give a few examples of applications of Bayes’ theorem to legal cases.


Suppose that a juror estimates the probability of guilt of a defendant during a trial, in light of all the evidence seen so far, as about 2/3. Then new evidence comes up to the effect that after the crime was committed, the defendant took the first plane out of town. The juror makes a guess that the probability of a guilty criminal taking the first flight out of town is maybe 0.2, whereas the probability that an innocent person might do so (in order to distance themselves from the crime, for example) might be about 0.1. A direct application of Bayes’ theorem then has the effect of updating the juror’s probability of guilt up to 4/5.
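
The arithmetic behind this update can be checked with a short sketch. The function name bayes_update and its parameter names are our own choices for illustration; the three numbers are the juror's estimates from the example above, written as exact fractions.

```python
from fractions import Fraction


def bayes_update(prior, p_evidence_if_guilty, p_evidence_if_innocent):
    """Return P(guilty | evidence) via Bayes' theorem.

    prior                   -- P(guilty) before the new evidence
    p_evidence_if_guilty    -- P(evidence | guilty)
    p_evidence_if_innocent  -- P(evidence | innocent)
    """
    numerator = prior * p_evidence_if_guilty
    denominator = numerator + (1 - prior) * p_evidence_if_innocent
    return numerator / denominator


# Juror's estimates from the flight example: prior 2/3,
# P(flight | guilty) = 0.2, P(flight | innocent) = 0.1.
flight_posterior = bayes_update(Fraction(2, 3), Fraction(1, 5), Fraction(1, 10))
print(flight_posterior)  # 4/5
```

Note that only the ratio of the two likelihoods matters here: the flight is twice as probable under guilt as under innocence, so the odds of guilt double from 2:1 to 4:1.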


Shortly before Tribe’s article, law professor Michael Finkelstein and statistician William Fairley coauthored an article, “A Bayesian approach to identification evidence”, 83 Harvard Law Review 1970, in which they described a similar type of scenario. Here, a woman’s body is found in a ditch in an urban area. There is evidence that the deceased quarreled violently with her boyfriend the night before and that he had struck her on other occasions. A palm print similar to the defendant’s is found on the knife that was used to kill the woman. However, experts can say only that such prints occur in no more than one person in a thousand.


The question is, how should this figure be incorporated into the jury’s assessment of the defendant’s guilt? The figure of 1 in 1000 can be quite misleading in itself, and is often confused with the probability of the defendant’s innocence. In fact, it does not have much meaning taken out of context; the important thing is to know the size of the population of possible murderers. For example, if nothing is known of the murderer, then all males in the area could be considered possible suspects; if the male population in the area is 1 million, then about 1000 of them could have a hand that leaves a palm print of the type found, so the print alone does not go a long way toward identifying the boyfriend as the perpetrator. If more is known about the suspect, this can narrow down the relevant population and make the 1 in 1000 figure more telling.
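
As a rough illustration of why the population size matters, the sketch below works through the numbers in the paragraph above. It rests on two assumptions of ours that are not in the source: every male in the area is treated as equally likely to be the murderer before the print is considered, and the murderer is taken to leave a matching print with certainty. The variable names are ours as well.

```python
from fractions import Fraction

male_population = 1_000_000          # figure used in the text
print_frequency = Fraction(1, 1000)  # at most one person in a thousand

# Expected number of men in the area whose palm could leave such a print.
expected_matches = male_population * print_frequency

# Assumption (ours): each man is equally likely a priori, and the
# murderer certainly leaves a matching print on the knife.
prior = Fraction(1, male_population)
posterior = prior / (prior + (1 - prior) * print_frequency)

print(expected_matches)   # 1000
print(float(posterior))   # roughly one in a thousand
```

Under these assumptions, the print raises the probability for any one matching man from one in a million to about one in a thousand, which is very far from the 999 in 1000 that the figure is sometimes mistaken for.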


Finkelstein and Fairley observe that if the boyfriend did use the knife to kill the woman, then he almost certainly left the print there, and conversely, that if he was not the one who used it, there would be only one chance in a thousand that a print similar to his would be found on the knife. They then use Bayes’ Theorem to calculate the updated probability of the defendant’s guilt for various values of the initial probability X that jurors might have in their minds before seeing the print evidence.


Bayes’ theorem leads to the following updates:


Probability before print evidence      Updated probability using print evidence

0.01                                   0.909
0.25                                   0.997
0.75                                   0.9996
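
These figures can be reproduced with a few lines of code. This is a minimal sketch, assuming that “almost certainly” is taken as a probability of exactly 1 that the guilty defendant leaves the print, and using 1 in 1000 as the chance of a matching print if he is not guilty; the function name posterior_guilt is ours.

```python
def posterior_guilt(prior, p_print_if_guilty=1.0, p_print_if_innocent=0.001):
    """Updated probability of guilt after the palm print evidence.

    Assumption: the guilty defendant leaves the print with probability 1,
    standing in for "almost certainly" in the text.
    """
    numerator = prior * p_print_if_guilty
    return numerator / (numerator + (1.0 - prior) * p_print_if_innocent)


# Prior probabilities of guilt taken from the table.
print_posteriors = {prior: posterior_guilt(prior) for prior in (0.01, 0.25, 0.75)}

for prior, posterior in print_posteriors.items():
    print(f"{prior:.2f} -> {posterior:.4f}")
```

The computed values agree with the table to about three decimal places; small differences in the last digit depend on whether the figures are rounded or truncated.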


So, for instance, the use of Bayes’ theorem means that a juror who believes that there is only about one chance in four that the defendant killed his girlfriend should revise that belief to the near-certainty of 99.7% after learning of the palm print evidence.


There is no doubt that a numerical approach like this to quantifying the importance of new evidence can be useful and surprising. This is especially true if it is used in a manner that respects the subjective judgment of each jury member, by showing the effect of the numerical evidence (here, the frequency of the palm print) for a range of different prior estimates of guilt.


In spite of this, Tribe expresses doubt about the usefulness of these methods. “In the next section…we examine the costs we must be prepared to incur if we would follow the path Finkelstein and Fairley propose. What will presently be identified as certain costs of quantified methods of proof might conceivably be worth incurring if the benefit in increased trial accuracy were great enough. It turns out, however, that mathematical proof, far from providing any clear benefit, may in fact decrease the likelihood of accurate outcomes.”


We (the authors of this blog) believe that while the use of mathematical methods in trials is full of danger, above all the danger of mathematics being misused by non-experts and the danger of even correct mathematics being misunderstood by judges and jurors, there is nevertheless a great deal that can be done in the way of making sure that mathematical methods are applied correctly in the courtroom, and yield improved and more accurate outcomes.


But in order to investigate the subject in real depth, we first need to explore these dangers, as well as the most cogent theoretical arguments against the use of mathematics at trial: those expressed by Tribe.


Our next and last post on the subject of Tribe’s article will explain these. They make a lot of sense and should be taken seriously; in fact, they are so right and well-expressed that they convinced large numbers of people for three decades to keep mathematics out of the courtroom. Tribe’s ideas are not wrong. But we believe that it is possible to move beyond the problems he points out, by a carefully designed and controlled use of mathematics at trial.
