Sunday, 29 January 2012

Laurence Tribe: Maths on Trial (7)

Tribe’s reaction to the use of Bayes’ theorem at trial


In this last post of the series, we continue to summarize Tribe’s objections to the use of Bayes’ theorem at trial, with our responses. Last time we discussed his first objection:


1. The Distortion of Outcomes


Today, we continue with the other three objections.


2. The End of Innocence: A Presumption of Guilt?


“At least in criminal cases, and perhaps also in civil cases resting on allegations of moral fault, further difficulties lurk in the very fact that the trier is forced by the Finkelstein-Fairley technique to arrive at an explicit quantitative estimate of the likely truth at or near the trial’s start, or at least before some of the most significant evidence has been put before him.” Tribe’s argument in this section is that the idea of forcing a juror to arrive at some kind of mathematical figure for the prior probability of guilt, before applying Bayes’ theorem to update that prior according to some new piece of numerical evidence, is fundamentally against the presumption of innocence, and that this injustice is not rectified even by setting the prior probability to an unreasonably low value.


Tribe stresses the importance of the juror listening to all of the evidence before “reaching any judgment, even a tentative one, as to his probable guilt”; he terms this one of the “intangible aspects” of our commitment to the proposition that “a man who stands accused of crime is no less entitled than his accuser to freedom and respect as an innocent member of the community”. Tribe admits that in reality, a juror listening to a trial may not actually be considering the accused as certainly innocent until a complete proof has been laid out before him; he may, in fact, swing backwards and forwards in his estimate of guilt as the trial proceeds, thus holding some vague idea of “prior probability of guilt” at all times. However, according to Tribe, these impulses must not be spoken or expressed, let alone called to the fore, out of respect for the presumption of innocence. “Society ought to speak of accused men as innocent, and treat them as innocent, until they have been properly convicted after all they have to offer in their defense has been carefully weighed,” and “Jurors cannot at the same time estimate probable guilt and suspend judgment until they have heard all the defendant has to say.”


We feel that the proper method to counter Tribe’s worries and to allow jurors to keep all their thoughts about possible guilt, and their personal, intimate estimates of probability of guilt, silent and unspoken throughout the trial, is the presentation of a table such as the one given by Finkelstein and Fairley, in which many different values for the prior probability of guilt are entered, and the update according to the new evidence is calculated for each of them. In this way, the jury members are never required to explicitly formulate a probability of guilt at any stage; it is enough for them to place themselves loosely within the table, without being asked to specify where, or even to consider the entire set of output results and the meaning they indicate before coming to any final decision. This would certainly respect the duty of silence that Tribe describes.


We note as an aside that Tribe himself appears disturbed by the fact that the duty of silence is at the same time a call for lack of candor. In a few well-chosen words he expresses his obviously deeply held feeling that while a lack of candor should not be something that is required lightly, there are cases in which it serves a higher moral purpose. Lies or refusals to consider the full weight of the evidence are not required, merely a silence on the subject in respect for the presumption of innocence: “One need not say everything all at once in order to be truthful, and saying some things in certain ways and at certain times in the trial process may interfere with other more important messages that the process should seek to convey and with attitudes that it should seek to preserve.”


We respect and agree with this approach, but as explained above, Bayes’ theorem can be correctly presented and used at trial without undermining it.



3. The Quantification of Sacrifice


One extremely disturbing factor about the use of any mathematical method to determine guilt is that since the final probability, when all the evidence has been taken into account, is rarely likely to be equal to 1.0, it will, in cases of conviction, tend to be some concrete and convincing figure such as, for example, 0.98. Unfortunately, accepting such a figure as the probability of guilt can also be expressed as saying that one accepts that 2 people out of every 100 convicted are expected to be innocent. While there is no doubt that many innocent people are convicted each year, it is indeed disturbing to have an actual figure for the number of such people that one expects. “There is something intrinsically immoral about condemning a man as a criminal while telling oneself, ‘I believe that there is a chance of one in twenty that this defendant is innocent, but a 1/20 risk of sacrificing him erroneously is one I am willing to run in the interest of the public’s – and my own – safety.’” Tribe wishes for any justice system to be structured in such a way as to avoid ever having to make such a proclamation with a specifically published figure. He considers that it is morally superior for a juror to express himself as being “very sure” or “as sure as possible” of guilt than to give an actual figure, which would prove his readiness to accept such or such a percentage of innocent sacrificial victims.


Naturally, there are miscarriages of justice in any justice system, but Tribe points out the vast moral difference between society’s recognizing the necessity of tolerating them, and the fact of its actually embracing a policy that juries “ought to convict in the face of this acknowledged and quantified uncertainty”. He prefers sticking to the notion of “guilt beyond a reasonable doubt”, which represents “a subtle compromise between the knowledge, on the one hand, that we cannot realistically insist on acquittal whenever guilt is less than absolutely certain, and the realization, on the other hand, that the cost of spelling that out explicitly and with calculated precision in the trial itself would be too high.”


Our response to this argument is that Bayes’ theorem should probably never be used to compute the probability of guilt itself. It should be brought in to settle certain subsidiary questions of fact, such as the probability that it was the defendant rather than another person who left a certain print, was seen in a certain place, was present at a certain time, transported a certain object, and so forth. The actual final decision of innocence or guilt should and must be made by jurors without recourse to any numerical calculation.



4. The Dehumanization of Justice


Tribe’s final argument against the use of mathematical methods at trial is simply that they “threaten to make the legal system seem even more alien and inhuman than it already does to distressingly many…The need now is to enhance the community comprehension of the trial process, not to exacerbate an already serious problem by shrouding the process in mathematical obscurity.” Tribe worries that “guided and perhaps intimidated by the seeming inexorability of numbers, induced by the persuasive force of formulas and the precision of the decimal points to perceive themselves as performing a largely mechanical and automatic role, few jurors could be relied upon to recall, let alone to perform, this humanizing function, to employ their intuition and their sense of community values to shape their ultimate conclusions.”


Our response to this is that with the advent of DNA analysis, mathematics at trial is here to stay. It is perhaps not yet fully understood that the statistical analyses used on DNA are no different from those needed in many other cases and situations where Bayes’ theorem can be applied. The public has grown used to seeing DNA analyses presented at trial over the decades since Tribe wrote his article, and juries deal with the situation competently enough in the main, having expert witnesses explain the issues to them in layman’s terms, and not, generally, forgetting to employ their common sense.


It seems to us that what is needed is a general education aimed at the public, so that little by little, the notions used there, which are not more difficult than much of the mathematics seen at school, become familiar and trustworthy to the public at large, from which juries are drawn. We believe that this is the only way in which Tribe’s profound moral and social concerns can be reconciled with the fact that mathematics at trial is here to stay. We also believe that such a general public education is a legitimate and reasonable aim to work towards, which is the whole purpose of this blog.

Friday, 27 January 2012

Laurence Tribe: Maths on Trial (6)

Tribe’s reaction to the use of Bayes’ theorem at trial


We finally got to the heart of Tribe’s article:


Fourthly, an emotional and deeply human explanation of his final decision to recommend the avoidance of mathematical methods altogether in the area of criminal law.


In this second-to-last post of the series, we want to explain Tribe’s objections to the use of Bayes’ theorem at trial, in examples such as the one given by Finkelstein and Fairley, because if we intend to support the use of Bayes’ theorem and Bayesian networks (under very specific conditions and criteria which are still to be developed), we need to understand the major objections first.


He gives four objections, described in detail. Today we discuss the first of the four.


1. The Distortion of Outcomes


Tribe points out the difficulty of settling on a prior probability of guilt of the defendant, before using Bayes’ theorem to update this probability in the light of new, numerical evidence. According to Tribe, “Because the Finkelstein-Fairley technique thus compels the jury to begin with a number of the most dubious value, the use of that technique at trial would be very likely to yield wholly inaccurate, and misleadingly precise, conclusions.”


We don’t believe that this is a serious problem, however, for the following important reason: if the input to Bayes is not a known statistical figure but merely a subjective evaluation, then a wide range of different possibilities should be input. If the outcomes are then very different, one can discard the use of Bayes in that particular situation. But it can happen that in spite of very different inputs, the outcomes are all quite similar, just as it turned out in Finkelstein and Fairley’s example. What this means is that the numerical evaluation of the knife palm print evidence is actually more indicative of guilt than one might intuitively believe.
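The check described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the list of priors is arbitrary, the likelihood ratio of 1000 stands for the palm print (a print almost certain if the defendant is guilty, one chance in a thousand otherwise), and the likelihood ratio of 2 is a hypothetical piece of weak evidence added for contrast.

```python
def posterior(prior, likelihood_ratio):
    """Update a prior probability using the odds form of Bayes' theorem."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)


def posterior_spread(priors, likelihood_ratio):
    """Smallest and largest posterior obtained over a range of priors."""
    results = [posterior(p, likelihood_ratio) for p in priors]
    return min(results), max(results)


# Arbitrary range of subjective priors (our choice, for illustration only).
priors = [0.01, 0.1, 0.25, 0.5, 0.75]

strong = posterior_spread(priors, 1000)  # palm-print-like evidence
weak = posterior_spread(priors, 2)       # hypothetical weak evidence

print("strong evidence:", strong)
print("weak evidence:  ", weak)
```

With the strong evidence the posteriors all land close together near the top of the scale, whereas with the weak evidence they remain spread almost as widely as the priors. In the second situation, by the reasoning above, one would give up on using Bayes.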


Tribe also claims that in court, a figure like “one in a thousand”, and a table like that given by Finkelstein and Fairley, could unduly impress a jury. “The problem with the overpowering number, that one hard piece of information, is that it may dwarf all efforts to put it into perspective with more impressionistic sorts of evidence,” and “The problem – that of the overbearing impressiveness of numbers – pervades all cases in which the trial use of mathematics is proposed. And, whenever such use is in fact accomplished by methods resembling those of Finkelstein and Fairley, the problem becomes acute.” He particularly warns against this happening when the numerical evidence is not connected to the specific case at hand, like the palm print, but concerns a general situation of which the case at hand is just one instance, like the barrel falling out of the window and the information that 60% of the time such an incident is caused by a negligent act.


We believe that this kind of information, which could be used to help a jury fix its prior assumption of guilt, not to update it, is not suitable for courtroom use, as it says nothing about the particular case at hand. And we think that the type of numerical information that does pertain to the case at hand, such as in Finkelstein-Fairley’s example, will impress the jury in a reasonable manner; we see no reason to believe that the jury might be overpowered, if the matter is presented in a reasonable and non-dramatic manner, and if the use of Bayes’ theorem eventually becomes a common and well-recognized occurrence in court.


Tribe warns against the attempt to simplify events in order to apply Bayes’ theorem more fittingly. He gives as an example the fact that Finkelstein and Fairley assumed that the defendant would leave a palm print on the knife if he used it to kill, ignoring the possibility that he may have worn gloves or wiped off his prints, or even that a perpetrator other than the boyfriend might have left a smudged print of his own which happened to resemble the boyfriend’s. He adds other overlooked possibilities for the palm print, for example that the defendant left his own palm print during an innocent use of the knife, which was subsequently used for the killing by someone wearing gloves, who left the palm print in place either because he did not see it or with the conscious intention of framing the defendant. “Finkelstein and Fairley overlook the risk of frame-up altogether – despite the nasty fact that the most inculpatory item of evidence may be the item most likely to be used to frame an innocent man.”


All of these are highly unlikely events, but they do have non-zero probabilities, and Tribe considers that a result obtained by forgetting them all carries a real risk of being in error. We agree with this, particularly with the frame-up theory, which was also so remarkably forgotten in the case of Joe Sneed. Tribe adds that including all these possibilities in the presentation of Bayes’ theorem would make the formula extremely complicated and unrealistic to use, especially as all the probabilities of these events would have to be estimated.


Our response to this argument is that a Bayesian network would be more suited to the complicated situation than Bayes’ theorem; but in general, it is best to use Bayesian networks only when a fairly wide range of reasonable input probabilities can go in and the outcomes nevertheless point clearly in a certain direction. It would be worth making the attempt on the Finkelstein-Fairley example to see whether it is such a case. Such experiments could be made before a case was actually being tried in court, during the pretrial investigations, and Bayes would not be introduced into court at all unless the use of Bayes turned out to be convincing. This approach should answer the objection that Tribe expressed in the following words: “It simply does not follow that trial accuracy will be enhanced if some of the important variables are quantified and subjected to Bayesian analysis, leaving the softer ones – those to which meaningful numbers are hardest to attach – in an impressionistic limbo.”


Tribe also objects that even if the jury is convinced by the figures that the defendant held the knife and stabbed his girlfriend, they still have to take into account the state of mind of the defendant during the act before deciding whether he is actually guilty of murder. “One consequence of mathematical proof, then, may be to shift the focus away from such elements as volition, knowledge and intent, and toward such elements as identity and occurrence – for the same reason that the hard variables tend to swamp the soft.”


We are not convinced by this argument. We trust the defendant’s counsel to raise the question of his mental state in front of the jury.


Finally, Tribe points out that the jury must be absolutely ignorant of the new, numerical piece of evidence until the moment when it is introduced at trial together with the Bayesian calculation. Otherwise, if they have heard anything of it beforehand, they will have already factored it into their estimation of the prior probability of guilt, and its force will be used unfairly twice over.
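The double-counting effect is easy to see numerically. The sketch below uses purely hypothetical figures of our own (a prior of 1/2 and evidence that is three times as likely under guilt as under innocence); the function name `apply_evidence` is also ours.

```python
from fractions import Fraction


def apply_evidence(prior, likelihood_ratio, times=1):
    """Apply the same likelihood ratio `times` times to a prior probability."""
    odds = prior / (1 - prior)
    odds *= likelihood_ratio ** times
    return odds / (1 + odds)


# Hypothetical figures, chosen only to keep the arithmetic readable.
honest_prior = Fraction(1, 2)   # juror's belief before hearing of the evidence
likelihood_ratio = 3            # evidence is 3 times likelier if guilty

counted_once = apply_evidence(honest_prior, likelihood_ratio, times=1)
counted_twice = apply_evidence(honest_prior, likelihood_ratio, times=2)

print(counted_once)   # belief when the evidence is used a single time
print(counted_twice)  # belief when a juror had already absorbed it informally
```

A juror who has already let the evidence raise his prior, and then sees the formal update applied on top of that, ends up at the second, inflated figure rather than the first.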


This is a good argument. It is important to ensure that the jury does not hear about the new evidence until it appears within the Bayesian framework. For this reason, it may be necessary to exclude certain pieces of evidence which hint at the specific piece of numerical evidence, right up until that point.


Tribe also gives an example where Bayes’ theorem can give a seriously wrong result. This example is very simple: a robbery that took 15 minutes was committed in a certain place between 3:00 and 3:30 a.m., and the defendant was seen in a car a ½-mile from the scene at 3:10 a.m. Given this information, the jury will come to a prior probability of X that the defendant is guilty. Then a new piece of information is brought, saying that the defendant was seen in a car a ½-mile from the scene at 3:20 a.m. If the probability of this second sighting, assuming the defendant is guilty, is calculated on its own and used to update the prior probability, it will increase the jury’s conviction of the defendant’s guilt. But in fact, the two times taken together show that the defendant must be innocent!
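To see where the trap lies, here is a toy model of the robbery example. Everything beyond the times given in the text is our own assumption: the robbery is taken to start at a whole minute, each start time is treated as equally likely, travel time is ignored, the chance of a sighting given guilt is equated with the chance of being away from the scene, and the values of X and of the probability of seeing an innocent defendant nearby are hypothetical.

```python
from fractions import Fraction

DURATION = 15                          # robbery length in minutes
STARTS = range(0, 30 - DURATION + 1)   # start minutes after 3:00, assumed equally likely


def away(start, t):
    """True if a guilty defendant is not at the scene at minute t."""
    return not (start <= t <= start + DURATION)


def p_away_given_guilty(times):
    """Share of start times for which a guilty defendant is away at all `times`."""
    consistent = [s for s in STARTS if all(away(s, t) for t in times)]
    return Fraction(len(consistent), len(STARTS))


def update(prior, p_if_guilty, p_if_innocent):
    """Bayes' theorem for a single piece of evidence."""
    guilty = prior * p_if_guilty
    return guilty / (guilty + (1 - prior) * p_if_innocent)


p_first = p_away_given_guilty([10])       # sighting at 3:10 taken alone
p_second = p_away_given_guilty([20])      # sighting at 3:20 taken alone
p_joint = p_away_given_guilty([10, 20])   # both sightings taken together

X = Fraction(1, 2)                        # hypothetical prior after the first sighting
P_SEEN_IF_INNOCENT = Fraction(1, 100)     # hypothetical

# Naive update: the second sighting is treated as if the first had never happened.
naive = update(X, p_second, P_SEEN_IF_INNOCENT)

# Careful update: the second sighting is conditioned on the first.
careful = update(X, p_joint / p_first, P_SEEN_IF_INNOCENT)

print(naive, careful)
```

In this toy model the naive update pushes the probability of guilt up, while the careful one drives it to zero, because no 15-minute robbery inside the half hour can avoid both 3:10 and 3:20. The error lies in the conditioning, not in Bayes’ theorem itself.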


It seems to us that it should not be difficult to avoid such obvious traps when designing a Bayesian network to deal with the facts in a specific case. But we do appreciate Tribe’s intelligence and sense of humour in even inventing this example.

Thursday, 26 January 2012

Laurence Tribe: Maths on Trial (5)

Tribe’s summary of Bayes’ Theorem in law


We’ve been describing the contents of Laurence Tribe’s seminal article on mathematics at trial. The first two parts we discussed were:


Firstly, a description of the kind of use of mathematics at trial that he is specifically going to discuss, with examples;


Secondly, a review of the traditional arguments that judges have used against mathematics at trial, together with Tribe’s reaction to these arguments;


Today we’re going to summarize the next part:


Thirdly, a sketch of the introduction of Bayes’ theorem at trial as introduced by Finkelstein and Fairley.


Tribe starts by giving an informal introduction to the relevance of Bayes’ theorem: “In deciding a disputed proposition, a rational factfinder probably begins with some initial, a priori estimate of the likelihood of the proposition’s truth, then updates his prior estimate in light of discoverable evidence bearing on that proposition, and arrives finally at a modified assessment of the proposition’s likely truth in light of whatever evidence he has considered. When many items of evidence are involved, each has the effect of adjusting, in greater or lesser degree, the factfinder’s evaluation of the probability that the proposition before him is true. If this incremental process of cumulating evidence could be given quantitative expression, the factfinder might then be able to combine mathematical and non-mathematical evidence in a perfectly natural way, giving each neither more nor less weight than it logically deserves.”


He then explains that such a mathematical expression for updating probabilities in the light of new evidence exists: in simple cases, it is precisely Bayes’ theorem, and in more complicated situations with many different factors, this theorem can be expanded into the theory of Bayesian networks. Tribe gives a mathematical explanation of Bayes’ theorem (which we’ve already explained in previous posts; as for Bayesian networks, we’ll be dealing with them in a series of future posts). Tribe goes on to give a few examples of applications of Bayes’ theorem to legal cases.


Suppose that a juror estimates the probability of guilt of a defendant during a trial, in light of all the evidence seen so far, as about 2/3. Then new evidence comes up to the effect that after the crime was committed, the defendant took the first plane out of town. The juror makes a guess that the probability of a guilty criminal taking the first flight out of town is maybe 0.2, whereas the probability that an innocent person might do so (in order to distance themselves from the crime, for example) might be about 0.1. A direct application of Bayes’ theorem then has the effect of updating the juror’s probability of guilt up to 4/5.
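For readers who would like to check the arithmetic, here is a minimal sketch of that update in Python. The function name `bayes_update` is our own; the three numbers are the ones from the example, and exact fractions are used so that no rounding intervenes.

```python
from fractions import Fraction


def bayes_update(prior, p_evidence_if_guilty, p_evidence_if_innocent):
    """Posterior probability of guilt after one piece of evidence."""
    guilty_branch = prior * p_evidence_if_guilty
    innocent_branch = (1 - prior) * p_evidence_if_innocent
    return guilty_branch / (guilty_branch + innocent_branch)


# Prior 2/3, flight probability 0.2 if guilty and 0.1 if innocent.
flight_posterior = bayes_update(Fraction(2, 3), Fraction(1, 5), Fraction(1, 10))
print(flight_posterior)  # 4/5
```

The evidence is only twice as likely under guilt as under innocence, which is why the estimate moves modestly, from 2/3 to 4/5.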


Shortly before Tribe’s article, law professor Michael Finkelstein and statistician William Fairley coauthored an article, “A Bayesian approach to identification evidence”, 83 Harvard Law Review 1970, in which they described a similar type of scenario. Here, a woman’s body is found in a ditch in an urban area. There is evidence that the deceased quarreled violently with her boyfriend the night before and that he struck her on other occasions. A palm print similar to the defendant’s is found on the knife that was used to kill the woman. However, experts can only say that such prints can belong to no more than one person in a thousand.


The question is, how should this figure be incorporated into the jury’s assessment of the defendant’s guilt? The figure of 1 in 1000 can be quite misleading in itself, and is often confused with the probability of the defendant’s innocence. In fact, it does not have much meaning taken out of context; the important thing is to know the size of the population of possible murderers. For example, if nothing is known of the murderer, then all males in the area could be considered possible suspects; if the male population in the area is 1 million, then 1000 people can possess a hand that can leave a palmprint of the type found, which does not go a long way to identifying the boyfriend as the perpetrator. If more is known about the suspect, this can narrow down the relevant population and make the 1 in 1000 figure more telling.
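The dependence on population size can be made concrete with a short calculation. It is a sketch under simplifying assumptions of our own: every male in the area is treated as an equally likely perpetrator before the print is considered, and the true perpetrator is assumed to leave a matching print for certain.

```python
from fractions import Fraction


def expected_matches(population, match_frequency):
    """Expected number of people whose palm could leave a print of this type."""
    return population * match_frequency


def source_probability(population, match_frequency):
    """Probability that one particular matching person left the print,
    assuming all members of the population are equally likely beforehand."""
    others = population - 1
    return 1 / (1 + others * match_frequency)


frequency = Fraction(1, 1000)
matches = expected_matches(1_000_000, frequency)
p_boyfriend = source_probability(1_000_000, frequency)

print(matches)             # expected number of matching males in the area
print(float(p_boyfriend))  # chance the boyfriend is the source on the print alone
```

With a million possible suspects, the print alone leaves the boyfriend at roughly one chance in a thousand of being the source. Calling `source_probability` with a much smaller population shows how quickly the same 1 in 1000 figure becomes telling.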


Finkelstein and Fairley observe that if the boyfriend did use the knife to kill the woman, then he almost certainly left the print there, and conversely, that if he was not the one who used it, there would be only one chance in a thousand that a print similar to his would be found on the knife. They then use Bayes’ Theorem to calculate the updated probability of the defendant’s guilt for various values of the initial probability X that jurors might have in their minds before seeing the print evidence.


Bayes’ theorem leads to the following updates:


Probability before print evidence     Updated probability using print evidence

0.01                                  0.909

0.25                                  0.997

0.75                                  0.9996


So, for instance, the use of Bayes’ theorem means that a juror who only believes that there is about one chance in four that the defendant killed his girlfriend should revise his belief to the near-certainty of 99.7% after learning of the palm print evidence.
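These figures can be recomputed in a few lines. In the sketch below we take “almost certainly” to mean a probability of exactly 1 that a guilty boyfriend leaves the print, which is our simplification, and 1/1000 for the chance of a similar print if he is innocent; the function name is ours.

```python
def updated_guilt(prior, p_print_if_guilty=1.0, p_print_if_innocent=0.001):
    """Probability of guilt after the palm print evidence, by Bayes' theorem."""
    guilty = prior * p_print_if_guilty
    innocent = (1 - prior) * p_print_if_innocent
    return guilty / (guilty + innocent)


for prior in (0.01, 0.25, 0.75):
    print(f"{prior:.2f} -> {updated_guilt(prior):.4f}")
```

Under these assumptions the results agree with the table to about three decimal places; any small difference in the last digit is a matter of rounding.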


There is no doubt that a numerical approach like this to quantifying the importance of new evidence can be useful and surprising, especially if it is used in a manner that respects the subjective appreciation of each jury member, by giving a range of different possibilities for the original, prior estimations of guilt, before introduction of that evidence which has a numerical value (frequency of the palm print).


In spite of this, Tribe expresses doubt about the usefulness of these methods. “In the next section…we examine the costs we must be prepared to incur if we would follow the path Finkelstein and Fairley propose. What will presently be identified as certain costs of quantified methods of proof might conceivably be worth incurring if the benefit in increased trial accuracy were great enough. It turns out, however, that mathematical proof, far from providing any clear benefit, may in fact decrease the likelihood of accurate outcomes.”


We (the authors of this blog) believe that while the use of mathematical methods in trials is full of danger, above all the danger of mathematics being misused by non-experts and the danger of even correct mathematics being misunderstood by judges and jurors, there is nevertheless a great deal that can be done in the way of making sure that mathematical methods are applied correctly in the courtroom, and yield improved and more accurate outcomes.


But in order to make a really deep investigation of the subject, we first need to explore these dangers in depth, as well as the most cogent theoretical reasons against the use of mathematics at trial – those expressed by Tribe.


Our next and last post on the subject of Tribe’s article will explain these. They make a lot of sense and should be taken seriously; in fact, they are so right and well-expressed that they convinced large numbers of people for three decades to keep mathematics out of the courtroom. Tribe’s ideas are not wrong. But we believe that it is possible to move beyond the problems he points out, by a carefully designed and controlled use of mathematics in trial.

Wednesday, 25 January 2012

Laurence Tribe: Maths on Trial (4)

Tribe’s reaction to standard objections to math on trial


In the last post we gave a description of the first part of Tribe’s seminal article:


Firstly, a description of the kind of use of mathematics at trial that he is specifically going to discuss, with examples.


Today we will summarize his arguments in the second part.


Secondly, a review of the traditional arguments that judges have used against mathematics at trial, together with Tribe’s reaction to these arguments;


He lists objections that have been made by judges to the introduction of statistical evidence at trial, some of which have actually been written into law.


Objection 1: At first glance, probability concepts might appear to have no application in deciding precisely what did or did not happen on a specific prior occasion: either it did or it didn’t – period.


Tribe’s reaction: Although this is true in itself, the statistical knowledge can be very useful in cases where it is used in conjunction with sufficient further information.


Objection 2: Making use of the mathematical information available first requires transforming it from evidence about the generality of cases to evidence about the particular case; some feel that no such translation is possible.


Tribe’s reaction: this kind of information is important for the trier of fact to come to a decision about the likelihood of certain events, for instance the “4/5” probability that the blue bus that hit the plaintiff belonged to the defendant who was responsible for operating 4/5 of the blue buses in town.


Objection 3: In very few cases, if any, can the mathematical evidence, taken alone and in the setting of a completed lawsuit, establish the proposition to which it is directed with sufficient probative force to prevail.


Tribe’s reaction: But the fact that mathematical evidence taken alone can rarely, if ever, establish the crucial proposition with sufficient certitude to meet the applicable standard of proof does not imply that such evidence – when properly combined with other, more conventional, evidence in the same case – cannot supply a useful link in the process of proof…The real issue is whether there is any acceptable way of combining mathematical with non-mathematical evidence. If there is, mathematical evidence can indeed assume the role traditionally played by other forms of proof.




Now, it is a fact that many mathematicians and statisticians have proposed a way to integrate mathematics and traditional evidence, based on Bayes’ theorem. In fact, Tribe’s whole article was actually written as a response to an article on the use of Bayes’ theorem at trial, authored by Finkelstein and Fairley (we’ll investigate this and similar articles in future posts). In the third part of Tribe’s article, he briefly summarizes the point of view of those who advocate using Bayes’ theorem at trial. This will be the subject of the next post.

Tuesday, 24 January 2012

Laurence Tribe: Maths on Trial (3)

Tribe’s reaction to probability at trial


The other main theme of Laurence Tribe’s article “Trial by Mathematics: Precision and Ritual in the Legal Process” (84 Harvard Law Review 1338 1970-71) is, to use his own words, “cases in which mathematical methods are turned to the task of deciding what occurred on a particular, unique occasion, as opposed to cases in which the very task defined by the applicable law is that of measuring the statistical characteristics or likely effects of some process or the statistical features of some population of people or events”.


This part of the article is divided into four main sections:


Firstly, a description of the kind of use of mathematics at trial that he is specifically going to discuss, with examples;


Secondly, a review of the traditional arguments that judges have used against mathematics at trial, together with Tribe’s reaction to these arguments;


Thirdly, a sketch of the introduction of Bayes’ theorem at trial as introduced by Finkelstein and Fairley;


Fourthly, the very heart of the article: an emotional and deeply human explanation of his final decision to recommend the avoidance of mathematical methods altogether in the area of criminal law.


Tribe’s opinions held such sway in the world of legal thinking that it has been said by those who strongly favour the properly thought out and properly controlled use of mathematics at trial that Tribe alone held back the development of that area of research for a good 30 years. If this is the case, then it seems worth exploring Tribe’s objections in some detail before going on to what we believe the future holds in the way of probability at trial.


Today’s post will summarize the first part of Tribe’s article: a description of the kind of mathematics at trial that he is aiming to discuss.


Tribe divides the use of mathematics in deciding what occurred on a particular occasion into three distinct possibilities:


(1) determining whether an event did or did not occur,


(2) determining the identity of the individual responsible for certain acts,


(3) determining the intention behind certain acts.


For each type, he gives a few examples of the kind of problem that may arise. For (1), one example is that of a man who is accused of leaving his car at a parking place for over one hour in defiance of the rules. The witness is the officer who testified that twice, at times separated by over an hour, he observed that car in that particular place, and that he noted the precise position of the front and back tires. The car owner’s defense is that he drove away during the hour and then came back later, and his wheel positions happened to be the same as they were the first time by chance. The probability of this happening is between about 1/12 and 1/144. But what role should this probability play in judging the car owner’s innocence or guilt?


A second example is that of a barrel falling out of someone’s window onto another person’s head: the question is whether some negligent act was the cause of the fall. Supposing that it is statistically known that over 60% of such incidents are caused by a negligent act, should this fact be allowed in court?


For (2), he gives the example of a plaintiff negligently run over by a blue bus, who accuses defendant of negligence on the grounds that defendant is a bus operator who operates 4/5 of the blue buses in town. How important should that figure be in judging whether the blue bus in question did or did not belong to defendant?


Another example is that of a man found shot to death in his mistress’ apartment, the question being whether she shot him. There is evidence to prove that in 95% of all known cases in which a man is killed in his mistress’ apartment, the mistress is the killer. Is this evidence of sufficient relevance to be introduced in trial? Does it have any role to play?


Finally, for (3), Tribe gives the example of a recently insured building burning down, with the owner insisting that the fire was an accident. If it is statistically known that fewer than one fire in 20 that occurs shortly after an insurance purchase occurs purely by chance, what role should such a statistic play in the investigation?


Tribe points out that courts faced with the emergence of this kind of evidence during trials have tended to deal with it in an entirely ad hoc manner, sometimes supporting the mathematical proof proffered by a lawyer, more frequently judging it to be improper. But, he points out, “as the number and variety of cases continues to mount, the difficulty of dealing intelligently with them in the absence of any coherent theory is becoming increasingly apparent.”


He is obviously right, even if the “coherent theory” that he develops in the rest of the article, which we’ll cover in the next posts, is far from satisfying everybody!