Maths on Trial: Laurence Tribe: Maths on Trial (6)

Tribe’s reaction to the use of Bayes’ theorem at trial

We have finally reached the heart of Tribe’s article:

Fourthly, an emotional and deeply human explanation of his final decision to recommend the avoidance of mathematical methods altogether in the area of criminal law.

In this second-to-last post of the series, we want to explain Tribe’s objections to the use of Bayes’ theorem at trial, in examples such as the one given by Finkelstein and Fairley. If we intend to support the use of Bayes’ theorem and Bayesian networks (under very specific conditions and criteria, which are still to be developed), we first need to understand the major objections.

He raises four objections and describes each in detail. Today we discuss the first of them.

1. The Distortion of Outcomes

Tribe points out the difficulty of settling on a prior probability of guilt of the defendant, before using Bayes’ theorem to update this probability in the light of new, numerical evidence. According to Tribe, “Because the Finkelstein-Fairley technique thus compels the jury to begin with a number of the most dubious value, the use of that technique at trial would be very likely to yield wholly inaccurate, and misleadingly precise, conclusions.”

We don’t believe that this is a serious problem, however, for the following important reason: if the input to Bayes is not a known statistical figure but merely a subjective evaluation, then a wide range of different values should be tried. If the outcomes are then very different, one can abandon the use of Bayes in that particular situation. But it can happen that, in spite of very different inputs, the outcomes are all quite similar, just as turned out to be the case in Finkelstein and Fairley’s example. This means that the numerical evidence of the palm print on the knife is more strongly indicative of guilt than one might intuitively believe.
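A sensitivity check of this kind is easy to sketch. The Python snippet below assumes, purely for illustration, that the palm print would certainly match if the defendant were guilty and would match an innocent person one time in a thousand (the “one in a thousand” figure Tribe refers to). It feeds a deliberately wide range of subjective priors through Bayes’ theorem. These inputs, the names `posterior` and `TOLERANCE`, and the stability rule are our own hypothetical choices, not Finkelstein and Fairley’s table.

```python
def posterior(prior, p_e_given_guilty, p_e_given_innocent):
    """Bayes' theorem for two exhaustive hypotheses: guilty or innocent."""
    numerator = p_e_given_guilty * prior
    return numerator / (numerator + p_e_given_innocent * (1 - prior))

# Illustrative assumptions only, not figures from any real case.
P_MATCH_IF_GUILTY = 1.0          # print certain to match if he held the knife
P_MATCH_IF_INNOCENT = 1 / 1000   # a random innocent person matches 1 time in 1000

# A deliberately wide range of subjective priors.
priors = [0.01, 0.1, 0.25, 0.5]
posteriors = [posterior(p, P_MATCH_IF_GUILTY, P_MATCH_IF_INNOCENT) for p in priors]

for p, post in zip(priors, posteriors):
    print(f"prior {p:.2f} -> posterior {post:.3f}")

# Hypothetical stability rule: accept the calculation only if the posteriors
# stay within a chosen tolerance of each other, whatever the prior.
TOLERANCE = 0.1
stable = max(posteriors) - min(posteriors) <= TOLERANCE
print("stable" if stable else "too sensitive to the prior: do not use")
```

With these assumed inputs the posteriors run from about 0.91 to 0.999, so the conclusion barely depends on the prior; with a weaker piece of evidence the spread would widen and the rule would reject the calculation.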

Tribe also claims that in court, a figure like “one in a thousand”, and a table like that given by Finkelstein and Fairley, could unduly impress a jury. “The problem with the overpowering number, that one hard piece of information, is that it may dwarf all efforts to put it into perspective with more impressionistic sorts of evidence,” and “The problem – that of the overbearing impressiveness of numbers – pervades all cases in which the trial use of mathematics is proposed. And, whenever such use is in fact accomplished by methods resembling those of Finkelstein and Fairley, the problem becomes acute.” He particularly warns against this happening when the numerical evidence is not connected to the specific case at hand, like the palm print, but concerns a general situation of which the case at hand is just one instance, like the barrel falling out of the window and the information that 60% of the time such an incident is caused by a negligent act.

We believe that this kind of information, which could help a jury fix its prior assumption of guilt but not update it, is unsuitable for courtroom use, as it says nothing about the particular case at hand. By contrast, we think that numerical information that does pertain to the case at hand, as in Finkelstein and Fairley’s example, will carry due weight with the jury. We see no reason to believe that the jury would be overpowered, provided the matter is presented in a measured, non-dramatic way and the use of Bayes’ theorem eventually becomes a common and well-recognized practice in court.

Tribe warns against simplifying events so that Bayes’ theorem can be applied more neatly. As an example, he notes that Finkelstein and Fairley assumed the defendant would leave a palm print on the knife if he used it to kill, ignoring the possibility that he wore gloves or wiped off his prints, or even that the real perpetrator, someone other than the boyfriend, left a smudged print of his own that happened to resemble the boyfriend’s. He adds further overlooked explanations for the palm print: for example, that the defendant left it while using the knife innocently, and that the knife was later used to commit the killing by someone wearing gloves, who left the print in place either because he did not notice it or deliberately, to frame the defendant. “Finkelstein and Fairley overlook the risk of frame-up altogether – despite the nasty fact that the most inculpatory item of evidence may be the item most likely to be used to frame an innocent man.”

All of these are highly unlikely events, but they have non-zero probabilities, and Tribe considers that a result obtained by ignoring them all carries a real risk of error. We agree, particularly about the possibility of a frame-up, which was so remarkably overlooked in the case of Joe Sneed. Tribe adds that including all these possibilities in the presentation of Bayes’ theorem would make the formula extremely complicated and unrealistic to use, especially as the probability of each of these events would have to be estimated.
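Tribe’s point about the growing formula can be illustrated with a small Python sketch. Each overlooked possibility changes one of the two likelihoods, P(print matches | guilty) or P(print matches | innocent), and each needs its own probability. Every number below, the helper names `p_any` and `likelihoods`, and the assumption that the routes are independent are hypothetical, chosen only to show the mechanics.

```python
def p_any(*probs):
    """Probability that at least one of several independent routes occurs."""
    p_none = 1.0
    for p in probs:
        p_none *= 1 - p
    return 1 - p_none

def likelihoods(gloves, wiped, innocent_use, frame_up, resemblance):
    """Return (P(match | guilty), P(match | innocent)).

    All routes are treated as independent, itself a simplification.
    """
    # Guilty: print left during the killing (no gloves, not wiped off),
    # or an older print from innocent handling of the knife.
    left_while_killing = (1 - gloves) * (1 - wiped)
    p_match_guilty = p_any(left_while_killing, innocent_use)
    # Innocent: older print from innocent handling, a deliberate frame-up,
    # or another person's (perhaps smudged) print that resembles his.
    p_match_innocent = p_any(innocent_use, frame_up, resemblance)
    return p_match_guilty, p_match_innocent

def posterior(prior, p_e_given_guilty, p_e_given_innocent):
    numerator = p_e_given_guilty * prior
    return numerator / (numerator + p_e_given_innocent * (1 - prior))

PRIOR = 0.1  # hypothetical prior

# Simplified model: match certain if guilty, 1 in 1000 if innocent.
simple = posterior(PRIOR, 1.0, 1 / 1000)

# Expanded model with made-up probabilities for each overlooked possibility.
expanded = posterior(PRIOR, *likelihoods(
    gloves=0.1, wiped=0.1, innocent_use=0.05, frame_up=0.01, resemblance=1 / 1000))

print(f"simplified model: {simple:.3f}")
print(f"expanded model:   {expanded:.3f}")
```

With these invented figures the posterior falls from about 0.99 to about 0.60, largely because of the innocent-handling route. The figures themselves mean nothing; the sketch only shows that every possibility left out of the simple formula silently fixes its probability at zero.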

Our response to this argument is that a Bayesian network is better suited to such a complicated situation than Bayes’ theorem alone. In general, however, it is best to use Bayesian networks only when a fairly wide range of reasonable input probabilities can be fed in and the outcomes still point clearly in one direction. It would be worth making the attempt on the Finkelstein-Fairley example to see whether it is such a case. Such experiments could be carried out during the pretrial investigation, before the case comes to court, and Bayes would not be introduced at trial at all unless its use turned out to be convincing. This approach should answer the objection Tribe expresses in the following words: “It simply does not follow that trial accuracy will be enhanced if some of the important variables are quantified and subjected to Bayesian analysis, leaving the softer ones – those to which meaningful numbers are hardest to attach – in an impressionistic limbo.”
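The pretrial test we have in mind can be sketched in Python as a sweep over every combination of plausible inputs, checking whether all the resulting posteriors fall on the same side of a decision threshold. The ranges, the 0.9 threshold, and the names `sweep` and `direction` are hypothetical placeholders; in practice the ranges would have to come from the investigation itself.

```python
from itertools import product

def posterior(prior, p_e_given_guilty, p_e_given_innocent):
    """Bayes' theorem for two exhaustive hypotheses: guilty or innocent."""
    numerator = p_e_given_guilty * prior
    return numerator / (numerator + p_e_given_innocent * (1 - prior))

# Hypothetical ranges of "reasonable" inputs, for illustration only.
PRIOR_RANGE = [0.01, 0.05, 0.1, 0.25]
P_MATCH_IF_GUILTY_RANGE = [0.7, 0.85, 1.0]
P_MATCH_IF_INNOCENT_RANGE = [0.0005, 0.001, 0.005]

def sweep(priors, p_guilty_range, p_innocent_range):
    """Posterior for every combination of inputs; returns (lowest, highest)."""
    results = [
        posterior(p, g, i)
        for p, g, i in product(priors, p_guilty_range, p_innocent_range)
    ]
    return min(results), max(results)

def direction(lowest, highest, threshold=0.9):
    """Hypothetical rule: report a direction only if every combination agrees."""
    if lowest >= threshold:
        return "points to guilt"
    if highest <= 1 - threshold:
        return "points to innocence"
    return "inconclusive: keep Bayes out of court"

lowest, highest = sweep(PRIOR_RANGE, P_MATCH_IF_GUILTY_RANGE, P_MATCH_IF_INNOCENT_RANGE)
print(f"posterior between {lowest:.3f} and {highest:.3f}: {direction(lowest, highest)}")
```

With the invented ranges above, the lowest posterior (about 0.59) falls below the threshold, so this rule would keep the calculation out of court. Narrower, better-founded ranges could give a different answer, which is exactly what the pretrial experiment is meant to find out.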

Tribe also objects that even if the figures convince the jury that the defendant held the knife and stabbed his girlfriend, the jury must still consider the defendant’s state of mind during the act before deciding whether he is actually guilty of murder. “One consequence of mathematical proof, then, may be to shift the focus away from such elements as volition, knowledge and intent, and toward such elements as identity and occurrence – for the same reason that the hard variables tend to swamp the soft.”

We are not convinced by this argument. We trust the defendant’s counsel to raise the question of his mental state in front of the jury.

Finally, Tribe points out that the jury must be completely unaware of the new numerical evidence until the moment it is introduced at trial together with the Bayesian calculation. Otherwise, if they have heard anything about it beforehand, they will already have factored it into their estimate of the prior probability of guilt, and its weight will unfairly be counted twice.

This is a good argument. It is important to ensure that the jury does not hear about the new evidence until it appears within the Bayesian framework. For this reason, it may be necessary to exclude certain pieces of evidence which hint at the specific piece of numerical evidence, right up until that point.
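The effect of counting the same evidence twice is easy to see in the odds form of Bayes’ theorem, where each update multiplies the prior odds by the likelihood ratio. The Python sketch below assumes a likelihood ratio of 1000 for the palm print (match certain if guilty, one in a thousand if innocent) and a hypothetical starting prior of 0.01; the function name `update` is our own.

```python
def update(prior, likelihood_ratio):
    """One Bayesian update in odds form: posterior odds = prior odds x LR."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

BASE_PRIOR = 0.01  # hypothetical prior formed without knowing about the print
PRINT_LR = 1000    # assumed: P(match | guilty) = 1, P(match | innocent) = 1/1000

# Correct procedure: the print enters exactly once.
correct = update(BASE_PRIOR, PRINT_LR)

# If the jury has already heard about the print, its "prior" already contains
# the print's weight, and the Bayesian calculation in court adds it again.
contaminated_prior = update(BASE_PRIOR, PRINT_LR)
double_counted = update(contaminated_prior, PRINT_LR)

print(f"counted once:  {correct:.4f}")
print(f"counted twice: {double_counted:.4f}")
```

Counted once, the print multiplies the prior odds by 1000; counted twice, by 1000 × 1000, which pushes the posterior from about 0.91 to about 0.9999.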

Tribe also gives an example in which Bayes’ theorem can give a seriously wrong result. The example is very simple: a robbery that took 15 minutes was committed at a certain place between 3:00 and 3:30 a.m., and the defendant was seen in a car half a mile from the scene at 3:10 a.m. Given this information, the jury arrives at some prior probability X that the defendant is guilty. Then a new piece of evidence is introduced: the defendant was also seen in a car half a mile from the scene at 3:20 a.m. If the probability of this second sighting given guilt is calculated and used to update the prior, the jury’s belief in the defendant’s guilt will increase. But in fact, the two times taken together show that the defendant must be innocent: any 15-minute interval between 3:00 and 3:30 contains either 3:10 or 3:20, so the robber cannot have been half a mile away at both times.
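Tribe’s trap can be made concrete in a few lines of Python. The sketch checks, minute by minute, which robbery start times would still allow the defendant to be the robber given the sightings, and contrasts this with a naive update that treats the second sighting as independent incriminating evidence. The one-minute grid, counting a sighting at the exact start or end minute as “at the scene”, the prior of 0.2 standing in for Tribe’s X, and the likelihood ratio of 3 are all hypothetical choices of ours.

```python
ROBBERY_MINUTES = 15
LATEST_START = 30 - ROBBERY_MINUTES  # robbery must fit inside 3:00-3:30 a.m.

def compatible_starts(sightings):
    """Start times (whole minutes after 3:00) at which the defendant could have
    committed the robbery and still been seen half a mile away at every
    sighting (also given in minutes after 3:00)."""
    return [
        start
        for start in range(LATEST_START + 1)
        # The robber is at the scene for the whole interval [start, start + 15].
        if all(not (start <= t <= start + ROBBERY_MINUTES) for t in sightings)
    ]

def naive_posterior(prior, likelihood_ratios):
    """Sequential odds update treating each sighting as independent evidence."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

PRIOR_X = 0.2             # hypothetical value of Tribe's X after the 3:10 sighting
LR_SECOND_SIGHTING = 3    # hypothetical: being near the scene looks incriminating

print("3:10 only:", compatible_starts([10]))    # [11, 12, 13, 14, 15]
print("3:20 only:", compatible_starts([20]))    # [0, 1, 2, 3, 4]
print("both:", compatible_starts([10, 20]))     # []

naive = naive_posterior(PRIOR_X, [LR_SECOND_SIGHTING])
print(f"naive update: {PRIOR_X:.2f} -> {naive:.2f}")

# Considered jointly, the two sightings leave no start time, so guilt is impossible.
if not compatible_starts([10, 20]):
    print("joint analysis: defendant cannot be the robber")
```

Each sighting on its own leaves some start times open, so each looks merely suggestive; only a model that conditions on both sightings at once, as a properly built network would, sees that together they exclude guilt.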

It seems to us that it should not be difficult to avoid such obvious traps when designing a Bayesian network to deal with the facts in a specific case. But we do appreciate Tribe’s intelligence and sense of humour in even inventing this example.

Friday, 27 January 2012