Statistical Thinking 1

Several years ago, an NPR reporter wanted a comment from me for his story about an unusual event: a woman had won a state lottery jackpot for a second time. The odds of winning once are low enough, but winning twice? The reporter found Statistics.com on the web, and wanted to record a statement from me about how improbable this was. As it happened, I had a little book, What the Odds Are, full of odd facts about odds (“nearly three quarters of all Americans are mentioned in the media at least once in their life”).

I perused the book and found, to my surprise, that the odds that a lottery jackpot will be won twice by the same person are strikingly high. In fact, it has happened: Evie Marie Adams won the New Jersey state lottery jackpot twice within a four-month span, and there have been other cases as well.

1. One-Off Probability, Versus “Somewhere, Sometime…”

Two completely different probabilities are involved in the reporter’s question, depending on how it is framed. One is the extremely low probability that a given person, say Tom Smith, will win the lottery jackpot twice if he plays the lottery twice. Another is the much higher probability that, of all the times the lottery is played, by anyone, during some time span, say one year, a jackpot will be won twice by the same person. The former is like asking one person to flip a coin 15 times: the probability that they will get 15 heads is extremely low (0.00003). The latter is like asking all 50,000 people in Yankee Stadium to each flip a coin 15 times: the probability that at least one person will get 15 heads is 78% (it’s one minus the probability that nobody will get 15 heads, or 1 – (1 – 0.5^15)^50000).
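The two coin-flip figures can be checked with a few lines of Python. This is only a sketch of the arithmetic quoted above; it assumes fair coins and independent flippers, and the variable names are mine.

```python
# Probability that one particular person gets 15 heads in 15 fair flips.
n_flips = 15
p_single = 0.5 ** n_flips

# Probability that at least one of 50,000 independent people does so:
# one minus the probability that nobody does.
n_people = 50_000
p_at_least_one = 1 - (1 - p_single) ** n_people

print(f"one given person:      {p_single:.5f}")
print(f"someone in the crowd:  {p_at_least_one:.2f}")
```

The same structure applies to the lottery question: the per-person probability is tiny, but it is raised to the power of a very large number of players and drawings.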

I reported this to the NPR journalist, but it interested him not in the least. Either he did not understand what I was trying to explain, or he was simply trying to fill in the last sound bite on his story and did not want to be distracted. The only bit of my interview that made it on air was a sentence fragment about the odds of winning the lottery twice being equivalent to the odds of being hit by a meteorite. This must have been something else I pulled out of the little book, an intermediate step in trying to explain the much higher odds of “someone, somewhere…”

A statistical purist might have said “well, the woman won the lottery twice, so the probability is 1,” or, more likely, might have argued that the question was about the past, not about a future event, so the idea of probability is irrelevant. Regardless of the technical definition of probability, though, people make decisions all the time based on probability estimates. Or, to put it more accurately, they make decisions that implicitly embody probability estimates that they often fail to explicitly recognize and calculate. After the second 737-MAX crash, Boeing’s decision to continue producing the ill-fated and grounded plane for another 9 months reflected a significant overestimate of the probability of getting the plane back in the air in time to benefit from the continued production.

2. The Black Swan

The financial securities industry provides well-known examples of rare events that have proved to be less improbable than thought. The popular author Nassim Taleb wrote about this in his 2001 book Fooled by Randomness, which focused on the financial industry, its tendency to pay undue attention to random noise, and its propensity to get blindsided by major changes. His second book in the same vein, The Black Swan, appeared in 2007, on the eve of the great financial crisis. Taleb focused on outliers: unexpected events that came seemingly out of nowhere to disrupt life in a major way.

The term “black swan” came from the belief that black swans did not exist, a belief that Taleb said persisted until a black swan was seen in Australia. Taleb noted in a second edition that, since he published his book, travelers have been sending him pictures of black swans that they have encountered, proving that they were not as rare as originally thought.

One of the (several) contributing causes of the 2008 financial collapse was financial models that were built on standard statistical distributions like the normal distribution, in which the probability of extreme events was underestimated. The extreme event in question was a rapidly unfolding series of mortgage defaults, which brought about the collapse of a whole sector of derivative instruments that was built on mortgages. The impact of extreme events like this is so great that it does not take much of an underestimate to thoroughly undercut any model that relies on that estimate, and their occurrence is so rare that it is difficult to estimate their probability accurately.
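To give a feel for how much a normal model can understate extremes, here is a rough sketch comparing upper-tail probabilities under a standard normal distribution and under a heavier-tailed alternative. The choice of Student's t with 3 degrees of freedom is my assumption for illustration only, not the distribution any particular financial model used or should have used, and the two are compared at the same threshold rather than matched on variance.

```python
import math

def normal_upper_tail(x):
    """P(Z > x) for a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def t3_upper_tail(x):
    """P(T > x) for Student's t with 3 degrees of freedom (closed-form CDF)."""
    u = x / math.sqrt(3)
    cdf = 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi
    return 1 - cdf

# Illustrative thresholds; the gap widens rapidly as the event gets more extreme.
for x in (2, 4, 6):
    normal_p = normal_upper_tail(x)
    heavy_p = t3_upper_tail(x)
    print(f"x={x}: normal={normal_p:.2e}  t(3)={heavy_p:.2e}  ratio={heavy_p / normal_p:.0f}")
```

The further out in the tail the event lies, the more the two models disagree, which is why a modest error in the assumed distribution can translate into a very large error in the estimated probability of a collapse.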

3. Exaggerating the Probability of Rare but Vivid Events

The above two examples involve underestimating the probability of rare events; there is also a problem of overestimating and overweighting the probability of rare and vivid events. When swimming or surfing in the ocean, whether in California or Cape Cod, fear of sharks ranks high in many people’s minds. When driving to work or to do errands, by contrast, not many people are consumed by fear of an auto accident. In the U.S., though, there are roughly 6 million auto accidents per year, with over 35,000 deaths. Shark attacks? About 16 per year, averaging less than one death per year.

Some psychologists have argued that the emotional content of “shark” scenarios renders them less amenable to cool calculation of probabilities. Daniel Kahneman, in Thinking, Fast and Slow, hypothesizes that it is not so much emotion, but rather the accompanying rich and vivid mental detail that adds complexity and reduces the role of pure probability in decisions. This additional detail might be intensely emotional, or it might not – all it needs to do is to distract the mind from the calculation.

4. Rare Events in Statistical Models

Rare events pose some complications in statistical and machine learning models. A key step in statistical modeling is assessing a model and diagnosing its failures. When an outcome being predicted is rare, say a fraudulent insurance claim, or a purchase on a website selling expensive items, a model can be quite accurate simply by predicting that all insurance claims are legitimate, or all web visitors will depart without purchasing. Hence the popularity of metrics other than overall accuracy, metrics that account for a model’s ability to sift through the cases and efficiently identify the ones of interest. Two such metrics are:

Lift. After sorting records by their predicted probability of being of interest (the rare ones), or their predicted value (if a continuous outcome), select, say, the top 10%. Lift (also called gains) measures how much better the model does in identifying the cases of interest than would simply selecting randomly.

AUC, or “Area Under the Curve”. The curve in question is the Receiver Operating Characteristic, or ROC, curve. As with lift, records are arrayed in order of predicted probability of being of interest. Starting in the lower left, the curve plots sensitivity (the percent of positives correctly identified) versus 1-specificity (specificity is the percent of negatives correctly identified). The area under the curve, or AUC, measures the effectiveness of the model. The greater the area (the more the curve hugs the upper left), the greater the ability of the model to distinguish positives from negatives.
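A toy illustration of the accuracy problem and of lift may help. The 20 records below are made up (2 of them are the rare class), and the scores stand in for a hypothetical model's predicted probabilities; none of this comes from real data.

```python
# Hypothetical data: 20 records, 2 positives (label 1), with made-up model scores.
scores = [0.91, 0.85, 0.60, 0.55, 0.40, 0.35, 0.30, 0.28, 0.25, 0.22,
          0.20, 0.18, 0.15, 0.12, 0.10, 0.08, 0.06, 0.05, 0.03, 0.02]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

def accuracy(labels, predictions):
    """Fraction of records whose predicted class matches the actual class."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)

def lift_at(labels, scores, fraction=0.10):
    """Rate of positives among the top-scored fraction, divided by the overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# A "model" that predicts nobody is of interest still looks accurate.
all_negative = [0] * len(labels)
baseline_accuracy = accuracy(labels, all_negative)
top_decile_lift = lift_at(labels, scores, fraction=0.10)

print(f"accuracy of predicting all negatives: {baseline_accuracy:.2f}")
print(f"lift in the top 10% of scores:        {top_decile_lift:.1f}")
```

The all-negative model is right 90% of the time yet finds none of the cases of interest, while the lift figure says the top-scored 10% of records contain positives at five times the overall rate.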

AUC is favored by data science practitioners because it sums up, in a single measure, the overall performance of the model. Lift, on the other hand, has the virtue of being able to home in on the records of most interest (e.g. the 10% most likely to be 1’s).
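The ROC construction described above can be sketched directly: walk down the records in order of score, trace out the curve, and add up the area with the trapezoid rule. The data are made up for illustration, and the sketch assumes no two records share the same score (ties would need extra handling).

```python
# Hypothetical data: 20 records, 2 positives (label 1), with made-up model scores.
scores = [0.91, 0.85, 0.60, 0.55, 0.40, 0.35, 0.30, 0.28, 0.25, 0.22,
          0.20, 0.18, 0.15, 0.12, 0.10, 0.08, 0.06, 0.05, 0.03, 0.02]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) points, starting at the lower left."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    true_pos = false_pos = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points

def area_under(points):
    """Trapezoid-rule area under a sequence of (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

auc = area_under(roc_points(labels, scores))
print(f"AUC: {auc:.3f}")
```

An AUC of 0.5 corresponds to random ordering and 1.0 to a perfect ranking; here one negative outranks one of the two positives, so the area falls a little short of 1.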

For more on these metrics, read this Word of the Week on ROC, Lift and Gains Curves.

Conclusion

It is no surprise to most statisticians that people often do not think clearly about probabilities. Probability is something of an abstract concept, which hinders its application to concrete decisions. People respond differently when you talk about counts of people or events rather than probabilities, proportions or percentages. Here’s an example cited by Kahneman:

An experiment was conducted in which psychologists were asked to evaluate whether Mr. Jones, with a history of violence, should be discharged from a mental institution. As part of the evaluation, they reviewed a supposed expert’s risk assessment in one of two forms:

“Patients similar to Mr. Jones are estimated to have a 10% probability of committing an act of violence against others during the first several months after discharge.”

“Of every 100 patients similar to Mr. Jones, 10 are estimated to commit an act of violence against others during the first several months after discharge.”

Psychologists who saw the second form, with its concrete numbers, were twice as likely to deny the discharge as those who saw the problem framed in probabilities (41% versus 21%). When it comes to decisions, the concrete and the vivid come out ahead when estimating likelihoods.