Why Statisticians Like Odds - Statistics.com: Data Science, Analytics & Statistics Courses

In your introductory statistics class, probability took center stage. Odds were for gamblers. But it turns out odds play an important role in statistics, too. The relationship between the two is simple. To estimate the probability that event “A” will happen, we divide the number of times A occurs by the number of all events, A and not A (~A), or

A / (A + ~A)

The odds of A happening omit A from the denominator – they are estimated by A / ~A.
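The two estimates can be sketched in a few lines of Python. The counts below are made up for illustration, and the function names are hypothetical helpers, not part of any library.

```python
def probability(a_count: int, not_a_count: int) -> float:
    """Estimated probability of A: A / (A + ~A)."""
    return a_count / (a_count + not_a_count)


def odds(a_count: int, not_a_count: int) -> float:
    """Estimated odds of A: A / ~A."""
    return a_count / not_a_count


def odds_to_probability(o: float) -> float:
    """Convert odds back to a probability: odds / (1 + odds)."""
    return o / (1 + o)


def probability_to_odds(p: float) -> float:
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)


# Hypothetical counts: (times A occurred, times A did not occur)
for a, not_a in [(2, 998), (50, 50), (90, 10)]:
    p = probability(a, not_a)
    o = odds(a, not_a)
    print(f"A={a:3d}, ~A={not_a:3d}: probability={p:.4f}, odds={o:.4f}")
```

For the rare event (2 occurrences against 998) the two numbers barely differ; for the common event (90 against 10) the odds are ten times the probability.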

With low probabilities of A, the two are almost the same. Why use odds instead of probabilities? Two key applications come to mind.

1. Logistic Regression Uses Odds

Next to linear regression, logistic regression is the most useful member of the family of linear models. For decades it has been a powerful and efficient model for predicting the probability that an outcome will happen, and for understanding relationships between predictor variables and that outcome.

Unfortunately, a linear model is not well suited for predicting a probability directly. A linear predictor keeps increasing (or decreasing) without limit as predictor values change, but a probability must lie between 0 and 1 (you cannot have more than a 100% chance of something). Odds are less constrained: they can take any positive value (e.g. a ⅔ probability is the same as odds of 2/1), and the log of the odds, or logit, can take any value at all, positive or negative. If we model the logit instead of the probability, a linear model can be fit.

The predicted odds of an outcome for a particular set of predictor values can readily be translated to a probability. Also, the coefficient for a predictor in a logistic regression model, when exponentiated, can be interpreted as the multiplicative change in the odds of the outcome per unit change in that predictor.
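A minimal sketch of that interpretation, assuming a logistic regression with one predictor has already been fit. The intercept and slope below are invented for illustration, not estimated from any data.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale (assumed, not real estimates)
INTERCEPT = -3.0
SLOPE = 0.5


def predicted_log_odds(x: float) -> float:
    """Linear predictor: the model is linear in the log odds."""
    return INTERCEPT + SLOPE * x


def predicted_odds(x: float) -> float:
    return math.exp(predicted_log_odds(x))


def predicted_probability(x: float) -> float:
    o = predicted_odds(x)
    return o / (1 + o)


for x in (4, 5, 6):
    print(f"x={x}: odds={predicted_odds(x):.4f}, "
          f"probability={predicted_probability(x):.4f}")

# Raising x by one unit multiplies the odds by exp(SLOPE), whatever x is.
print(f"odds multiplier per unit of x: {math.exp(SLOPE):.4f}")
print(f"odds(5) / odds(4):             {predicted_odds(5) / predicted_odds(4):.4f}")
```

The multiplier applies to the odds, not to the probability: the change in probability per unit of the predictor depends on where you start.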

2. Retrospective Medical Studies Use Odds

A fundamental tool in medical research is the prospective study. In a prospective study, subjects are followed over time and we observe whether they develop a disease. Typically, we want to compare the risk (probability) of getting the disease for subjects with some attribute or condition, versus the risk for those without that condition. For example, we might look at the risk of developing heart disease for smokers versus non-smokers. This comparison is typically expressed as a ratio – the heart disease risk for smokers divided by the heart disease risk for non-smokers. This ratio (the “relative risk”) tells you how much extra risk you are incurring by smoking.
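The relative risk calculation can be sketched as follows. The cohort counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def risk(cases: int, group_size: int) -> float:
    """Estimated probability of developing the disease within a group."""
    return cases / group_size


def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(exposed_cases, exposed_total) / risk(unexposed_cases, unexposed_total)


# Hypothetical prospective study: 1000 smokers and 1000 non-smokers followed over time
rr = relative_risk(exposed_cases=150, exposed_total=1000,
                   unexposed_cases=50, unexposed_total=1000)
print(f"risk for smokers:     {risk(150, 1000):.2f}")
print(f"risk for non-smokers: {risk(50, 1000):.2f}")
print(f"relative risk:        {rr:.2f}")
```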

Unfortunately, prospective studies are time-consuming – you have to wait long enough for the subjects to have adequate opportunity to contract the disease. An alternative is the retrospective case-control study, in which you look at people who already have the disease along with matched controls who do not, and ask whether they have the condition of interest. For example, look at patients who have heart disease as well as a set of matched controls, and ask whether they are smokers or not.

Because the numbers of heart disease patients and healthy controls are fixed by the study design, this design cannot estimate the probability of getting the disease. Instead, these designs calculate the odds of the antecedent (being a smoker), not the outcome (heart disease). For example, we learn the odds that a heart disease patient will be a smoker, not the odds that a smoker will get heart disease. The odds of being a smoker among the heart disease patients can then be divided by the odds of being a smoker among the controls, resulting in the odds ratio for this risk factor.
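A sketch of the odds ratio calculation for a case-control study, using a made-up 2×2 table of counts. The function and variable names are hypothetical.

```python
def exposure_odds(exposed: int, unexposed: int) -> float:
    """Odds of having the antecedent (e.g. being a smoker) within one group."""
    return exposed / unexposed


def odds_ratio(case_exposed: int, case_unexposed: int,
               control_exposed: int, control_unexposed: int) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposure_odds(case_exposed, case_unexposed)
            / exposure_odds(control_exposed, control_unexposed))


# Hypothetical study: 200 heart disease patients (cases) and 200 matched controls
cases = {"smokers": 120, "non_smokers": 80}
controls = {"smokers": 60, "non_smokers": 140}

or_smoking = odds_ratio(cases["smokers"], cases["non_smokers"],
                        controls["smokers"], controls["non_smokers"])

# The same number written as the cross-product of the 2x2 table
cross_product = (cases["smokers"] * controls["non_smokers"]) / (
    cases["non_smokers"] * controls["smokers"])

print(f"odds of smoking among cases:    {exposure_odds(120, 80):.3f}")
print(f"odds of smoking among controls: {exposure_odds(60, 140):.3f}")
print(f"odds ratio:                     {or_smoking:.3f}")
print(f"cross-product form:             {cross_product:.3f}")
```

Note that the 200/200 split between cases and controls was chosen by the hypothetical investigator, which is why the table says nothing about the overall probability of heart disease.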

This design has the disadvantage of not yielding information in the terms most relevant to patients: the increased probability of getting a disease by exposing yourself to some risk factor. However, it has the advantage of being able to use existing data and produce conclusions immediately upon analysis. This convenience and efficiency make retrospective studies popular. As a result, medical professionals, and the statisticians who analyze their data, are well familiar with odds ratios.

Odds, Gambling, and Behavioral Research

Let’s examine gambling, or, more precisely, betting. Odds are integral to betting; they are the instrument by which gamblers and gambling houses translate estimated probabilities into bets. Odds in betting are phrased in terms of the payoff that a bet will yield. On a 36-number roulette wheel (excluding the house’s 0 or 0s), placing your bet on 9 numbers out of the 36 corresponds to 3 to 1 odds against winning. This means that a winning $1 bet gains you $3 (you also get your original $1 stake back, so you walk away with $4). This translates to an estimated probability of ¼ or 25% that the wheel will land on one of your numbers. If your play wins 25% of the time (gaining you $3) and loses 75% of the time (losing you $1), then the 3 to 1 odds constitute a fair bet, meaning that you will come out even in the long run making such a bet.

Gambling houses set betting odds slightly more advantageous to themselves than a fair bet, to earn inexorable profits over the long run. Put another way, the gambling house wants the probabilities suggested by the odds for all the outcomes to sum to more than 1. In roulette, this is done by adding a 0 to the 36 numbers (Europe) or two 0s (United States), which belong to the house: if the wheel lands on 0 nobody wins and the house collects all the bets. In competitive and transparent gambling games, this house margin is thin: make it too big and that would leave the door open to a competitor offering better odds.
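The fair-bet arithmetic, and the effect of adding the house’s 0s, can be checked with a short sketch. It assumes the 9-number bet described above, with “3 to 1” taken to mean a $3 gain on a winning $1 bet; the function name is hypothetical. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def expected_profit(winning_pockets: int, total_pockets: int,
                    odds_against: int, stake: int = 1) -> Fraction:
    """Long-run average profit per bet: win `odds_against * stake`, or lose `stake`."""
    p_win = Fraction(winning_pockets, total_pockets)
    return p_win * odds_against * stake - (1 - p_win) * stake


# Same 9-number bet at 3 to 1 on wheels with no zero, one zero, and two zeros
for label, pockets in [("36 numbers, no 0", 36),
                       ("36 numbers + 0", 37),
                       ("36 numbers + 0 and 00", 38)]:
    ev = expected_profit(winning_pockets=9, total_pockets=pockets, odds_against=3)
    print(f"{label:22s} expected profit per $1 bet: {ev} (about {float(ev):+.4f})")
```

With no zero the expected profit is exactly 0, a fair bet. One zero costs the bettor 1/37 of each dollar on average, and two zeros cost 1/19.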

The potential reach of betting goes far beyond betting on sporting events and casino play. Hypothetical bets are a useful experimental device in social psychology. Daniel Kahneman popularized this technique in his research, and in his book Thinking, Fast and Slow. One choice he posited went as follows:

Scenario A: You’ve been given $1000. You must now choose between
● A 50% chance to win $1000
● Get $500 for sure

Scenario B: You’ve been given $2000. You must now choose between
● A 50% chance to lose $1000
● Lose $500 for sure

The possible final states of wealth are the same for the corresponding choices in the two scenarios: the gamble leaves you with either $1000 or $2000, and the sure thing leaves you with $1500. Nonetheless, nearly everyone chooses the sure thing in Scenario A, and the bet in Scenario B. This illustrates the importance of a reference point, the endowment of either $1000 or $2000. From $1000, you can only gain money, and the sure thing of $500 locks in that gain. From $2000 you can only lose money, and the gamble offers a good chance of not losing any money. Reference points are a key element of Kahneman and Tversky’s prospect theory. Prospect theory enriches older theories that hypothesized people act rationally to maximize expected utility. Kahneman and Tversky used bets and games of chance, like the above, to illustrate how actual human behavior departs from that predicted by utility theory.
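The equivalence of the two scenarios is easy to verify by listing the final wealth each choice can produce. The sketch below uses the dollar amounts from the scenarios; the function names are hypothetical.

```python
def final_wealth(endowment: int, outcomes: list[tuple[float, int]]) -> list[tuple[int, float]]:
    """Turn (probability, change) pairs into sorted (final wealth, probability) pairs."""
    return sorted((endowment + change, prob) for prob, change in outcomes)


def expected_wealth(distribution: list[tuple[int, float]]) -> float:
    return sum(wealth * prob for wealth, prob in distribution)


scenario_a = {
    "gamble": final_wealth(1000, [(0.5, 1000), (0.5, 0)]),
    "sure thing": final_wealth(1000, [(1.0, 500)]),
}
scenario_b = {
    "gamble": final_wealth(2000, [(0.5, -1000), (0.5, 0)]),
    "sure thing": final_wealth(2000, [(1.0, -500)]),
}

for choice in ("gamble", "sure thing"):
    print(f"{choice}: A -> {scenario_a[choice]}, B -> {scenario_b[choice]}, "
          f"expected wealth = {expected_wealth(scenario_a[choice]):.0f}")
```

Both the outcome distributions and the expected wealth match across scenarios, so a preference that flips between A and B cannot come from the final amounts alone.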

Kahneman’s bets were hypothetical affairs to illustrate academic theories; no actual money changed hands. Political scientists have tried to go further, by creating markets with actual payouts. The goal is to gain information about future political events. These “prediction markets” are thought to embody the net sum of available knowledge on a topic (see The Wisdom of Crowds). Bets in a prediction market serve a function similar to that of bid and offer prices in an economic market. Such political prediction efforts have had limited success, for several reasons. One famous attempt, the U.S. Defense Department’s 2003 Policy Analysis Market, disappeared after it was suggested that future terrorist attacks might be an appropriate topic. There was much criticism that the U.S. government should not be facilitating a market where those with inside knowledge of impending attacks might profit from their knowledge.

Conclusion

Aficionados of betting see in it much potential beyond the casino and racetrack. They are drawn to it as a device for sharpening estimation and decision processes. However, as Kahneman and Tversky illustrated, it runs up against a common aversion to strict probabilistic thinking, the necessary foundation of efficient betting. Indeed, in reading Thinking, Fast and Slow, I found myself wondering how often Kahneman actually got people to take his bets, even hypothetically. People are often free with opinions qualified by modifiers like “probably,” “rarely,” “a lot” or something similar, but moving from there to odds, probabilities and bets is usually a bridge too far.