Data Literacy – The Chainsaw Case

A famous business school case by Harvard Professor Michael Porter on forecasting chainsaw sales dramatically illustrated the limits of statistical models when common business sense and clear-eyed thinking are missing. In the chainsaw case, students were asked to forecast the future U.S. demand for chainsaws,…

Why Statisticians Like Odds

In your introductory statistics class, probability took center stage. Odds were for gamblers. But it turns out odds play an important role in statistics, too. The relationship between the two is simple. To estimate the probability that event “A” will happen, we divide the number…

Controlling Leaks

Good psychics have a knack for getting their audience to reveal, unwittingly, information that can be turned around and used in a prediction.  Statisticians and data scientists fall prey to a related phenomenon, leakage, when they allow into their models highly predictive features that would…

Why AI Projects Fail: Type III Error

We encountered “Type III error” when it turned out that most people answering our Puzzle question were, in fact, answering a different question from the one that was asked. Type III error is answering the wrong question, and it is a big factor in the…

From Kaggle to Cancel: The Culture of AI

“Extremism in the defense of liberty is no vice. Moderation in the pursuit of justice is no virtue.” So said Barry Goldwater, running for U.S. President in 1964. At the time, the voters rejected his pitch for purity, and his opponent, Lyndon Johnson, won a…

How Much Power Do Voters Have?

The recent U.S. election was one of the most controversial and closest ever, and turnout percentage may be the highest in a century. Still, 37% of the voting age population did not vote. The traditional explanation for why people don’t vote is that they feel…

Gangs and Covid

Dr. Carlos Carcach is Professor & Director of the Center for Public Policy at the Escuela Superior de Economía y Negocios (ESEN) in Santa Tecla, El Salvador, and coordinator of ESEN's post-graduate program in predictive analytics, which offers online instruction in partnership with Statistics.com, using…

Statistics at War

World War 2 gave the statistics profession its big growth spurt. Statistical methods such as correlation, regression, ANOVA, and significance testing were all worked out previously, but it was the war which brought large numbers of people to the field as a profession. They didn’t…

False Positive Rate – It’s Not What You Might Think

Famous Errors in Statistics

“A little knowledge is a dangerous thing,” said Alexander Pope in 1711; he could have been speaking of the use of statistics by experts in all fields. In this article, we look at three consequential mistakes in the field of statistics. Two of them are famous; the third required a deep dive into the corporate annual reports of…

The Popular 80%

Researchers and analysts are familiar with the famous 5% benchmark in statistics, the typical probability threshold at which a result becomes statistically significant.  (The probability in question is the probability that a result as interesting as the real-life result will happen in the null model.) …

Four Common Pitfalls in Data Engineering

By Will Goodrum* Note: A version of this article was first published on the Elder Research blog. Your company has made it a strategic priority to become more data-driven. Good! A major anticipated component of this transition is to implement new data technology (e.g., a…

AUC: A Fatally Flawed Model Metric

By John Elder, Founder and Chair of Elder Research, Inc.  Last week, in Recidivism, and the Failure of AUC, we saw how the use of “Area Under the Curve” (AUC) concealed bias against African-American defendants in a model predicting recidivism, that is, which defendants would re-offend. …

Recidivism, and the Failure of AUC

On average, 40% - 50% of convicted criminals in the U.S. go on to commit another crime (“recidivate”) after they are released.  For nearly 20 years, court systems have used statistical and machine learning algorithms to predict the probability of recidivism, and to guide sentencing…

Where Outliers are Central

In casual statistical analysis, you sometimes hear references to outliers, along with the suggestion that they should be ignored or dropped from the analysis.  Quite the contrary: often it is the outliers that convey useful information.  They may represent errors in data collection, e.g. a…

Three Myths in Data Science

Myth 1:  It’s All About Prediction “Who cares whether we understand the model - as long as it predicts well!” This was one of the seeming benefits of the era of big data and predictive modeling, and it set data science apart from traditional statistics.  …

Predicting “Do Not Disturbs”

In his book Predictive Analytics, Eric Siegel tells the story of marketing efforts at Telenor, a Norwegian telecom, to reduce churn (customers leaving for another carrier). Sophisticated analytics were used to guide the campaigns, but the managers gradually discovered that some campaigns were backfiring:  they…

Ethical Data Science

Guest Blog - Grant Fleming, Data Scientist, Elder Research. Progress in data science is largely driven by the ever-improving predictive performance of increasingly complex black-box models. However, these predictive gains have come at the expense of losing the ability to interpret the relationships derived between…

Statistical Arbitrage

An economics professor and an engineering professor are walking across campus.  The engineering professor spots something lying in the grass - “Look - here’s a $20 bill!”  The economist doesn’t bother to look.  “It can’t be - somebody would have picked it up.” This old joke…

Evolutionary Algorithms

It was 150 years ago when Darwin first used the term “evolution” in his writing (in his book The Descent of Man).  Two months ago, in The Normal Share of Paupers, I briefly discussed the unfortunate eugenics baggage that the discipline of statistics inherited from…

Conversations with Data Scientists about R and Python

Dyed-in-the-wool software developers can get quite passionate about the relative virtues of one programming language or another, their debates sometimes threatening to transport you back to middle-school arguments about the greatest ballplayers of all time.  Though their computer passions find other outlets as well, data…

Elder Research Capabilities

In late December, Statistics.com was acquired by Elder Research, Inc. Many of you have asked for more detail, so here’s an introduction to the folks at Elder Research and some stories of what they do.  There are 100+ employees at Elder Research, led by John…

Coronavirus Death Toll

There are tens of thousands of epidemiologists the world over, and we are beginning to see a bumper crop of forecasts for the ultimate 2020 death toll from Covid-19.  It’s a grim but important forecasting task. Most citizens would support draconian measures to prevent deaths…

P-Values – Are They Needed?

Five years ago last month, the psychology journal Basic and Applied Social Psychology instigated a major debate in statistical circles when it said it would remove p-value citations from papers it published.  A year later, the American Statistical Association (ASA) released a statement on p-values…

Covid-19 Parameters

There are many moving parts in modeling the spread of an epidemic, a subject that has lately attracted the attention of great numbers of statistically-oriented non-epidemiologists (like me).  I’ve put together a “lay statistician’s guide” to some of the important parameters and factors (and I…

Ensemble Learning

In his book, The Wisdom of Crowds, James Surowiecki recounts how Francis Galton, a prominent statistician from the 19th century, attended an event at a country fair in England where the object was to guess the weight of an ox.   Individual contestants were relatively well…

Big Sample, Unreliable Result

Which would you rather have?  A large sample that is biased, or a representative sample that is small?  The American Statistical Association committee that reviewed the 1948 Kinsey report on male sexual behavior, based on interviews with over 5000 men, left no doubt of their…

Mixed Models – When to Use

Companies now have a lot of data on their customers at an individual level.  Suppose you are tasked with forecasting customer spending at a grocery chain, and you want to understand how customer attributes, local economic factors, and store issues affect customer spending. You could…

The Normal Share of Paupers

In 2009, China began regional pilot programs that repurposed credit scores for a broader use - scoring a person’s “social credit.”  100 years earlier, at the height of the eugenics craze, the famous statistician Francis Galton undertook to repurpose statistical concepts in service of social…

UpLift and Persuasion

The goal of any direct mail campaign, or other messaging effort, is to persuade somebody to do something.  In the business world, it is usually to buy something. In the political world, it is usually to vote for someone (or, if you think you know…

Lift and Persuasion

Predicting the probability that something or someone will belong to a certain category (classification problems) is perhaps the oldest type of problem in analytics.  Consider the category “repays loan.” Equifax, the oldest of the agencies that provide credit scores, was founded in 1899 as the…

Going Beyond the Canary Trap

In 2008, Elon Musk was concerned about leaks of sensitive information at Tesla Motors.  To catch the leaker, he prepared multiple unique versions of a new nondisclosure agreement he asked senior officers to sign.  Whichever version got leaked would reveal the leak source. This is…

Statistics.com Acquired by Elder Research

In last week’s Brief I described how The Institute’s courses, and its Mastery, Certificate and Degree programs would continue without interruption, following our acquisition by Elder Research, Inc.  Now I’d like to talk about how the Institute’s students stand to gain from the expertise and…

Choosing the Right Analytics Problem

The “streetlight effect:”  A man is looking for his keys under a streetlight.   Policeman:  “Where did you lose them?”   Man:  “In the alley, near the door to the bar.”   Policeman:  “Why are you looking here?”   Man:  “The light’s better.”   This is related to the more…

Ethical Dilemmas in Data Science

Know those ads that follow you around the web, as a result of tracking cookies?  Many see them as an invasion of privacy, and EU rules made them subject to user consent.  Google recently announced that Chrome will eventually stop supporting these cookies.  A win…

Not Glamorous, But Lucrative

What do stormy days, weekend evenings, and the last day of the month have in common?  They are all good times to negotiate a good price for a new car. Inclement days yield less customer traffic in auto showrooms, which is good for the buyer. …

Simulating the Complex Sale

Every 30 minutes a new business book is published; many of them purport to teach effective selling.  Most of them make sense, but solid quantitative analysis is rarely on the front burner. This is strange, because effective selling requires demonstrating value.  Sales professionals are taught…

Analytics Meets the Cardboard Box

“Do you have a bag?” or “Would you like a bag?” have become common parts of the brick-and-mortar retail transaction.  Reusable bags, or simply doing without, have reduced the flow of plastic and paper into recycling.   E-commerce is a different matter.  I just unpacked a…

Betting and Statistics

Betting has had a long and close relationship with the science of probability and statistics.  In the mid-1600s, the French intellectual and gambler Antoine Gombaud, who called himself Chevalier de Méré, enlisted the help of the mathematician Blaise Pascal to solve several puzzles involving dice…

Data Analytics

Terminology in Data Analytics: As data continue to grow at a faster rate than either population or economic activity, so do organizations' efforts to deal with the data deluge, and use it to capture value.  And so do the methods used to analyze data, which…

Data Analytics Courses

Data analytics and data science are popular terms, and skills in these areas are in great demand.  But what do these terms mean?  Below is an overview and a listing of related courses. For information about our certificate programs in data science and analytics, click here.…

Statistical Thinking

Gambler’s Fallacy I - forgetting that the “coin has no memory”   Gamblers often believe that after a long streak of one outcome, the probability of a different outcome has increased.  Sports commentators often say that a batter in a slump is “due” for a hit.…

Machine Learning and Human Bias

Does better AI offer the hope of prejudice-free decision-making?  Ironically, the reverse might be true, especially with the advent of deep learning.   Bias in hiring is one area where private companies move with great care, since there are thickets of laws and regulations in most…

Meta Analysis

1.2 million scientific papers were indexed by PubMed in 2011 (see Are Scientists Doing Too Much Research), ample proof that there are lots of people studying the same or similar things.  For example, there have been over 100 studies of suicide following psychiatric institutionalization; 38 studies…

Explain or Predict?

A casual user of machine learning methods like CART or naive Bayes is accustomed to evaluating a model by measuring how well it predicts new data.  When examining the output of statistical models, they are often flummoxed by the profusion of assessment metrics. Typical multiple…

Matching Algorithms

Some applications of machine learning and artificial intelligence are recognizably impressive - predicting future hospital readmission of discharged patients, for example, or diagnosing retinopathy. Others - self-driving cars, for example - seem almost magical. The matching problem, though, is one where your first reaction might…

Work and Heat

If you are working on New Year's Eve or New Year's Day, odds are it is from home, where you can (usually) control the temperature in the home. Which, from the standpoint of productivity, is a good thing. According to a study from Cornell, raising…

Quotes about Data Science

“The goal is to turn data into information, and information into insight.” – Carly Fiorina, former CEO, Hewlett-Packard Co. Speech given at Oracle OpenWorld “Data is the new science. Big data holds the answers.” – Pat Gelsinger, CEO, EMC, Big Bets on Big Data, Forbes “Hiding within those…

College Credit Recommendation

Statistics.com Receives College Recommendation from the American Council on Education (ACE) College Credit Recommendation for Online Data Science Courses from The Institute for Statistics Education at Statistics.com LLC The American Council on Education's College Credit Recommendation Service (ACE CREDIT) has evaluated and recommended college credit…

Needle in a Haystack

What's the probability that the NSA examined the metadata for your phone number in 2013? According to John Inglis, Deputy Director at the NSA, it's about 0.00001, or 1 in 100,000. A surprisingly small number, given what we've all been reading in the media about…

Personality Regions

There are Red States and Blue States. The three blue states of the Pacific coast constitute the Left Coast. For Colin Woodard, Yankeedom comprises both New England and the Great Lakes. If you're into accessories, there's the Bible Belt, the Rust Belt, and the Stroke…
