
Big Sample, Unreliable Result

Which would you rather have? A large sample that is biased, or a representative sample that is small? The American Statistical Association committee that reviewed the 1948 Kinsey report on male sexual behavior, based on interviews with over 5000 men, left no doubt of their preference for the latter. The statisticians – William Cochran, Frederick…

Mixed Models – When to Use

Companies now have a lot of data on their customers at an individual level. Suppose you are tasked with forecasting customer spending at a grocery chain, and you want to understand how customer attributes, local economic factors, and store issues affect customer spending. You could design your study with hierarchical and mixed linear modeling methods…

The Normal Share of Paupers

In 2009, China began regional pilot programs that put credit scores to a broader purpose – scoring a person’s “social credit.” A hundred years earlier, at the height of the eugenics craze, the famous statistician Francis Galton undertook to repurpose statistical concepts in service of social engineering. The starting point was a social survey of London…

Going Beyond the Canary Trap

In 2008, Elon Musk was concerned about leaks of sensitive information at Tesla Motors. To catch the leaker, he prepared multiple unique versions of a new nondisclosure agreement he asked senior officers to sign. Whichever version got leaked would reveal the leak source. This is known as a “canary trap.” The canary trap only works…

Book Review: Mining Your Own Business by Gerhard Pilcher and Jeff Deal

“Mining Your Own Business: A Primer for Executives on Understanding and Employing Data Mining and Predictive Analytics” is a short book, befitting its intended audience – managers and executives with responsibility for data science and analytics projects. It outlines the requirements for success – not technical model success, but rather successful implementation in a way that builds…

Choosing the Right Analytics Problem

The “streetlight effect”: A man is looking for his keys under a streetlight. Policeman: “Where did you lose them?” Man: “In the alley, near the door to the bar.” Policeman: “Why are you looking here?” Man: “The light’s better.” This is related to the more general “Statistical Type 4 Error” – asking the wrong question, and…

Historical Spotlight: Statistical Analysis and Human Rights

Artificial intelligence and analytics have gotten some bad press recently, from the role that social media has played in fracturing and heightening divisions in democratic society to the “big brother” role that data mining and image recognition have played in China’s suppression of minorities. But statistical analysis has also long played a role in documenting, …

Simulating the Complex Sale

Every 30 minutes a new business book is published; many of them purport to teach effective selling. Most of them make sense, but solid quantitative analysis is rarely on the front burner. This is strange, because effective selling requires demonstrating value. Sales professionals are taught to show components of value such as cost savings or…

Analytics Meets the Cardboard Box

“Do you have a bag?” or “Would you like a bag?” have become common parts of the brick-and-mortar retail transaction. Reusable bags, or simply doing without, have reduced the flow of plastic and paper into recycling. E-commerce is a different matter. I just unpacked a box of wine, and dealing with the protective spacers and…

Detecting a Slots Payout Difference of 2%

Most businesses use statistics and analytics to one degree or another, but there is only one industry that is built solely on this discipline. This week we look at the casino business – in particular, the odds on slots. Slot machines are a casino’s best friend. Able to run 24/7 with consistently sized bets, slots realize…

Book Review: Big Data in Practice by Bernard Marr

This short book is essentially an enriched list of 45 examples of how companies have used big data analytics. Marr sticks to high-level generalities, and the book is in the spirit of light business journalism rather than a detailed exposition that walks you through a successful big data implementation. However, private companies, and…

Problem of the Week: Missing Data

Question: You have a supervised learning task with 30 predictors, in which 5% of the observations are missing. The missing data are randomly distributed across variables and records. If your strategy for coping with missing data is to drop records with missing data, what proportion of the records will be dropped? Is the assumption of…
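One way to approach the first part of the question is sketched below. It assumes that each of the 30 predictor values in a record is missing independently with probability 0.05; that reading is an assumption, since the excerpt is cut off before it discusses the point. The function name `proportion_dropped` is hypothetical and used only for illustration.

```python
def proportion_dropped(p_missing: float, n_predictors: int) -> float:
    """Expected share of records with at least one missing value.

    Assumes (for illustration only) that each predictor value is missing
    independently with probability p_missing.
    """
    # A record survives only if none of its predictor values is missing.
    p_complete = (1.0 - p_missing) ** n_predictors
    return 1.0 - p_complete


if __name__ == "__main__":
    share = proportion_dropped(0.05, 30)
    print(f"Share of records dropped: {share:.3f}")  # prints 0.785
```

Under this independence assumption, roughly 79% of the records would be dropped even though only 5% of the individual values are missing. If missingness is concentrated in particular records or variables, the share would differ.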

“Money and Brains” and “Furs and Station Wagons”

“Money and Brains” and “Furs and Station Wagons” were evocative customer shorthands that the marketing company Claritas came up with over a half century ago. These names, which facilitated the work of marketers and salespeople, described segments of customers identified through statistical cluster analysis. Cluster analysis is also used in market…