Feb 24: Statistics in Practice
In this week’s Brief, we look at social categories and the role that statistics and data science have played in social engineering – 100 years ago and today. Our course spotlight is April 3 – May 1: Categorical Data Analysis. See you in class! – Peter Bruce, Founder, Author, and Senior Scientist. The Normal Share …
Monthly Archives: February 2020
The Normal Share of Paupers
In 2009, China began regional pilot programs that extended credit scoring to a broader purpose – scoring a person’s “social credit.” A century earlier, at the height of the eugenics craze, the famous statistician Francis Galton set out to repurpose statistical concepts in the service of social engineering. The starting point was a social survey of London …
Purity
In classification, purity measures the extent to which a group of records shares the same class. It is also termed class purity or homogeneity, and sometimes impurity is measured instead. Gini impurity, for example, is calculated for the two-class case as p(1 − p), where p is the proportion of records belonging to class 1. (The more common general definition, one minus the sum of squared class proportions, gives 2p(1 − p) for two classes; the constant factor does not change which split looks purest.) …
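To make the calculation concrete, here is a minimal Python sketch of two-class Gini impurity, assuming the class labels are coded 0 and 1. The function name `gini_impurity` and its `standard` flag are our own illustrative choices: by default it returns p(1 − p), and `standard=True` returns one minus the sum of squared class proportions, which for two classes is exactly twice as large.

```python
def gini_impurity(labels, standard=False):
    """Two-class Gini impurity of a group of 0/1 labels (hypothetical helper).

    By default returns p * (1 - p), where p is the share of records in class 1.
    With standard=True returns 1 - (sum of squared class proportions),
    which for two classes equals 2 * p * (1 - p).
    """
    if not labels:
        raise ValueError("need at least one record")
    p = sum(1 for y in labels if y == 1) / len(labels)
    if standard:
        return 1 - (p ** 2 + (1 - p) ** 2)
    return p * (1 - p)


pure = [1, 1, 1, 1]    # every record in class 1
mixed = [1, 0, 1, 0]   # an even split between the classes
print(gini_impurity(pure))                  # 0.0
print(gini_impurity(mixed))                 # 0.25
print(gini_impurity(mixed, standard=True))  # 0.5
```

Under either convention a pure group scores 0 and an even split scores the maximum, so both rank candidate splits the same way.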
Predictor P-Values in Predictive Modeling
Not So Useful: Predictor p-values in linear models are a guide to the statistical significance of a predictor coefficient value – they measure the probability that a model fitted to randomly shuffled data could have produced a coefficient at least as extreme as the fitted value. They are of limited utility in predictive modeling applications for various reasons: Software typically …
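The shuffling idea behind a coefficient p-value can be sketched directly as a permutation test. This is a minimal Python sketch for a single predictor, assuming ordinary least-squares fitting; the helper names `slope` and `permutation_p_value` are hypothetical, and regression software usually reports a t-test-based p-value instead, which addresses the same question by formula rather than by shuffling.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of shuffles whose slope is at least as extreme as the fitted one.

    Shuffling y breaks any real link between x and y, so the shuffled
    slopes show what chance alone produces (two-sided: compares |slope|).
    """
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(slope(x, y_shuffled)) >= observed:
            hits += 1
    # Count the observed arrangement too, so the estimate is never exactly 0.
    return (hits + 1) / (n_perm + 1)


x = list(range(20))
y_linked = [2 * xi + (1 if xi % 2 else -1) for xi in x]  # strong trend plus small wobble
y_flat = [5] * 20                                        # no relationship at all
print(permutation_p_value(x, y_linked))  # tiny: no shuffle matches the real slope
print(permutation_p_value(x, y_flat))    # 1.0: every shuffle matches a slope of 0
```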
UpLift and Persuasion
The goal of any direct mail campaign, or other messaging effort, is to persuade somebody to do something. In the business world, it is usually to buy something. In the political world, it is usually to vote for someone (or, if you think you know who they will vote for, to encourage them to actually …
Feb 17: Statistics in Practice
Last week we looked at several metrics for assessing the performance of classification models – accuracy, receiver operating characteristic (ROC) curves, and lift (gains). In this week’s Brief we move beyond lift and cover uplift. Our course spotlight again is: Feb 28 – Mar 27: Persuasion Analytics and Targeting. See you in class! – …
ROC, Lift and Gains Curves
There are various metrics for assessing the performance of a classification model. It matters which one you use. The simplest is accuracy – the proportion of cases correctly classified. In classification tasks where the outcome of interest (“1”) is rare, though, accuracy as a metric falls short – high accuracy can be achieved by classifying …
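The accuracy trap for a rare outcome is easy to demonstrate. In this minimal Python sketch the data are invented for illustration, the function names are hypothetical, and lift is measured at the top 10% of scored cases, which is one common convention rather than the only one.

```python
def accuracy(actual, predicted):
    """Proportion of cases classified correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def top_decile_lift(actual, scores):
    """Rate of 1s among the top-scored 10% of cases, divided by the overall rate of 1s."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    k = max(1, len(ranked) // 10)
    top_rate = sum(a for _, a in ranked[:k]) / k
    overall_rate = sum(actual) / len(actual)  # assumes at least one 1 in the data
    return top_rate / overall_rate


# Invented data: 2 of 100 cases are 1s (a rare outcome).
actual = [1, 1] + [0] * 98

# A "model" that labels everything 0 still looks accurate...
always_zero = [0] * 100
print(accuracy(actual, always_zero))  # 0.98

# ...but finds no 1s. Scores from a hypothetical model that ranks the 1s first:
scores = [0.9, 0.8] + [0.1] * 98
print(top_decile_lift(actual, scores))  # about 10: the top 10% holds 1s at 10x the base rate
```

Accuracy rewards the useless all-zero classifier here, while lift rewards a model for concentrating the rare 1s at the top of its ranking.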
Feb 10: Statistics in Practice
Tomorrow is the New Hampshire primary in the US, and this week’s Brief looks at the statistical concept of lift. Our spotlight is on: Feb 28 – Mar 27: Persuasion Analytics and Targeting. See you in class! – Peter Bruce, Founder. Lift and Persuasion: What do you do with late-paying and defaulting customers? …
Lift and Persuasion
Predicting the probability that something or someone will belong to a certain category (classification problems) is perhaps the oldest type of problem in analytics. Consider the category “repays loan.” Equifax, the oldest of the agencies that provide credit scores, was founded in 1899 as the Retail Credit Company by two brothers, Cator and Guy Woolford. …
Going Beyond the Canary Trap
In 2008, Elon Musk was concerned about leaks of sensitive information at Tesla Motors. To catch the leaker, he prepared multiple unique versions of a new nondisclosure agreement that he asked senior officers to sign. Whichever version got leaked would reveal the source of the leak. This is known as a “canary trap.” The canary trap only works …
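A canary trap relies on each recipient receiving a distinguishable version. As an illustration only (not necessarily the approach the post goes on to describe), this Python sketch assumes each version differs in wording and attributes a leak to the version with the highest word overlap (Jaccard similarity), so a partial or paraphrased leak can still be matched. The recipients, texts, and helper names are hypothetical.

```python
import re


def tokens(text):
    """Lower-cased set of words in a document."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word overlap between two sets: |A & B| / |A | B| (0 if both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def likely_source(leaked_text, versions):
    """Return the recipient whose version best matches the leaked text."""
    leaked = tokens(leaked_text)
    return max(versions, key=lambda who: jaccard(leaked, tokens(versions[who])))


versions = {  # hypothetical recipients, each given a differently worded clause
    "officer_a": "Employees must not disclose confidential plans to outside parties.",
    "officer_b": "Staff shall not reveal confidential plans to third parties.",
    "officer_c": "Personnel may not share private roadmaps with external parties.",
}
leak = "staff shall not reveal confidential plans to third parties"
print(likely_source(leak, versions))  # officer_b
```

Exact string matching would fail on an excerpted or lightly edited leak; overlap scoring degrades more gracefully, though a leaker who rewords everything defeats it.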
Statistics.com Acquired by Elder Research
Feb 3: Statistics in Practice
In this week’s blog, we discuss our recent acquisition by Elder Research Inc. We also look at the “Canary Trap” and its connection to text mining. Our course spotlight is on: Jan 31 – Feb 28: Text Mining using Python (still open for registration; first assignment due in a week) and Feb 28 – Mar 27: Natural …