
Snowball Sampling

Snowball sampling is a form of sampling in which new sample subjects are suggested by prior subjects. From a statistical perspective, the method is prone to high variance and bias compared to random sampling. The characteristics of the initial subject may propagate through the sample to some degree, and a sample derived by starting with subject 1 may differ from one produced by starting with subject 2, even if the resulting sample in both cases contains both subject 1 and subject 2. However, …
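The dependence on the starting subject can be illustrated with a minimal sketch. The referral network, the breadth-first order, and the `snowball_sample` helper are all hypothetical, chosen only for illustration. Real snowball studies follow referrals in many different ways.

```python
from collections import deque

def snowball_sample(referrals, seed, max_size):
    """Collect subjects by following referrals outward from a seed subject.

    Hypothetical helper: referrals maps each subject to the subjects
    that subject suggests. Sampling stops once max_size is reached.
    """
    sample = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(sample) < max_size:
        subject = queue.popleft()
        for referred in referrals.get(subject, []):
            if referred not in seen:
                seen.add(referred)
                sample.append(referred)
                queue.append(referred)
                if len(sample) == max_size:
                    break
    return sample

# Made-up referral network: subjects 1 and 2 refer each other,
# but each also refers someone the other does not.
referrals = {1: [2, 3], 2: [1, 4], 3: [5], 4: [6], 5: [], 6: []}

from_1 = snowball_sample(referrals, seed=1, max_size=3)
from_2 = snowball_sample(referrals, seed=2, max_size=3)
print(from_1)  # [1, 2, 3]
print(from_2)  # [2, 1, 4]
```

Both samples contain subjects 1 and 2, yet they differ in their third member because each seed's own referrals are followed first.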

The Statistics of Christmas Trees

A researcher shakes a sprig from a Christmas tree, and counts the number of needles that fall. He then repeats the process for countless other sprigs. The sprigs are from a variety of species, and the goal is to determine which species do the best job of retaining their needles. Falling needles are a definite …

The False Alarm Conundrum

False alarms are one of the most poorly understood problems in applied statistics and biostatistics. The fundamental problem is the wide application of a statistical or diagnostic test in search of something that is relatively rare. Consider the Apple Watch’s new feature that detects atrial fibrillation (afib). Among people with irregular heartbeats, Apple claims a …

Conditional Probability Word of the Week

QUESTION:  The rate of residential insurance fraud is 10% (one out of ten claims is fraudulent).  A consultant has proposed a machine learning system to review claims and classify them as fraud or no-fraud.  The system is 90% effective in detecting the fraudulent claims, but only 80% effective in correctly classifying the non-fraud claims (it mistakenly labels one in five as “fraud”).  If the system classifies a claim as fraudulent, what is the probability that it really is fraudulent?
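The question can be worked through with Bayes' rule. The sketch below uses only the three rates stated in the question. The function name `posterior_fraud` and its parameter names are illustrative choices, not part of the original puzzle.

```python
def posterior_fraud(prior, sensitivity, specificity):
    """Probability that a claim flagged as fraud really is fraudulent.

    prior:       share of all claims that are fraudulent
    sensitivity: share of fraudulent claims the system flags
    specificity: share of non-fraud claims the system correctly clears
    """
    true_positives = prior * sensitivity
    false_positives = (1 - prior) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# Rates from the question: 10% fraud, 90% detection, 80% correct on non-fraud.
p = posterior_fraud(prior=0.10, sensitivity=0.90, specificity=0.80)
print(f"{p:.3f}")  # 0.333
```

Out of every 100 claims, about 9 fraudulent ones are flagged along with about 18 legitimate ones, so only about one flagged claim in three is actually fraudulent.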

Instructor Spotlight – David Kleinbaum

David Kleinbaum developed several courses for Statistics.com, including Survival Analysis, Epidemiologic Statistics, and Designing Valid Statistical Studies. David retired a little over a year ago from Emory University, where he was a popular and effective teacher with the ability to distill and explain difficult statistical concepts with clarity and concision. David had a flair for …

Book Review: Active-Epi

ActivEpi Web, by David Kleinbaum, is the text used in two Statistics.com courses (Epidemiology Statistics and Designing Valid Studies), but it is really a rich multimedia web-based presentation of epidemiological statistics, serving the role of a unique textbook format for an introductory course in the subject. It is historically noteworthy – it dates back to …

Churn

Churn is a term used in marketing to refer to the departure, over time, of customers. Subscribers to a service may remain for a long time (the ideal customer), or they may leave for a variety of reasons (switching to a competitor, dissatisfaction, an expired credit card, a move, etc.). A customer who leaves, for whatever reason, “churns.”

Eli Whitney and Google

This weekend (12/8/2018) marked the 253rd anniversary of the birth of Eli Whitney, inventor of the cotton gin. And 20 years ago, Google received its first big infusions of capital from, among others, Jeff Bezos, the founder of Amazon. Both Eli Whitney and the Google founders instigated economic revolutions, but also illustrate polar opposite approaches …

How Google Determines Which Ads you See

A classic machine learning task is to predict something’s class, usually binary – pictures as dogs or cats, insurance claims as fraud or not, etc. Often the goal is not a final classification, but an estimate of the probability of belonging to a class (propensity), so the cases can be ranked. A good example of …

Job Spotlight: Data Scientist

Data science is one of a host of similar terms. Artificial intelligence has been around since the 1960s and data mining for at least a couple of decades. Machine learning came out of the computer science community, and analytics, data analytics, and predictive analytics came out of the statistics and OR communities. Among all of …