Apr 28: Statistics in Practice

Models of virus growth are in the news, and this week we take a closer look at the modeling of epidemics and introduce our newest course, Analyzing and Modeling Covid-19 Data (June 12 to July 10). We’ll cover analysis of Covid data broadly, and focus…

Conversations with Data Scientists about R and Python

Dyed-in-the-wool software developers can get quite passionate about the relative virtues of one programming language or another, their debates sometimes threatening to transport you back to middle-school arguments about the greatest ballplayers of all time. Though their computer passions find other outlets as well, data…

Apr 21: Statistics in Practice

In this week’s Brief we take a look at Python vs. R, and feature some conversations with data scientists. Our spotlight is on our introductory statistical programming courses: May 15 - June 12:  Introduction to Python Programming May 15 - June 12:  R Programming Introduction…

Apr 14: Statistics in Practice

In this week’s Brief, we explore what data on the flu can tell us about Covid-19 counter-measures.  Our course spotlight is July 31 - Sept 25:  Biostatistics See you in class! - Peter Bruce Founder, Author, and Senior Scientist Social Distancing and the Flu The…

John Snow

John Snow is popularly regarded as the founder of the field of epidemiology, with his famous study of cholera in London.  Snow plotted cholera cases for a neighborhood served by two wells, and found that nearly all clustered around one of the wells, the Broad…

Apr 7: Statistics in Practice

In this week’s Brief, we look in greater detail at Elder Research, Inc., which recently acquired Statistics.com.  If your organization is like most organizations, your data science initiatives may lack the direction and support they need to succeed - having a data science team does…

Observation and Quote from John Elder, IV

"The hype around Artificial Intelligence, Machine Learning, and Data Science is enormous, so it’s tempting to be skeptical of the return on investment (ROI) claimed. Still, most of the results are real. Organizations may suspect there is value in their data assets but not be…

Elder Research Capabilities

In late December, Statistics.com was acquired by Elder Research, Inc. Many of you have asked for more detail, so here’s an introduction to the folks at Elder Research and some stories of what they do.  There are 100+ employees at Elder Research, led by John…

Coronavirus Death Toll

There are tens of thousands of epidemiologists the world over, and we are beginning to see a bumper crop of forecasts for the ultimate 2020 death toll from Covid-19.  It’s a grim but important forecasting task. Most citizens would support draconian measures to prevent deaths…

Mar 31: Statistics in Practice

In this week’s Brief, we look at p-values.  Plus, we’ve scheduled a couple of extra course sessions for April:  Use the month of April to introduce yourself to Python, or, for those with some Python familiarity, learn how to apply it to predictive analytics. April…

P-Values – Are They Needed?

Five years ago last month, the psychology journal Basic and Applied Social Psychology instigated a major debate in statistical circles when it said it would remove p-value citations from papers it published.  A year later, the American Statistical Association (ASA) released a statement on p-values…

The Depression Gene

The risks of large-scale testing, and the potential for false discovery, can be seen in the “discovery” of the genetic basis for anxiety and depression.  Specifically, serotonin transporter gene 5-HTTLPR. Color Genomics sells a genetic testing product that supposedly can predict which anti-depressant drug works…

Hazard

In biostatistics, hazard, or the hazard rate, is the instantaneous rate of an event (death, failure…).  It is the probability of the event occurring in a (vanishingly) small period of time, divided by the amount of time (mathematically it is the limit of this quantity…
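
The limit described above is usually written as follows. The symbols are assumptions, since the excerpt is cut off before it introduces any notation: $T$ is the time at which the event occurs and $h(t)$ is the hazard at time $t$. Note that the probability is taken conditional on the event not having occurred before $t$.

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P\left(t \le T < t + \Delta t \mid T \ge t\right)}{\Delta t}
```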

Mar 24: Statistics in Practice

In this week’s Brief, we look again at the statistics of Coronavirus.  We also spotlight our Health Analytics Mastery - a 3-course series in which you can choose from among Biostatistics 1 and 2 Designing Valid Statistical Studies Epidemiologic Statistics * Introduction to Statistical Issues…

Covid-19 Parameters

There are many moving parts in modeling the spread of an epidemic, a subject that has lately attracted the attention of great numbers of statistically-oriented non-epidemiologists (like me).  I’ve put together a “lay statistician’s guide” to some of the important parameters and factors (and I…

Preliminary Paper

Here is a preliminary paper that suggests that RNA extraction kits, one of the main bottlenecks to Covid-19 testing in the US, can be skipped altogether and the next part of the assay (RT-qPCR) still works.  If confirmed, this result would have a major impact…

Mar 18: Statistics in Practice

In this week’s Brief, we look at the coronavirus, and the problem of estimating prevalence and mortality.  Our course spotlight is Nov 8 - Dec 6:  Epidemiologic Statistics (we're adding a spring session - email us to be notified when registration opens at ourcourses@statistics.com) See…

Standardized Death Rate

Often the death rate for a disease is fully known only for a group where the disease has been well studied.  For example, the 3711 passengers on the Diamond Princess cruise ship are, to date, the most fully studied coronavirus population.  All passengers were tested…

Coronavirus: To Test or Not to Test

In recent years, under the influence of statisticians, the medical profession has dialed back on screening tests.  With relatively rare conditions, widespread testing yields many false positives and doctor visits, whose collective cost can outweigh benefits.  Coronavirus advice follows this line - testing is limited…
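
The false-positive point can be illustrated with a short Bayes' rule calculation. This is only a sketch: the prevalence, sensitivity, and specificity below are hypothetical numbers chosen for illustration and do not describe any actual test, and the function name is likewise made up.

```python
from fractions import Fraction


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a person who tests positive actually has the condition."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: 1 person in 1,000 has the condition,
# the test catches 99% of true cases and clears 95% of healthy people.
ppv = positive_predictive_value(
    prevalence=Fraction(1, 1000),
    sensitivity=Fraction(99, 100),
    specificity=Fraction(95, 100),
)
print(f"Share of positives that are real cases: {float(ppv):.1%}")
```

With these made-up inputs only about 2% of positive results are true cases; the rest are false positives that would still generate follow-up visits.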

Mar 16: Statistics in Practice

In this week’s Brief, we look at combining models.  Our course spotlight is April 17 - May 1:  Maximum Likelihood Estimation (MLE) You’ve probably seen lots of references to MLE in other contexts - this quick 2-week course (only $299) is your chance to study…

Regularized Model

In building statistical and machine learning models, regularization is the addition of penalty terms to predictor coefficients to discourage complex models that would otherwise overfit the data.  An example is ridge regression.
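
A minimal sketch of the idea, assuming the simplest possible case: one predictor, no intercept, and a squared (ridge-style) penalty on the coefficient. The function name and the data are invented for illustration.

```python
def ridge_slope(x, y, penalty):
    """Slope b that minimizes sum((y - b*x)**2) + penalty * b**2.

    With penalty=0 this is the ordinary least squares slope
    for a one-predictor, no-intercept model.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + penalty)


# Made-up data, roughly y = 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

for penalty in (0.0, 1.0, 10.0):
    print(penalty, ridge_slope(x, y, penalty))
```

As the penalty grows, the fitted coefficient is pulled toward zero, which is the sense in which the penalty discourages complex, overfit models.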

Ensemble Learning

In his book, The Wisdom of Crowds, James Surowiecki recounts how Francis Galton, a prominent statistician from the 19th century, attended an event at a country fair in England where the object was to guess the weight of an ox.   Individual contestants were relatively well…

Mar 9: Statistics in Practice

In this week’s Brief, we look at ways to determine optimal sample size.  Our course spotlight is April 10 - May 8:  Sample Size and Power Determination See you in class! - Peter Bruce Founder, Author, and Senior Scientist Big Sample, Unreliable Result The 1948…

Ridge Regression

Ridge regression is a method of penalizing coefficients in a regression model to shrink them toward zero, yielding a simpler, more stable model than would be produced by an ordinary least squares model (the penalty shrinks coefficients but does not, by itself, drop predictors). The term “ridge” was applied by Arthur Hoerl in 1970, who saw similarities…

Big Sample, Unreliable Result

Which would you rather have?  A large sample that is biased, or a representative sample that is small?  The American Statistical Association committee that reviewed the 1948 Kinsey report on male sexual behavior, based on interviews with over 5000 men, left no doubt of their…

Mar 2: Statistics in Practice

In this week’s Brief, we look at hierarchical and mixed models.  Our course spotlight is April 10 - May 8:  Generalized Linear Models April 24 - May 22:  Mixed and Hierarchical Linear Models See you in class! - Peter Bruce Founder, Author, and Senior Scientist…

Factor

The term “factor” has different meanings in statistics that can be confusing because they conflict.   In statistical programming languages like R, factor acts as an adjective, used synonymously with categorical - a factor variable is the same thing as a categorical variable.  These factor variables…

Mixed Models – When to Use

Companies now have a lot of data on their customers at an individual level.  Suppose you are tasked with forecasting customer spending at a grocery chain, and you want to understand how customer attributes, local economic factors, and store issues affect customer spending. You could…

Feb 24: Statistics in Practice

In this week’s Brief, we look at social categories, and the role that statistics and data science have played in social engineering - 100 years ago and today.  Our course spotlight is April 3 - May 1:  Categorical Data Analysis See you in class! -…

The Normal Share of Paupers

In 2009, China began regional pilot programs that repurposed credit scores to a broader purpose - scoring a person’s “social credit.”  100 years earlier, at the height of the eugenics craze, the famous statistician Francis Galton undertook to repurpose statistical concepts in service of social…

Purity

In classification, purity measures the extent to which a group of records share the same class.  It is also termed class purity or homogeneity, and sometimes impurity is measured instead.  The measure Gini impurity, for example, is calculated for a two-class case as p(1-p), where…
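
The two-class formula quoted above can be tabulated in a few lines. This sketch uses the p(1-p) form exactly as stated in the text; the function name is an assumption. Many references instead write Gini impurity as 1 - p² - (1-p)², which equals 2p(1-p), so it differs only by a constant factor of 2.

```python
def gini_two_class(p):
    """Two-class Gini impurity in the p(1-p) form used in the text.

    p is the proportion of records in the group that belong to class 1.
    """
    return p * (1 - p)


for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"p = {p:.1f}  impurity = {gini_two_class(p):.2f}")
```

Impurity is zero when the group is pure (p = 0 or p = 1) and largest when the two classes are evenly mixed (p = 0.5).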

Predictor P-Values in Predictive Modeling

Not So Useful: Predictor p-values in linear models are a guide to the statistical significance of a predictor coefficient value - they measure the probability that a randomly shuffled model could have produced a coefficient as great as the fitted value. They are of limited…

UpLift and Persuasion

The goal of any direct mail campaign, or other messaging effort, is to persuade somebody to do something.  In the business world, it is usually to buy something. In the political world, it is usually to vote for someone (or, if you think you know…

Feb 17: Statistics in Practice

Last week we looked at several metrics for assessing the performance of classification models - accuracy, receiver operating characteristic (ROC) curves, and lift (gains). In this week’s Brief we move beyond lift and cover uplift. Our course spotlight again is: Feb 28 - Mar 27:…

ROC, Lift and Gains Curves

There are various metrics for assessing the performance of a classification model.  It matters which one you use. The simplest is accuracy - the proportion of cases correctly classified.  In classification tasks where the outcome of interest (“1”) is rare, though, accuracy as a metric…
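
A small sketch of why accuracy can mislead when the “1” class is rare. The data are hypothetical (20 ones among 1,000 records) and the “model” is a deliberately useless one that predicts 0 for every record; the function and variable names are made up for illustration.

```python
def accuracy(actual, predicted):
    """Proportion of cases correctly classified."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical data: 20 records of interest ("1") among 1,000
actual = [1] * 20 + [0] * 980
always_zero = [0] * len(actual)  # a "model" that never predicts a 1

ones_found = sum(a == 1 and p == 1 for a, p in zip(actual, always_zero))
print("accuracy:", accuracy(actual, always_zero))
print("1s correctly identified:", ones_found)
```

The do-nothing model scores 98% accuracy while finding none of the cases of interest.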

Feb 10: Statistics in Practice

Tomorrow is the New Hampshire political primary in the US, and this week’s Brief looks at the statistical concept of lift.  Our spotlight is on: Feb 28 - Mar 27:   Persuasion Analytics and Targeting See you in class! - Peter Bruce, Founder Lift and…

Lift and Persuasion

Predicting the probability that something or someone will belong to a certain category (classification problems) is perhaps the oldest type of problem in analytics. Consider the category “repays loan.” Equifax, the oldest of the agencies that provide credit scores, was founded in 1899 as the…

Going Beyond the Canary Trap

In 2008, Elon Musk was concerned about leaks of sensitive information at Tesla Motors.  To catch the leaker, he prepared multiple unique versions of a new nondisclosure agreement he asked senior officers to sign.  Whichever version got leaked would reveal the leak source. This is…

Statistics.com Acquired by Elder Research

In last week’s Brief I described how The Institute’s courses, and its Mastery, Certificate and Degree programs would continue without interruption, following our acquisition by Elder Research, Inc.  Now I’d like to talk about how the Institute’s students stand to gain from the expertise and…

Feb 3: Statistics in Practice

In this week’s blog, we discuss our recent acquisition by Elder Research Inc. We also look at the “Canary Trap” and its connection to text mining. Our course spotlight is on Jan 31 to Feb 28: Text Mining using Python (still open for registrations, first…

Choosing the Right Analytics Problem

The “streetlight effect:”  A man is looking for his keys under a streetlight.   Policeman:  “Where did you lose them?”   Man:  “In the alley, near the door to the bar.”   Policeman:  “Why are you looking here?”   Man:  “The light’s better.”   This is related to the more…

Jan 29: Statistics in Practice + Announcement

This week we discuss the importance of choosing the right analytics problem, with a guest blog from Elder Research, Inc., a data science and analytics consulting and training company, with whom we have just joined forces.   Our course spotlight is on: Feb 14 - Mar 13:  Design…

Jan 20: Statistics in Practice

This week’s Brief takes a look at ethical dilemmas in data science.  Our course spotlight is on  Feb 21 - Mar 20:  Network Analysis See you in class! - Peter Bruce, Founder and President The Institute for Statistics Education at Statistics.com Ethical Dilemmas in Data…

Ethical Dilemmas in Data Science

Know those ads that follow you around the web, as a result of tracking cookies?  Many see them as an invasion of privacy, and EU rules made them subject to user consent.  Google recently announced that Chrome will eventually stop supporting these cookies.  A win…

Kernel function

In a standard linear regression, a model is fit to a set of data (the training data); the same linear model applies to all the data.  In local regression methods, multiple models are fit to different neighborhoods of the data. A kernel function is used…

Jan 13: Statistics in Practice

In this Brief, we look at prosaic, but lucrative applications of predictive analytics and forecasting to the automotive industry.  Our spotlight is on our 3-course Predictive Analytics Mastery Series. Start this week with: Jan. 10 - Feb 7:   Predictive Analytics 1 See you in…

Industry Spotlight: Clinical Trials

 “Complete Your Clinical Trial With Our File Data” Clinical trials that support new drug development can cost over a billion dollars.  A new industry has popped up - data collectors and aggregators that provide digital data from their files as evidence in pharmaceutical clinical trials.…

Not Glamorous, But Lucrative

What do stormy days, weekend evenings, and the last day of the month have in common?  They are all good times to negotiate a good price for a new car. Inclement days yield less customer traffic in auto showrooms, which is good for the buyer. …

Jan 6: Statistics in Practice

Happy New Year! We are grateful for your continued support and appreciate your interest in learning more about statistics, analytics, and data science. In this new year, think of your learning as an investment both in the future of your company and your career. Below are courses, certificates, and…

Dec 30: Statistics in Practice

In this Brief, we take a look at the use of simulations as a tool to help sales people with a complex sale (high value, multiple aspects to consider).  Our spotlight is on the 3-course Mastery Series in Optimization Research, which starts January 10 with:…

Simulating the Complex Sale

Every 30 minutes a new business book is published; many of them purport to teach effective selling.  Most of them make sense, but solid quantitative analysis is rarely on the front burner. This is strange, because effective selling requires demonstrating value.  Sales professionals are taught…

Historical Spotlight: Bell Labs and Statistics

95 years ago, Bell Labs was founded as a joint project of AT&T and Western Electric. Its primary mission was R&D for its parents’ fast-growing telecommunications businesses. Since that time, Bell Labs has become a fabled American research institution, but has also suffered the vicissitudes of trying…

Analytics Meets the Cardboard Box

“Do you have a bag?” or “Would you like a bag?” have become common parts of the brick-and-mortar retail transaction. Reusable bags, or simply doing without, have reduced the flow of plastic and paper into recycling. E-commerce is a different matter. I just unpacked a…

Dec 16: Statistics in Practice

In 2005, the cardboard box was inducted into the National Toy Hall of Fame (along with Candy Land). In our brief this week we consider whether analytics has anything to say about cardboard boxes. Our course spotlight is on: Jan 3 - 31:  R Programming…

Problem of the Week: A betting puzzle

QUESTION: A gambler playing against the “house” in a game like roulette or slots adopts the rule “Play until you win a certain amount, then stop.”  Will this ensure against player losses? What will be its effect on the house’s profit? ANSWER: Some look at this…

Dec 6: Statistics in Practice

This week we look at the casino business - in particular, the odds on slots. In our course spotlight, we start looking at some of the great stuff starting at the beginning of the new year. In January, you can get started with basic statistics or biostatistics,…

Google Zooms Out on Microtargeting

Google recently announced that it would further limit its election ads to audience targeting based on age, gender, and general location (postal code level), and to context targeting (i.e. showing ads based on the content being viewed). Up to this point, the application of predictive modeling to…

Betting and Statistics

Betting has had a long and close relationship with the science of probability and statistics. In the mid-1600s, the French intellectual and gambler Antoine Gombaud, who called himself Chevalier de Méré, enlisted the help of the mathematician Blaise Pascal to solve several puzzles involving dice…

Operations Research (O/R) For Sewage

Older urban sewer systems are not sealed, dedicated route networks leading to sewage treatment plants.  Rather, to save money when they were built decades ago, in some places they shared pipes with storm water drainage systems that lead to creeks, rivers and bays.  As a…

Nov 25: Statistics in Practice

In this week’s Brief, we take a look at the history of betting and how it is entwined with probabilistic decision-making. Probabilistic decision-making is also the focus of our 3-course Optimization Mastery, which covers linear programming, integer programming, simulation and other operations research (O/R) techniques. Start…

Errors and Loss

Errors - differences between predicted values and actual values, also called residuals - are a key part of statistical models.  They form the raw material for various metrics of predictive model performance (accuracy, precision, recall, lift, etc.), and also the basis for diagnostics on descriptive…

Data Analytics

Terminology in Data Analytics: As data continue to grow at a faster rate than either population or economic activity, so do organizations' efforts to deal with the data deluge, and use it to capture value. And so do the methods used to analyze data, which…

Data Analytics Courses

Data analytics and data science are popular terms, and skills in these areas are in great demand.  But what do these terms mean?  Below is an overview and a listing of related courses. For information about our certificate programs in data science and analytics, click here.…

Statistical Thinking

Gambler’s Fallacy I - forgetting that the “coin has no memory”   Gamblers often believe that after a long streak of one outcome, the probability of a different outcome has increased.  Sports commentators often say that a batter in a slump is “due” for a hit.…
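
The “coin has no memory” point can be checked by exact enumeration rather than simulation. This is a sketch under the assumption of a fair coin with independent tosses, so every sequence of tosses is equally likely; the function name is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def prob_heads_after_tails_streak(streak_len):
    """Exact P(next toss is heads | previous streak_len tosses were all tails).

    Assumes a fair coin, so all sequences of H/T are equally likely.
    """
    sequences = product("HT", repeat=streak_len + 1)
    after_streak = [s for s in sequences if set(s[:streak_len]) == {"T"}]
    heads_next = [s for s in after_streak if s[-1] == "H"]
    return Fraction(len(heads_next), len(after_streak))


for streak in (1, 5, 8):
    print(streak, prob_heads_after_tails_streak(streak))
```

However long the streak of tails, the chance of heads on the next toss stays at one half.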

Latin hypercube

In Monte Carlo sampling for simulation problems, random values are generated from a probability distribution deemed appropriate for a given scenario (uniform, Poisson, exponential, etc.). In simple random sampling, each potential random value within the probability distribution has an equal chance of being selected. Just…
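
The excerpt is cut off before it describes Latin hypercube sampling itself, so the following is only a sketch based on the usual textbook description, restricted to one dimension and a Uniform(0, 1) distribution: the range is split into n equal-probability strata and one value is drawn from each. The function names and the seed are arbitrary choices for illustration.

```python
import random


def simple_random_sample(n, rng):
    """n independent draws from Uniform(0, 1)."""
    return [rng.random() for _ in range(n)]


def latin_hypercube_1d(n, rng):
    """One draw from each of n equal-width strata of Uniform(0, 1), in random order."""
    points = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(points)
    return points


rng = random.Random(0)  # fixed seed so the run is repeatable
srs = simple_random_sample(10, rng)
lhs = latin_hypercube_1d(10, rng)

# Which tenth of the interval each draw fell into
print("simple random: ", sorted(int(u * 10) for u in srs))
print("latin hypercube:", sorted(int(u * 10) for u in lhs))
```

Simple random sampling can leave some tenths of the interval empty and others crowded, while the stratified draws cover every tenth exactly once.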

Oct 14: Statistics in Practice

This week we look at several ways to fool yourself, statistically - variants of the “Gambler’s Fallacy.” Gambling is all about accurately assessing risk, so, naturally, our featured course is: Nov 15 - Dec 13: Risk Simulation and Queuing See you in class! - Peter Bruce,…

Workforce Management

Anyone who has worked in retail knows the anxiety that attends workforce scheduling for both manager and employee.  The manager wonders “Will my employees show up at the right times?” The employee wonders “Will I be scheduled for inconvenient times?  Enough hours? Too many hours?”  …

Regularize

The art of statistics and data science lies, in part, in taking a real-world problem and converting it into a well-defined quantitative problem amenable to useful solution. At the technical end of things lies regularization. In data science this involves various methods of simplifying models,…

Machine Learning and Human Bias

Does better AI offer the hope of prejudice-free decision-making?  Ironically, the reverse might be true, especially with the advent of deep learning.   Bias in hiring is one area where private companies move with great care, since there are thickets of laws and regulations in most…

Oct 7: Statistics in Practice

This week we take a look at how AI encodes human bias, despite our best efforts. Our spotlight this week is on: Nov 8 - Dec 6: Deep Learning See you in class! - Peter Bruce, Chief Academic Officer, Author, Instructor, and Founder The Institute for…

Student Spotlight: Peter Mulready

Peter Mulready is an independent consultant, who worked previously as a system architect at Boehringer Ingelheim, one of the world's largest pharmaceutical companies. Peter got his degree in biology, but his focus shifted to managing and optimizing the use of data in drug discovery research. …

e-cigarettes

Last week, the Trump administration announced a forthcoming ban on e-cigarettes, following news stories of a spate of deaths from vaping.  The Wall Street Journal, on Friday the 13th, published both an editorial and an op-ed piece suggesting that any harm from e-cigarettes is minor…

“Islands in Search of Continents”

“Islands in Search of Continents” is the subtitle of an article by Michael Clarke and Iain Chalmers in the Journal of the American Medical Association (1998; 280: 280-282).  It refers to the fact that many studies are conducted and reported in isolation from other studies on the…

Meta Analysis

1.2 million scientific papers were indexed by PubMed in 2011 (see Are Scientists Doing Too Much Research), ample proof that there are lots of people studying the same or similar things. For example, there have been over 100 studies of suicide following psychiatric institutionalization; 38 studies…

Industry Spotlight: Health Analytics

Patient Data Management: Health analytics is a hot topic now, but to do the analytics you need data - this is where Electronic Health Records (EHR) come in. An integrated, standardized system for sharing and accessing health data has been “just around the corner” now…

Superusers

“Superusers” of medical services are the small fraction of patients that account for huge consumption of medical services.  An article published August 14, 2019 in JAMA Surgery (online) reports on the application of machine learning methods to Medicare data on 1,049,160 Medicare patients who underwent surgery,…

Job Spotlight: Biostatisticians

Biostatisticians are the shepherds (and the police) that guide the science of developing new therapies for disease.  They come in several different flavors: Those involved in gathering information, designing experiments and analyzing data at the drug discovery stage - trying to sort out what works…

Aug 16: Statistics in Practice

Here in Part 2 of the Weekly Brief, we offer some tools to help you with the question, “what is the optimal set of alternatives to offer consumers?” Our course spotlight is on: Aug 30 - Sep 27: Discrete Choice Modeling and Conjoint Analysis See you in…

Problem of the Week: The Second Heads

QUESTION: A friend tosses two coins, and you ask “Is one of them a heads?”  The friend replies “Yes.” What is the probability that the other is a heads? ANSWER:   One-third.  There are four ways the coins could have landed originally: HH:  0.25 probability…
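
The one-third answer can be verified by listing the four equally likely outcomes and keeping only those consistent with the friend's “Yes.” A minimal sketch, assuming fair coins; the variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# The four equally likely ways two fair coins can land: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# The friend's "Yes" rules out TT only
at_least_one_head = [o for o in outcomes if "H" in o]

# Among the remaining outcomes, the "other" coin is heads only for HH
both_heads = [o for o in at_least_one_head if o == ("H", "H")]

answer = Fraction(len(both_heads), len(at_least_one_head))
print(answer)  # 1/3
```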

Aug 13: Statistics in Practice

This week we discuss the distinction between explanatory and predictive modeling and spotlight the workhorses of statistical modeling: Oct 4 - Nov 1: Regression Analysis Oct 4 - Nov 1: Categorical Data Analysis See you in class! - Peter Bruce, Chief Academic Officer, Author, Instructor, and Founder The Institute for Statistics Education at Statistics.com…

Explain or Predict?

A casual user of machine learning methods like CART or naive Bayes is accustomed to evaluating a model by measuring how well it predicts new data.  When examining the output of statistical models, they are often flummoxed by the profusion of assessment metrics. Typical multiple…

Small Ball: Calling all thinkers!

I was visiting New York a couple of weeks ago, transferring from Amtrak to the PATH trains at Newark.  PATH takes you to Wall Street - the #1 financial center in the world - and yet the process of paying for my $2.75 PATH ticket…

Aug 9: Statistics in Practice

We continue Monday's discussion of "people analytics" with a look from the customer's side and a call for all thinkers! (see below) Our course spotlight is on: Sep 6 - Oct 4: Predictive Analytics 1 - Machine Learning Tools Sep 6 - Oct 4: Programming 1…

Industry Spotlight: HR (People Analytics)

Analytics has come to HR.  It’s partly Orwellian, tracking what employees do on the computer, and partly warm and fuzzy, leveraging the true informal organizational structure via network analysis (jump into Friday’s Network Analysis course to learn the basics).  One dimension assumes the worst about…

Aug 5: Statistics in Practice

In this week’s Brief, analytics comes to the HR department (“people analytics”), and our course spotlight is on:  Sep 6 - Oct 4:  Predictive Analytics 1 Sep 6 - Oct 4:  Programming 1 (R or Python)     These courses are excellent entry points into our data…

Aug 2: Statistics in Practice

In Part 1 of this week’s Brief, we looked at political analytics; in Part 2 we extend that look to commercial domains. Our course spotlight is Persuasion Analytics, taught by Ken Strasma, who pioneered the use of statistical modeling to microtarget voters in the 2004…

Probability

You might be wondering why such a basic word as probability appears here. It turns out that the term has deep tendrils in formal mathematics and philosophy, but is somewhat hard to pin down
