Apr 2: Statistics in Practice – Special Epi Course

In this special Brief we step back and look at various estimates of the projected death toll from the coronavirus.   Would you like to learn more about the statistical analysis of disease?  We’re offering a special self-paced course to those seeking to improve their knowledge…

Mar 31: Statistics in Practice

In this week’s Brief, we look at p-values.  Plus, we’ve scheduled a couple of extra course sessions for April:  Use the month of April to introduce yourself to Python, or, for those with some Python familiarity, learn how to apply it to predictive analytics. April…

P-Values – Are They Needed?

Five years ago last month, the psychology journal Basic and Applied Social Psychology instigated a major debate in statistical circles when it said it would remove p-value citations from papers it published.  A year later, the American Statistical Association (ASA) released a statement on p-values…

The Depression Gene

The risks of large-scale testing, and the potential for false discovery, can be seen in the “discovery” of the genetic basis for anxiety and depression, specifically the serotonin transporter gene 5-HTTLPR.  Color Genomics sells a genetic testing product that supposedly can predict which anti-depressant drug works…

Hazard

In biostatistics, hazard, or the hazard rate, is the instantaneous rate of an event (death, failure…).  It is the probability of the event occurring in a (vanishingly) small period of time, divided by the amount of time (mathematically it is the limit of this quantity…
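As a rough numerical illustration of that limit, the sketch below approximates the hazard for an exponentially distributed lifetime. The exponential distribution, the rate of 0.2, and the function names are assumptions chosen for the example, and the probability is taken among subjects still event-free at time t, which is the usual convention.

```python
import math

def survival(t, rate):
    """S(t) = P(T > t) for an exponential lifetime (assumed example distribution)."""
    return math.exp(-rate * t)

def approx_hazard(t, rate, dt=1e-6):
    """Probability of the event in [t, t + dt), given survival to t, divided by dt."""
    p_event = (survival(t, rate) - survival(t + dt, rate)) / survival(t, rate)
    return p_event / dt

rate = 0.2  # hypothetical event rate per unit time
hazards = [approx_hazard(t, rate) for t in (0.0, 1.0, 5.0)]
print(hazards)  # each value is close to the rate: the exponential hazard is constant
```

For other lifetime distributions the same ratio would change with t, which is what makes the hazard a function of time rather than a single number.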

Mar 24: Statistics in Practice

In this week’s Brief, we look again at the statistics of Coronavirus.  We also spotlight our Health Analytics Mastery - a 3-course series in which you can choose from among Biostatistics 1 and 2 Designing Valid Statistical Studies Epidemiologic Statistics * Introduction to Statistical Issues…

Covid-19 Parameters

There are many moving parts in modeling the spread of an epidemic, a subject that has lately attracted the attention of great numbers of statistically-oriented non-epidemiologists (like me).  I’ve put together a “lay statistician’s guide” to some of the important parameters and factors (and I…

Preliminary Paper

Here is a preliminary paper that suggests that RNA extraction kits, one of the main bottlenecks to Covid-19 testing in the US, can be skipped altogether and the next part of the assay (RT-qPCR) still works.  If confirmed, this result would have a major impact…

Mar 18: Statistics in Practice

In this week’s Brief, we look at the coronavirus, and the problem of estimating prevalence and mortality.  Our course spotlight is Nov 8 - Dec 6:  Epidemiologic Statistics (we're adding a spring session - email us to be notified when registration opens at ourcourses@statistics.com) See…

Standardized Death Rate

Often the death rate for a disease is fully known only for a group where the disease has been well studied.  For example, the 3711 passengers on the Diamond Princess cruise ship are, to date, the most fully studied coronavirus population.  All passengers were tested…

Coronavirus – in Search of the Elusive Denominator

Anyone with internet access these days has their eyes on two constellations of data - the spread of the coronavirus, and the resulting collapse of the financial markets.  Following the 13% one-day drop of the stock market a week ago, The Wall Street Journal forecast…

Coronavirus: To Test or Not to Test

In recent years, under the influence of statisticians, the medical profession has dialed back on screening tests.  With relatively rare conditions, widespread testing yields many false positives and doctor visits, whose collective cost can outweigh benefits.  Coronavirus advice follows this line - testing is limited…

Mar 16: Statistics in Practice

In this week’s Brief, we look at combining models.  Our course spotlight is April 17 - May 1:  Maximum Likelihood Estimation (MLE) You’ve probably seen lots of references to MLE in other contexts - this quick 2-week course (only $299) is your chance to study…

Regularized Model

In building statistical and machine learning models, regularization is the addition of penalty terms to predictor coefficients to discourage complex models that would otherwise overfit the data.  An example is ridge regression.
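To make the penalty idea concrete, here is a minimal sketch of ridge regression for the simplest possible case: one predictor and no intercept. The data, the penalty value, and the function name are made up for illustration.

```python
def fit_slope(x, y, penalty=0.0):
    """Slope b minimizing sum((y - b*x)^2) + penalty * b^2 (one predictor, no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # The penalty enters the denominator, pulling the slope toward zero.
    return sxy / (sxx + penalty)

x = [1.0, 2.0, 3.0, 4.0]  # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8]  # hypothetical outcomes

ols_slope = fit_slope(x, y)                  # no penalty: ordinary least squares
ridge_slope = fit_slope(x, y, penalty=10.0)  # penalized: shrunken coefficient
print(ols_slope, ridge_slope)
```

With a penalty of zero the function reduces to ordinary least squares; larger penalties shrink the coefficient further.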

Ensemble Learning

In his book, The Wisdom of Crowds, James Surowiecki recounts how Francis Galton, a prominent statistician from the 19th century, attended an event at a country fair in England where the object was to guess the weight of an ox.   Individual contestants were relatively well…

Mar 9: Statistics in Practice

In this week’s Brief, we look at ways to determine optimal sample size.  Our course spotlight is April 10 - May 8:  Sample Size and Power Determination See you in class! - Peter Bruce Founder, Author, and Senior Scientist Big Sample, Unreliable Result The 1948…

Ridge Regression

Ridge regression is a method of penalizing coefficients in a regression model to force a more parsimonious model (one with fewer predictors) than would be produced by an ordinary least squares model. The term “ridge” was applied by Arthur Hoerl in 1970, who saw similarities…

Big Sample, Unreliable Result

Which would you rather have?  A large sample that is biased, or a representative sample that is small?  The American Statistical Association committee that reviewed the 1948 Kinsey report on male sexual behavior, based on interviews with over 5000 men, left no doubt of their…

Mar 2: Statistics in Practice

In this week’s Brief, we look at hierarchical and mixed models.  Our course spotlight is April 10 - May 8:  Generalized Linear Models April 24 - May 22:  Mixed and Hierarchical Linear Models See you in class! - Peter Bruce Founder, Author, and Senior Scientist…

Problem of the Week: Notify or Don’t Notify?

Our problem of the week is an ethical dilemma, posed by the New England Journal of Medicine to its readers 10 days ago.  Volunteers contributed DNA samples to investigators building a genetic database for study, on condition the data would be deidentified and kept confidential…

Factor

The term “factor” has different meanings in statistics that can be confusing because they conflict.   In statistical programming languages like R, factor acts as an adjective, used synonymously with categorical - a factor variable is the same thing as a categorical variable.  These factor variables…

Mixed Models – When to Use

Companies now have a lot of data on their customers at an individual level.  Suppose you are tasked with forecasting customer spending at a grocery chain, and you want to understand how customer attributes, local economic factors, and store issues affect customer spending. You could…

Feb 24: Statistics in Practice

In this week’s Brief, we look at social categories, and the role that statistics and data science have played in social engineering - 100 years ago and today.  Our course spotlight is April 3 - May 1:  Categorical Data Analysis See you in class! -…

The Normal Share of Paupers

In 2009, China began regional pilot programs that extended credit scoring to a broader purpose - scoring a person’s “social credit.”  100 years earlier, at the height of the eugenics craze, the famous statistician Francis Galton undertook to repurpose statistical concepts in service of social…

Purity

In classification, purity measures the extent to which a group of records share the same class.  It is also termed class purity or homogeneity, and sometimes impurity is measured instead.  The measure Gini impurity, for example, is calculated for a two-class case as p(1-p), where…
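A small sketch of the two-class formula quoted above, assuming p is the proportion of records in the group that belong to one of the two classes. Note that some texts scale the two-class Gini impurity as 2p(1-p); the code follows the p(1-p) form given here.

```python
def gini_two_class(p):
    """Two-class Gini impurity in the p(1-p) form; p is the share of one class."""
    return p * (1 - p)

# Impurity is zero for a pure group and largest for an even split.
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, gini_two_class(p))
```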

Predictor P-Values in Predictive Modeling

Not So Useful

Predictor p-values in linear models are a guide to the statistical significance of a predictor coefficient value - they measure the probability that a randomly shuffled model could have produced a coefficient as great as the fitted value.  They are of limited…

UpLift and Persuasion

The goal of any direct mail campaign, or other messaging effort, is to persuade somebody to do something.  In the business world, it is usually to buy something. In the political world, it is usually to vote for someone (or, if you think you know…

Feb 17: Statistics in Practice

Last week we looked at several metrics for assessing the performance of classification models - accuracy, receiver operating characteristics (ROC) curves, and lift (gains).  In this week’s Brief we move beyond lift and cover uplift. Our course spotlight again is: Feb 28 - Mar 27:…

ROC, Lift and Gains Curves

There are various metrics for assessing the performance of a classification model.  It matters which one you use. The simplest is accuracy - the proportion of cases correctly classified.  In classification tasks where the outcome of interest (“1”) is rare, though, accuracy as a metric…

Feb 10: Statistics in Practice

Tomorrow is the New Hampshire political primary in the US, and this week’s Brief looks at the statistical concept of lift.  Our spotlight is on: Feb 28 - Mar 27:   Persuasion Analytics and Targeting See you in class! - Peter Bruce, Founder Lift and…

Lift and Persuasion

Predicting the probability that something or someone will belong to a certain category (classification problems) is perhaps the oldest type of problem in analytics.  Consider the category “repays loan.” Equifax, the oldest of the agencies that provide credit scores, was founded in 1899 as the…

Going Beyond the Canary Trap

In 2008, Elon Musk was concerned about leaks of sensitive information at Tesla Motors.  To catch the leaker, he prepared multiple unique versions of a new nondisclosure agreement he asked senior officers to sign.  Whichever version got leaked would reveal the leak source. This is…

Statistics.com Acquired by Elder Research

In last week’s Brief I described how The Institute’s courses, and its Mastery, Certificate and Degree programs would continue without interruption, following our acquisition by Elder Research, Inc.  Now I’d like to talk about how the Institute’s students stand to gain from the expertise and…

Feb 3: Statistics in Practice

In this week’s blog, we discuss our recent acquisition by Elder Research Inc. We also look at the “Canary Trap” and its connection to text mining. Our course spotlight is on Jan 31 to Feb 28: Text Mining using Python (still open for registrations, first…

PRESS RELEASE: STATISTICS.COM ACQUIRED BY ELDER RESEARCH

Statistics.com Acquired by Elder Research Acquisition Will Provide Focused Corporate and Individual Analytics Training Arlington, VA, January 29, 2020 - The Institute for Statistics Education at Statistics.com is excited to announce that it has been acquired by Elder Research, Inc, a Machine Learning, Data Science,…

Choosing the Right Analytics Problem

The “streetlight effect:”  A man is looking for his keys under a streetlight.   Policeman:  “Where did you lose them?”   Man:  “In the alley, near the door to the bar.”   Policeman:  “Why are you looking here?”   Man:  “The light’s better.”   This is related to the more…

Book Review: Mining Your Own Business by Gerhard Pilcher and Jeff Deal

This is a short book, “Mining Your Own Business: A Primer for Executives on Understanding and Employing Data Mining and Predictive Analytics,” befitting its intended audience - managers and executives with responsibility for data science and analytics projects.  It outlines the requirements for success - not technical model…

Jan 29: Statistics in Practice + Announcement

This week we discuss the importance of choosing the right analytics problem, with a guest blog from Elder Research, Inc., a data science and analytics consulting and training company, with whom we have just joined forces.   Our course spotlight is on: Feb 14 - Mar 13:  Design…

Jan 20: Statistics in Practice

This week’s Brief takes a look at ethical dilemmas in data science.  Our course spotlight is on  Feb 21 - Mar 20:  Network Analysis See you in class! - Peter Bruce, Founder and President The Institute for Statistics Education at Statistics.com Ethical Dilemmas in Data…

Ethical Dilemmas in Data Science

Know those ads that follow you around the web, as a result of tracking cookies?  Many see them as an invasion of privacy, and EU rules made them subject to user consent.  Google recently announced that Chrome will eventually stop supporting these cookies.  A win…

Kernel function

In a standard linear regression, a model is fit to a set of data (the training data); the same linear model applies to all the data.  In local regression methods, multiple models are fit to different neighborhoods of the data. A kernel function is used…

Jan 13: Statistics in Practice

In this Brief, we look at prosaic, but lucrative applications of predictive analytics and forecasting to the automotive industry.  Our spotlight is on our 3-course Predictive Analytics Mastery Series. Start this week with: Jan. 10 - Feb 7:   Predictive Analytics…

Industry Spotlight: Clinical Trials

“Complete Your Clinical Trial With Our File Data”

Clinical trials that support new drug development can cost over a billion dollars.  A new industry has popped up - data collectors and aggregators that provide digital data from their files as evidence in pharmaceutical clinical trials.…

Not Glamorous, But Lucrative

What do stormy days, weekend evenings, and the last day of the month have in common?  They are all good times to negotiate a good price for a new car. Inclement days yield less customer traffic in auto showrooms, which is good for the buyer. …

Jan 6: Statistics in Practice

Happy New Year! We are grateful for your continued support and appreciate your interest in learning more about statistics, analytics, and data science. In this new year, think of your learning as an investment both in the future of your company and your career. Below are courses, certificates, and…

Dec 30: Statistics in Practice

In this Brief, we take a look at the use of simulations as a tool to help sales people with a complex sale (high value, multiple aspects to consider).  Our spotlight is on the 3-course Mastery Series in Optimization Research, which starts January 10 with:…

Historical Spotlight: Statistical Analysis and Human Rights

Artificial intelligence and analytics have gotten some bad press recently, from the role that social media has played in fracturing and heightening divisions in democratic society to the “big brother” role that data mining and image recognition have played in China’s suppression of minorities.  But…

Simulating the Complex Sale

Every 30 minutes a new business book is published; many of them purport to teach effective selling.  Most of them make sense, but solid quantitative analysis is rarely on the front burner. This is strange, because effective selling requires demonstrating value.  Sales professionals are taught…

Historical Spotlight: Bell Labs and Statistics

95 years ago, Bell Labs was founded as a joint project of AT&T and Western Electric.  Its primary mission was R&D for its parents’ fast-growing telecommunications businesses.  Since that time, Bell Labs became a fabled American research institution, but also suffered the vicissitudes of trying…

Analytics Meets the Cardboard Box

“Do you have a bag?” or “Would you like a bag?” have become common parts of the brick-and-mortar retail transaction.  Reusable bags, or simply doing without, have reduced the flow of plastic and paper into recycling.   E-commerce is a different matter.  I just unpacked a…

Dec 16: Statistics in Practice

In 2005, the cardboard box was inducted into the National Toy Hall of Fame (along with Candy Land). In our brief this week we consider whether analytics has anything to say about cardboard boxes. Our course spotlight is on: Jan 3 - 31:  R Programming…

Detecting a Slots Payout Difference of 2%

Most businesses use statistics and analytics to one degree or another, but there is only one industry that is built solely on this discipline.  This week we look at the casino business - in particular, the odds on slots. Slot machines are a casino’s best…

Problem of the Week: A betting puzzle

QUESTION: A gambler playing against the “house” in a game like roulette or slots adopts the rule “Play until you win a certain amount, then stop.”  Will this ensure against player losses? What will be its effect on the house’s profit? ANSWER: Some look at this…

Book Review: Big Data in Practice by Bernard Marr 

This short book is essentially an enriched list of 45 examples of how companies have used big data analytics.  Marr sticks to high level generalities, and the book is in the spirit of light business journalism rather than detailed expositions that walk you through a…

Dec 6: Statistics in Practice

This week we look at the casino business - in particular, the odds on slots. In our course spotlight, we start looking at some of the great stuff starting at the beginning of the new year. In January, you can get started with basic statistics or biostatistics,…

Google Zooms Out on Microtargeting

Google recently announced that it would further limit its election ads to audience targeting based on age, gender, and general location (postal code level), and to context targeting (i.e. showing ads based on the content being viewed).  Up to this point, the application of predictive modeling to…

Betting and Statistics

Betting has had a long and close relationship with the science of probability and statistics.  In the mid-1600’s, the French intellectual and gambler Antoine Gombaud, who called himself Chevalier de Méré, enlisted the help of the mathematician Blaise Pascal to solve several puzzles involving dice…

Operations Research (O/R) For Sewage

Older urban sewer systems are not sealed, dedicated route networks leading to sewage treatment plants.  Rather, to save money when they were built decades ago, in some places they shared pipes with storm water drainage systems that lead to creeks, rivers and bays.  As a…

Nov 25: Statistics in Practice

In this week’s Brief, we take a look at the history of betting and how it is entwined with probabilistic decision-making. Probabilistic decision-making is also the focus of our 3-course Optimization Mastery, which covers linear programming, integer programming, simulation and other operations research (O/R) techniques. Start…

Errors and Loss

Errors - differences between predicted values and actual values, also called residuals - are a key part of statistical models.  They form the raw material for various metrics of predictive model performance (accuracy, precision, recall, lift, etc.), and also the basis for diagnostics on descriptive…

Unforeseen Consequences in Data Science

After the massive Exxon Valdez oil spill, states passed laws boosting the liability of tanker companies for future spills.  The result was not as intended: fly-by-night companies, whose bankruptcy would not be consequential, took over the trade. In this blog…

Data Analytics

Terminology in Data Analytics

As data continue to grow at a faster rate than either population or economic activity, so do organizations' efforts to deal with the data deluge, and use it to capture value.  And so do the methods used to analyze data, which…

Data Analytics Courses

Data analytics and data science are popular terms, and skills in these areas are in great demand.  But what do these terms mean?  Below is an overview and a listing of related courses. For information about our certificate programs in data science and analytics, click here.…

Statistical Thinking

Gambler’s Fallacy I - forgetting that the “coin has no memory”   Gamblers often believe that after a long streak of one outcome, the probability of a different outcome has increased.  Sports commentators often say that a batter in a slump is “due” for a hit.…

Latin hypercube

In Monte Carlo sampling for simulation problems, random values are generated from a probability distribution deemed appropriate for a given scenario (uniform, Poisson, exponential, etc.).  In simple random sampling, each potential random value within the probability distribution has an equal chance of being selected. Just…
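The contrast between the two sampling schemes can be sketched in one dimension on the interval [0, 1). This is an illustrative simplification with made-up function names: a full Latin hypercube applies the same stratification to each input dimension and shuffles the dimensions independently.

```python
import random

def simple_random_sample(n, rng):
    """n independent uniform draws; some regions may be missed by chance."""
    return [rng.random() for _ in range(n)]

def latin_hypercube_sample(n, rng):
    """One uniform draw from each of n equal-width strata of [0, 1), in shuffled order."""
    points = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(points)
    return points

rng = random.Random(0)  # fixed seed so the sketch is repeatable
lhs = latin_hypercube_sample(10, rng)
srs = simple_random_sample(10, rng)

# Count how many of the 10 strata each scheme touched.
print(len({int(p * 10) for p in lhs}), len({int(p * 10) for p in srs}))
```

To sample from a non-uniform distribution, each stratified value would be passed through that distribution's inverse cumulative distribution function.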

Oct 14: Statistics in Practice

This week we look at several ways to fool yourself, statistically - variants of the “Gambler’s Fallacy.” Gambling is all about accurately assessing risk, so, naturally, our featured course is: Nov 15 - Dec 13: Risk Simulation and Queuing See you in class! - Peter Bruce,…

Workforce Management

Anyone who has worked in retail knows the anxiety that attends workforce scheduling for both manager and employee.  The manager wonders “Will my employees show up at the right times?” The employee wonders “Will I be scheduled for inconvenient times?  Enough hours? Too many hours?”  …

Regularize

The art of statistics and data science lies, in part, in taking a real-world problem and converting it into a well-defined quantitative problem amenable to useful solution. At the technical end of things lies regularization. In data science this involves various methods of simplifying models,…

Machine Learning and Human Bias

Does better AI offer the hope of prejudice-free decision-making?  Ironically, the reverse might be true, especially with the advent of deep learning.   Bias in hiring is one area where private companies move with great care, since there are thickets of laws and regulations in most…

Oct 7: Statistics in Practice

This week we take a look at how AI encodes human bias, despite our best efforts. Our spotlight this week is on: Nov 8 - Dec 6: Deep Learning See you in class! - Peter Bruce, Chief Academic Officer, Author, Instructor, and Founder The Institute for…

Student Spotlight: Peter Mulready

Peter Mulready is an independent consultant, who worked previously as a system architect at Boehringer Ingelheim, one of the world's largest pharmaceutical companies. Peter got his degree in biology, but his focus shifted to managing and optimizing the use of data in drug discovery research. …

Anomaly Detection via Conversation: “How was your vacation?”

A friendly query about your holiday might be a question you get from a roaming agent in the check-in area at the Tel Aviv airport.  Israel, considered to have the most effective airport security in the world, does not rely solely on routine mechanical screening…

e-cigarettes

Last week, the Trump administration announced a forthcoming ban on e-cigarettes, following news stories of a spate of deaths from vaping.  The Wall Street Journal, on Friday the 13th, published both an editorial and an op-ed piece suggesting that any harm from e-cigarettes is minor…

Book Review: Bandit Algorithms for Website Optimization, by John Myles White

A classic statistical experimental design comparing treatments (two treatments, treatment versus control, multiple treatments) specifies a sample size, collection of data, then a decision, typically based on hypothesis-testing:  the winning treatment must attain a level of…

“Islands in Search of Continents”

“Islands in Search of Continents” is the subtitle of an article by Michael Clarke and Iain Chalmers in the Journal of the American Medical Association (1998; 280: 280-282).  It refers to the fact that many studies are conducted and reported in isolation from other studies on the…

Meta Analysis

1.2 million scientific papers were indexed by PubMed in 2011 (see Are Scientists Doing Too Much Research), ample proof that there are lots of people studying the same or similar things.  For example, there have been:

Over 100 studies of suicide following psychiatric institutionalization
38 studies…

Industry Spotlight: Health Analytics

Patient Data Management

Health analytics is a hot topic now, but to do the analytics you need data - this is where Electronic Health Records (EHR) come in.  An integrated, standardized system for sharing and accessing health data has been “just around the corner” now…

Superusers

“Superusers” of medical services are the small fraction of patients that account for huge consumption of medical services.  An article published August 14, 2019 in JAMA Surgery (online) reports on the application of machine learning methods to Medicare data on 1,049,160 Medicare patients who underwent surgery,…

Job Spotlight: Biostatisticians

Biostatisticians are the shepherds (and the police) that guide the science of developing new therapies for disease.  They come in several different flavors: Those involved in gathering information, designing experiments and analyzing data at the drug discovery stage - trying to sort out what works…

Aug 16: Statistics in Practice

Here in Part 2 of the Weekly Brief, we offer some tools to help you with the question, “what is the optimal set of alternatives to offer consumers?” Our course spotlight is on: Aug 30 - Sep 27: Discrete Choice Modeling and Conjoint Analysis See you in…

Problem of the Week: The Second Heads

QUESTION: A friend tosses two coins, and you ask “Is one of them a heads?”  The friend replies “Yes.” What is the probability that the other is a heads? ANSWER:   One-third.  There are four ways the coins could have landed originally: HH:  0.25 probability…
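The one-third answer can be checked by enumerating the four equally likely outcomes and keeping only those consistent with the friend's "Yes." The sketch below assumes fair coins and that the friend answers truthfully about whether at least one coin shows heads.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Outcomes where the honest answer to "Is one of them a heads?" is "Yes".
at_least_one_head = [o for o in outcomes if "H" in o]

# Among those, the cases where the other coin is also heads.
both_heads = [o for o in at_least_one_head if o == ("H", "H")]

prob = Fraction(len(both_heads), len(at_least_one_head))
print(prob)  # 1/3
```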

Aug 13: Statistics in Practice

This week we discuss the distinction between explanatory and predictive modeling and spotlight the workhorses of statistical modeling: Oct 4 - Nov 1: Regression Analysis Oct 4 - Nov 1: Categorical Data Analysis See you in class! - Peter Bruce,  Chief Academic Officer, Author, Instructor, and Founder The Institute for Statistics Education at Statistics.com…

Explain or Predict?

A casual user of machine learning methods like CART or naive Bayes is accustomed to evaluating a model by measuring how well it predicts new data.  When examining the output of statistical models, they are often flummoxed by the profusion of assessment metrics. Typical multiple…

Intervals (confidence, prediction and tolerance)

All students of statistics encounter confidence intervals.  Confidence intervals tell you, roughly, the interval within which you can be, say, 95% confident that the true value of some population parameter lies.  This is not the precise technical definition, but it is how people use the…

Small Ball: Calling all thinkers!

I was visiting New York a couple of weeks ago, transferring from Amtrak to the PATH trains at Newark.  PATH takes you to Wall Street - the #1 financial center in the world - and yet the process of paying for my $2.75 PATH ticket…

Aug 9: Statistics in Practice

We continue Monday's discussion of "people analytics" with a look from the customer's side and a call for all thinkers! (see below) Our course spotlight is on: Sep 6 - Oct 4: Predictive Analytics 1 - Machine Learning Tools Sep 6 - Oct 4: Programming 1…

Industry Spotlight: HR (People Analytics)

Analytics has come to HR.  It’s partly Orwellian, tracking what employees do on the computer, and partly warm and fuzzy, leveraging the true informal organizational structure via network analysis (jump into Friday’s Network Analysis course to learn the basics).  One dimension assumes the worst about…

Aug 5: Statistics in Practice

In this week’s Brief, analytics comes to the HR department (“people analytics”), and our course spotlight is on:  Sep 6 - Oct 4:  Predictive Analytics 1 Sep 6 - Oct 4:  Programming 1 (R or Python)     These courses are excellent entry points into our data…

Aug 2: Statistics in Practice

In part 1 of this week’s brief, we looked at political analytics; in Part 2 we extend that look to commercial domains. Our course spotlight is Persuasion Analytics, taught by Ken Strasma, who pioneered the use of statistical modeling to microtarget voters in the 2004…

Lift, Uplift, Gains

There are various metrics for assessing how well a model does, and one favored by marketers is lift, which is particularly relevant for the portion of the records predicted to be most profitable, most likely to buy, etc. 
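One common way to compute lift, sketched here under assumptions: rank records by model score, take the top fraction, and divide its response rate by the overall response rate. The scores, outcomes, and function name are invented for the example.

```python
def lift_at_top(scores, outcomes, fraction):
    """Response rate among the top-scoring fraction of records, divided by the overall rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:n_top]) / n_top
    overall_rate = sum(outcomes) / len(outcomes)
    return top_rate / overall_rate

# Hypothetical model scores and actual outcomes (1 = bought, 0 = did not buy).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

top_lift = lift_at_top(scores, outcomes, 0.2)
print(top_lift)  # top 20% respond at 100% versus 30% overall
```

A lift of 1 would mean the model's top-ranked records respond no better than records chosen at random.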

Probability

You might be wondering why such a basic word as probability appears here. It turns out that the term has deep tendrils in formal mathematics and philosophy, but is somewhat hard to pin down.

Social Network Analysis (SNA) in Medicine

In hospitals, “sentinel events” are events that carry with them a significant risk of unexpected death or harm.  It is estimated that ⅔ of such sentinel events result from communications failures during the handoff of a patient from one provider to another (e.g. during a…

Density

Density is a metric that describes how well-connected a network is.
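For an undirected network without self-loops, density is usually computed as the number of edges present divided by the number of possible edges, n(n-1)/2. The sketch below uses that convention; the node labels and function name are illustrative.

```python
def density(n_nodes, edges):
    """Share of possible undirected edges that are present (self-loops and duplicates ignored)."""
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    possible = n_nodes * (n_nodes - 1) / 2
    return len(unique) / possible

# A chain of four people: A-B, B-C, C-D.
chain = [("A", "B"), ("B", "C"), ("C", "D")]
chain_density = density(4, chain)
print(chain_density)  # 3 of 6 possible edges -> 0.5
```

A density of 1 means every pair of nodes is directly connected; directed networks use n(n-1) possible edges instead.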

Gittins Index

Consider the multi-armed bandit problem where each arm has an unknown probability of paying either 0 or 1, and a specified payoff discount factor of x (i.e. for two successive payoffs, the second is valued at x% of the first, where x < 100%).  The Gittins index is [...]

Cold Start Problem

There are various ways to recommend additional products to an online purchaser, and the most effective ones rely on prior purchase or rating history -

Autoregressive

Autoregressive refers to time series forecasting models (AR models) in which the independent variables (predictors) are prior values of the time series itself.
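A minimal sketch of the idea for the simplest case, an AR(1) model, where the only predictor is the immediately preceding value. The series is synthetic and noise-free, and the least-squares fit and function name are assumptions made for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x = series[:-1]  # prior values act as the predictor
    y = series[1:]   # current values are the outcome
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    phi = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
           / sum((a - mean_x) ** 2 for a in x))
    c = mean_y - phi * mean_x
    return c, phi

# Synthetic series generated by y[t] = 2 + 0.5 * y[t-1], starting at 10.
series = [10.0]
for _ in range(20):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
forecast = c + phi * series[-1]  # one-step-ahead forecast
print(c, phi, forecast)
```

Higher-order AR models add more lagged values as predictors and are fit the same way, as a regression of the series on its own past.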

Industry Spotlight: Hospitals

Hospitals are a major employer of statisticians and analytics professionals, both in support of clinical research like the retinopathy study described earlier, and to improve hospital operations (outcomes, cost management, etc.). Here are a few quick facts about the hospital industry: US hospital revenue totals…

Matching Algorithms

Some applications of machine learning and artificial intelligence are recognizably impressive - predicting future hospital readmission of discharged patients, for example, or diagnosing retinopathy. Others - self-driving cars, for example - seem almost magical. The matching problem, though, is one where your first reaction might…
