Detecting a Slots Payout Difference of 2%

Most businesses use statistics and analytics to one degree or another, but there is only one industry that is built solely on this discipline.  This week we look at the casino business - in particular, the odds on slots. Slot machines are a casino’s best…

Problem of the Week: A betting puzzle

QUESTION: A gambler playing against the “house” in a game like roulette or slots adopts the rule “Play until you win a certain amount, then stop.”  Will this ensure against player losses? What will be its effect on the house’s profit? ANSWER: Some look at this…

Book Review: Big Data in Practice by Bernard Marr 

This short book is essentially an enriched list of 45 examples of how companies have used big data analytics.  Marr sticks to high level generalities, and the book is in the spirit of light business journalism rather than detailed expositions that walk you through a…

Dec 6: Statistics in Practice

This week we look at the casino business - in particular, the odds on slots. In our course spotlight, we start looking at some of the great stuff starting at the beginning of the new year. In January, you can get started with basic statistics or biostatistics,…

Of Note: Google Zooms Out on Microtargeting

Google recently announced that it would further limit its election ads to audience targeting based on age, gender, and general location (postal code level), along with context targeting (i.e. showing ads based on the content being viewed). Up to this point, the application of predictive modeling to…

Betting and Statistics

Betting has had a long and close relationship with the science of probability and statistics.  In the mid-1600’s, the French intellectual and gambler Antoine Gombaud, who called himself Chevalier de Méré, enlisted the help of the mathematician Blaise Pascal to solve several puzzles involving dice…

Of Note: Operations Research (O/R) For Sewage

Older urban sewer systems are not sealed, dedicated route networks leading to sewage treatment plants.  Rather, to save money when they were built decades ago, in some places they shared pipes with storm water drainage systems that lead to creeks, rivers and bays.  As a…

Nov 25: Statistics in Practice

In this week’s Brief, we take a look at the history of betting and how it is entwined with probabilistic decision-making. Probabilistic decision-making is also the focus of our 3-course Optimization Mastery, which covers linear programming, integer programming, simulation and other operations research (O/R) techniques. Start…

Word of the Week – Errors and Loss

Errors - differences between predicted values and actual values, also called residuals - are a key part of statistical models.  They form the raw material for various metrics of predictive model performance (accuracy, precision, recall, lift, etc.), and also the basis for diagnostics on descriptive…

Unforeseen Consequences in Data Science

After the massive Exxon Valdez oil spill, states passed laws boosting the liability of tanker companies for future spills.  The result was not as intended: fly-by-night companies, whose bankruptcy would not be consequential, took over the trade. In this blog…

Data Analytics

Terminology in Data Analytics: As data continue to grow at a faster rate than either population or economic activity, so do organizations' efforts to deal with the data deluge, and use it to capture value.  And so do the methods used to analyze data, which…

Data Analytics Courses

Data analytics and data science are popular terms, and skills in these areas are in great demand.  But what do these terms mean?  Below is an overview and a listing of related courses. For information about our certificate programs in data science and analytics, click here.…

Statistical Thinking

Gambler’s Fallacy I - forgetting that the “coin has no memory”   Gamblers often believe that after a long streak of one outcome, the probability of a different outcome has increased.  Sports commentators often say that a batter in a slump is “due” for a hit.…

Latin hypercube

In Monte Carlo sampling for simulation problems, random values are generated from a probability distribution deemed appropriate for a given scenario (uniform, Poisson, exponential, etc.).  In simple random sampling, each potential random value within the probability distribution has an equal chance of being selected. Just…
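
The excerpt above is cut off before it describes the Latin hypercube approach itself, so the following is only a minimal one-dimensional sketch of the usual idea: split the range into equal-probability strata and draw exactly one value from each, instead of drawing every value independently. The function names, the uniform [0, 1) range and the seed are illustrative assumptions, not taken from the post.

```python
import random


def simple_random_sample(n, rng):
    """Draw n independent uniform values on [0, 1); clusters and gaps can occur."""
    return [rng.random() for _ in range(n)]


def latin_hypercube_1d(n, rng):
    """Draw one uniform value from each of n equal-width strata on [0, 1)."""
    points = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(points)  # remove the ordering by stratum
    return points


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5
srs = simple_random_sample(n, rng)
lhs = latin_hypercube_1d(n, rng)

print("simple random:  ", sorted(round(v, 3) for v in srs))
print("Latin hypercube:", sorted(round(v, 3) for v in lhs))
```

With the stratified draw, every fifth of the range is guaranteed to contain exactly one value; the simple random draw carries no such guarantee.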

Oct 14: Statistics in Practice

This week we look at several ways to fool yourself, statistically - variants of the “Gambler’s Fallacy.” Gambling is all about accurately assessing risk, so, naturally, our featured course is: Nov 15 - Dec 13: Risk Simulation and Queuing See you in class! - Peter Bruce,…

Workforce Management

Anyone who has worked in retail knows the anxiety that attends workforce scheduling for both manager and employee.  The manager wonders “Will my employees show up at the right times?” The employee wonders “Will I be scheduled for inconvenient times?  Enough hours? Too many hours?”  …

Regularize

The art of statistics and data science lies, in part, in taking a real-world problem and converting it into a well-defined quantitative problem amenable to useful solution. At the technical end of things lies regularization. In data science this involves various methods of simplifying models,…

Machine Learning and Human Bias

Does better AI offer the hope of prejudice-free decision-making?  Ironically, the reverse might be true, especially with the advent of deep learning.   Bias in hiring is one area where private companies move with great care, since there are thickets of laws and regulations in most…

Oct 7: Statistics in Practice

This week we take a look at how AI encodes human bias, despite our best efforts. Our spotlight this week is on: Nov 8 - Dec 6: Deep Learning See you in class! - Peter Bruce, Chief Academic Officer, Author, Instructor, and Founder The Institute for…

The Curse of Dimensionality

There are more than 3 dozen curses in Harry Potter.  Data scientists have only one - the “curse of dimensionality.”  Dimensionality is the number of predictors or input variables in a model, and the “curse” refers to the problems that result from including too many…

Student Spotlight: Peter Mulready

Peter Mulready is an independent consultant, who worked previously as a system architect at Boehringer Ingelheim, one of the world's largest pharmaceutical companies. Peter got his degree in biology, but his focus shifted to managing and optimizing the use of data in drug discovery research. …

Anomaly Detection via Conversation: “How was your vacation?”

A friendly query about your holiday might be a question you get from a roaming agent in the check-in area at the Tel Aviv airport.  Israel, considered to have the most effective airport security in the world, does not rely solely on routine mechanical screening…

Of Note: e-cigarettes

Last week, the Trump administration announced a forthcoming ban on e-cigarettes, following news stories of a spate of deaths from vaping.  The Wall Street Journal, on Friday the 13th, published both an editorial and an op-ed piece suggesting that any harm from e-cigarettes is minor…

Book Review: Bandit Algorithms for Website Optimization, by John Myles White

A classic statistical experimental design comparing treatments (two treatments, treatment versus control, multiple treatments) specifies a sample size, collection of data, then a decision, typically based on hypothesis-testing:  the winning treatment must attain a level of…

Of Note: “Islands in Search of Continents”

“Islands in Search of Continents” is the subtitle of an article by Michael Clarke and Iain Chalmers in the Journal of the American Medical Association (1998; 280: 280-282).  It refers to the fact that many studies are conducted and reported in isolation from other studies on the…

Meta Analysis

1.2 million scientific papers were indexed by PubMed in 2011 (see Are Scientists Doing Too Much Research), ample proof that there are lots of people studying the same or similar things.  For example, there have been: over 100 studies of suicide following psychiatric institutionalization; 38 studies…

Industry Spotlight: Health Analytics

Patient Data Management: Health analytics is a hot topic now, but to do the analytics you need data - this is where Electronic Health Records (EHR) come in.  An integrated, standardized system for sharing and accessing health data has been “just around the corner” now…

Of Note: Superusers

“Superusers” of medical services are the small fraction of patients that account for huge consumption of medical services.  An article published August 14, 2019 in JAMA Surgery (online) reports on the application of machine learning methods to Medicare data on 1,049,160 Medicare patients who underwent surgery,…

Job Spotlight: Biostatisticians

Biostatisticians are the shepherds (and the police) that guide the science of developing new therapies for disease.  They come in several different flavors: Those involved in gathering information, designing experiments and analyzing data at the drug discovery stage - trying to sort out what works…

Aug 16: Statistics in Practice

Here in Part 2 of the Weekly Brief, we offer some tools to help you with the question, “what is the optimal set of alternatives to offer consumers?” Our course spotlight is on: Aug 30 - Sep 27: Discrete Choice Modeling and Conjoint Analysis See you in…

Problem of the Week: The Second Heads

QUESTION: A friend tosses two coins, and you ask “Is one of them a heads?”  The friend replies “Yes.” What is the probability that the other is a heads? ANSWER:   One-third.  There are four ways the coins could have landed originally: HH:  0.25 probability…
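
The one-third answer can be checked by listing the four equally likely outcomes and keeping only those consistent with the friend's "Yes." The snippet below is a small illustrative sketch of that enumeration; the variable names are not from the post.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes for two fair coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# The friend's "Yes" rules out TT, leaving outcomes with at least one heads.
at_least_one_head = [o for o in outcomes if "H" in o]

# Among those, count the outcomes where both coins are heads.
both_heads = [o for o in at_least_one_head if o == ("H", "H")]

p_other_is_heads = Fraction(len(both_heads), len(at_least_one_head))
print(p_other_is_heads)  # 1/3
```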

Aug 13: Statistics in Practice

This week we discuss the distinction between explanatory and predictive modeling and spotlight the workhorses of statistical modeling: Oct 4 - Nov 1: Regression Analysis Oct 4 - Nov 1: Categorical Data Analysis See you in class! - Peter Bruce,  Chief Academic Officer, Author, Instructor, and Founder The Institute for Statistics Education at Statistics.com…

Explain or Predict?

A casual user of machine learning methods like CART or naive Bayes is accustomed to evaluating a model by measuring how well it predicts new data.  When examining the output of statistical models, they are often flummoxed by the profusion of assessment metrics. Typical multiple…

Word of the Week: Intervals (confidence, prediction and tolerance)

All students of statistics encounter confidence intervals.  Confidence intervals tell you, roughly, the interval within which you can be, say, 95% confident that the true value of the quantity estimated by some sample statistic lies.  This is not the precise technical definition, but it is how people use the…

Small Ball: Calling all thinkers!

I was visiting New York a couple of weeks ago, transferring from Amtrak to the PATH trains at Newark.  PATH takes you to Wall Street - the #1 financial center in the world - and yet the process of paying for my $2.75 PATH ticket…

Aug 9: Statistics in Practice

We continue Monday's discussion of "people analytics" with a look from the customer's side and a call for all thinkers! (see below) Our course spotlight is on: Sep 6 - Oct 4: Predictive Analytics 1 - Machine Learning Tools Sep 6 - Oct 4: Programming 1…

Industry Spotlight: HR (People Analytics)

Analytics has come to HR.  It’s partly Orwellian, tracking what employees do on the computer, and partly warm and fuzzy, leveraging the true informal organizational structure via network analysis (jump into Friday’s Network Analysis course to learn the basics).  One dimension assumes the worst about…

Aug 5: Statistics in Practice

In this week’s Brief, analytics comes to the HR department (“people analytics”), and our course spotlight is on:  Sep 6 - Oct 4:  Predictive Analytics 1 Sep 6 - Oct 4:  Programming 1 (R or Python)     These courses are excellent entry points into our data…

Aug 2: Statistics in Practice

In Part 1 of this week’s Brief, we looked at political analytics; in Part 2 we extend that look to commercial domains. Our course spotlight is Persuasion Analytics, taught by Ken Strasma, who pioneered the use of statistical modeling to microtarget voters in the 2004…

Lift, Uplift, Gains

There are various metrics for assessing how well a model does, and one favored by marketers is lift, which is particularly relevant for the portion of the records predicted to be most profitable, most likely to buy, etc. 

Probability

You might be wondering why such a basic word as probability appears here. It turns out that the term has deep tendrils in formal mathematics and philosophy, but is somewhat hard to pin down

Social Network Analysis (SNA) in Medicine

In hospitals, “sentinel events” are events that carry with them a significant risk of unexpected death or harm.  It is estimated that ⅔ of such sentinel events result from communications failures during the handoff of a patient from one provider to another (e.g. during a…

Density

Density is a metric that describes how well-connected a network is
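
The excerpt does not give a formula, so the sketch below assumes the common definition for an undirected network without self-loops: the number of edges present divided by the number of edges possible, n(n-1)/2 for n nodes. The function name and the toy network are illustrative.

```python
def density(num_nodes, edges):
    """Share of possible undirected edges that are actually present."""
    # Treat (a, b) and (b, a) as the same edge and ignore self-loops.
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    possible = num_nodes * (num_nodes - 1) / 2
    return len(unique_edges) / possible if possible else 0.0


# A four-node chain A-B-C-D has 3 of the 6 possible edges.
chain = [("A", "B"), ("B", "C"), ("C", "D")]
print(density(4, chain))  # 0.5
```

A fully connected network scores 1.0 under this definition, and a network with no edges scores 0.0.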

Gittins Index

Consider the multi-armed bandit problem where each arm has an unknown probability of paying either 0 or 1, and a specified payoff discount factor of x (i.e. for two successive payoffs, the second is valued at x% of the first, where x < 100%).  The Gittins index is [...]

Cold Start Problem

There are various ways to recommend additional products to an online purchaser, and the most effective ones rely on prior purchase or rating history -

Autoregressive

Autoregressive refers to time series forecasting models (AR models) in which the independent variables (predictors) are prior values of the time series itself.
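
As a minimal sketch of the idea, assume the simplest case: an AR(1) model, x_t = c + phi * x_(t-1), fit by ordinary least squares with the lagged series as the only predictor. The function name `fit_ar1` and the toy series are illustrative assumptions; real work would typically use a statistics library and more lags.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_(t-1); returns (c, phi)."""
    lagged = series[:-1]   # predictor: the previous value
    current = series[1:]   # response: the value one step later
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


# Toy series generated without noise from x_t = 2 + 0.5 * x_(t-1).
series = [10.0, 7.0, 5.5, 4.75, 4.375, 4.1875]
c, phi = fit_ar1(series)

# One-step-ahead forecast uses only the most recent value of the series.
forecast = c + phi * series[-1]
print(round(c, 6), round(phi, 6), round(forecast, 6))
```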

Industry Spotlight: Hospitals

Hospitals are a major employer of statisticians and analytics professionals, both in support of clinical research like the retinopathy study described earlier, and to improve hospital operations (outcomes, cost management, etc.). Here are a few quick facts about the hospital industry: US hospital revenue totals…

Matching Algorithms

Some applications of machine learning and artificial intelligence are recognizably impressive - predicting future hospital readmission of discharged patients, for example, or diagnosing retinopathy. Others - self-driving cars, for example - seem almost magical. The matching problem, though, is one where your first reaction might…

Instructor Spotlight: Cliff Ragsdale

Cliff T. Ragsdale teaches several courses for the Institute in the area of operations research, based on his best selling text “Spreadsheet Modeling and Decision Analysis.”  One of Cliff’s special talents is making his subject, which can be quite challenging technically, widely accessible. His courses do…

Industry Spotlight: Consulting

When a new technology arrives, consulting companies can quickly add staff and expertise to build institutional capacity centered around the technology in ways companies focused on delivering their own products and services cannot.  Large consulting companies like Booz Allen and McKinsey, as well as smaller…

Industry Spotlight: Baseball (Sports) Statistics

The U.S. baseball season opens Thursday, March 28, and celebrates the 48th season of analytics in baseball, beginning with the founding of the Sabermetric Society in 1971 (the same year that Satchel Paige entered the Hall of Fame).  Analytics has come a long way in…

Industry Spotlight: Agriculture

Weeds are big business - the global herbicide market is over $35 billion annually.  Weeds are also big government (think “invasive species”). California’s listing of weeds is called Encycloweedia, and the state publishes a quarterly newsletter called Noxious Times. Colorado publishes a similar periodical, Invader.…

Industry Spotlight: Precision Agriculture

The application of analytics to agriculture has given rise to what is called “precision agriculture,” a science that seeks to take advantage of and use detailed information that is local in time and place.  Tractors and farm equipment are being equipped with sensors and software…

Job Spotlight: Risk Analyst

Many jobs are centered around risk management.  If you’re looking through job postings, of course, you’ll see lots of jobs whose purpose is to make sure that nothing bad happens - the equivalent of locking the doors and closing the windows.  More interesting from a…

Job Spotlight: Data Scientist

Data science is one of a host of similar terms.  “Artificial intelligence” has been around since the 1960’s and “data mining” for at least a couple of decades.  “Machine learning” came out of the computer science community, and “analytics,” “data analytics,” and “predictive analytics” came…

Course Spotlight: Survival Analysis

Convinced that he, like his father, would die in his 40’s, Winston Churchill lived his early life in a frenetic hurry.  He had participated in four wars on three continents by his mid-20’s, served in multiple ministerial positions by his 30’s, and published 12 books…

Instructor Spotlight: David Kleinbaum

"When I started teaching mandatory biostatistics classes in 1970 at UNC, I realized early on that a lot of kids didn't want to take a course they perceived as boring, so I kept things relaxed and fun."

Industry Spotlight: Military Operations

Abraham Wald, a persecuted Jewish mathematician who fled Austria just before World War II, led an analysis of allied bombers returning from missions.  Hitherto, the Air Force had focused on reinforcing areas that showed the most damage on return. Wald convinced them instead to focus…

Likert scale assessment surveys

Do you work with multiple choice tests, or Likert scale assessment surveys?  Rasch methods help you construct linear measures from these forms of scored observations and analyze the results from such surveys and tests.  "Practical Rasch Measurement - Core Topics" In this course, you will learn practical…

Historical Spotlight: Jacob Wolfowitz

World War II was a crucible of technological innovation, including advances in statistics. Jacob Wolfowitz, born a century ago (1920), looked at the problem of noisy radio transmissions. Coded radio transmissions were critical elements of military command and control, and they were plagued by the…

Certificate Graduate: Cristobal Bazan, United Nations Agency

"My courses help me look at more complex problems using different approaches to show more interesting aspects of conditions, beyond just tables and charts, more than just sampling or descriptive statistics." - Cristobal Bazan, United Nations Agency. How do you…

Problem of the Week: The Value of Bedrooms

Question: You work for an internet real-estate company, building statistical models to predict home price on the basis of square footage, number of bedrooms, number of bathrooms, property type (single family home, townhouse, multiplex), and age. Surprisingly, you find the coefficient for bedrooms is negative,…

Statistically Significant – But Not True

If you are looking for the Feature Engineering blog post, you can find it here: https://www.statistics.com/blog/1/1558369154-feature-engineering-data-prep-still-needed/ In 2015, at an Alzheimer's conference, Biogen researchers presented dramatic brain scans showing that the antibody aducanumab effectively cleared out plaque in the brain, plaque that was associated with…

Book Review: Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We REALLY Are

This week's book review is of Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are, Seth Stephens-Davidowitz's fascinating book about how social media data reveals all sorts of things about us that we barely know ourselves. …

Industry Spotlight – Precision Agriculture

The application of analytics to agriculture has given rise to what is called "precision agriculture", a science that seeks to take advantage of and use detailed information that is local in time and place. Tractors and farm equipment are being equipped with sensors and software…

Historical Spotlight: Ronald A. Fisher

In 1919, Ronald A. Fisher was appointed as chief statistician at the agricultural research station in Rothamsted, a post created for him. His work there resulted, in 1925, in the publication of his classic Statistical Methods for Research Workers. An important message of his book…

Instructor Spotlight: Prof. David Unwin

Prof. David Unwin has guided, developed and taught the spatial analysis curriculum at Statistics.com since 2005. David lives in central England, about an hour north of the storied Rothamsted agricultural research center. Until his retirement in 2002, he was Professor of Geography at Birkbeck College,…

Statistics in Agriculture: Encycloweedia

Weeds are big business - the global herbicide market is over $35 billion annually. Weeds are also big government (think "invasive species"). California's listing of weeds is called Encycloweedia, and the state publishes a quarterly newsletter called Noxious Times. Colorado publishes a similar periodical, Invader.…

Tensor

A tensor is the multidimensional extension of a matrix (i.e. scalar > vector > matrix > tensor). 

Problem of the Week: Missing Data

Question: You have a supervised learning task with 30 predictors, in which 5% of the observations are missing.  The missing data are randomly distributed across variables and records. If your strategy for coping with missing data is to drop records with missing data, what proportion…
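
The question above is cut off, so the following is only a hedged back-of-the-envelope sketch. It assumes that "5% missing" means each of the 30 predictor values in a record is missing independently with probability 0.05, and that the quantity of interest is the share of records lost when any record with a missing value is dropped. The variable names are illustrative.

```python
p_missing = 0.05     # assumed chance that any single value is missing
n_predictors = 30    # predictors per record, from the question

# A record survives only if all 30 of its values are present.
p_complete = (1 - p_missing) ** n_predictors
p_dropped = 1 - p_complete

print(f"complete records: {p_complete:.3f}")  # 0.215
print(f"dropped records:  {p_dropped:.3f}")   # 0.785
```

Under these assumptions, a modest per-value missing rate removes roughly three quarters of the records.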

Student Spotlight: Barry Eggleston

Barry Eggleston is a health research statistician who has worked on both clinical trials and observational studies, and is currently with RTI in North Carolina. In his early career, his work was solely designing and analyzing clinical trials using typical biostatistics methods ranging from t-test…

A Deep Dive into Deep Learning

On Wednesday, March 27, the 2018 Turing Award in computing was given to Yoshua Bengio, Geoffrey Hinton and Yann LeCun for their work on deep learning. Deep learning by complex neural networks lies behind the applications that are finally bringing artificial intelligence out of the…

Industry Spotlight: Credit Scoring

In the U.S., credit scoring is dominated by three companies - Experian, TransUnion and Equifax, employing roughly 30,000 people. An important player in the scoring methodology is FICO, previously Fair Isaac Corporation, and the scores are typically called "FICO scores." Credit scoring is the oldest…

Book Review: Weapons of Math Destruction

Cathy O'Neil's Weapons of Math Destruction, when it was first published in 2016, sounded an early alarm about the big data algorithms and their potential for social evil. The cover is adorned with a robotic death's head and the subtitle reads "How Big Data Increases…

Historical Spotlight: Alan Turing

80 years ago, in 1939, Alan Turing began work on the code-breaking system that would eventually prove key in helping Britain survive the German submarine threat in the Atlantic. Last month, the Turing Award in computer science (sometimes referred to as the "Nobel Prize…

Confusing Terms in Data Science – A Look at Synonyms

To a statistician, a sample is a collection of observations (cases).  To a machine learner, it’s a single observation.  Modern data science has its origin in several different fields, which leads to potentially confusing  synonyms, like these:

Confusing Terms in Data Science – A Look at Homonyms and more

To a statistician, a sample is a collection of observations (cases).  To a machine learner, it’s a single observation.  Modern data science has its origin in several different fields, which leads to potentially confusing homonyms like these:

Confusing Terms in Data Science – A Look at Synonyms, Homonyms and more

To a statistician, a sample is a collection of observations (cases). To a machine learner, it's a single observation. Modern data science has its origin in several different fields, which leads to potentially confusing homonyms and synonyms, like these: Homonyms (words with multiple meanings): Bias: To…

Industry Spotlight: Package Delivery

Nothing better illustrates the encroachment of data science and analytics on the older "economy of tangible things" than the business of delivering packages. The use of analytics in package delivery is not new. Companies like UPS and FedEx are longtime users of operations research methods…

Job Spotlight: Sports Statistician

The field of sports statistician is not exactly new; the American Statistical Association's section on Sports Statistics was formed in 1992. Three of Statistics.com's instructors have professional experience in sports statistics - Ben Baumer (SQL) served as statistician for the NY Mets, Stephanie Kovalchik (Meta…

Industry Spotlight: Baseball – Opening Day & Statistics in Sports

The U.S. baseball season opens Thursday, March 28, and celebrates the 48th season of analytics in baseball, beginning with the founding of the Sabermetric Society in 1971 (the same year that Satchel Paige entered the Hall of Fame). Analytics has come a long way in…

Jaccard’s coefficient

When variables have binary (yes/no) values, a couple of issues come up when measuring distance or similarity between records.  One of them is the "yacht owner" problem.
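
The excerpt does not spell out the "yacht owner" problem; the sketch below assumes it refers to rare attributes, where two records look alike mostly because both lack the attribute. It contrasts a simple matching score, which counts shared 0s as agreement, with Jaccard's coefficient, which ignores them. The function names and the toy records are illustrative.

```python
def simple_matching(a, b):
    """Share of positions where the two records agree, including 0-0 matches."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def jaccard(a, b):
    """Shared 1s divided by positions where at least one record has a 1."""
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    return both / either if either else 0.0


# Ten rare yes/no attributes; each person has only two of them.
alice = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
bob   = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print(simple_matching(alice, bob))    # 0.8, driven by shared absences
print(round(jaccard(alice, bob), 3))  # 0.333, one shared 1 out of three
```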

Darwin’s Legacy in Statistics

Charles Darwin, the most famous grandson of the Enlightenment thinker Erasmus Darwin, published his ground-breaking theory of evolution, “The Origin of Species,” 160 years ago. Another grandson of Erasmus, Francis Galton, became one of the founding fathers of statistics (correlation, the “wisdom of the crowd,” regression…

Industry Spotlight: Customer Segmentation

Are you "young and rustic?" Or perhaps a "toolbelt traditionalist?" These are nicknames given to customer segments identified by market research firm Claritas, with its statistical clustering tool. Long before the advent of individualized product recommendations, business sought to segment customers into distinct groups on…

Industry Spotlight: CROs

CROs, or contract research organizations, are a $40 billion industry, growing at close to 12% per year. They provide contract services to the pharmaceutical industry, including statistical design and analysis, laboratory services, administration of clinical trials, and monitoring of drugs once they are on the…

Handling the Noise – Boost It or Ignore It?

In most statistical modeling or machine learning prediction tasks, there will be cases that can be easily predicted based on their predictor values (signal), as well as cases where predictions are unclear (noise). Two statistical learning methods, boosting and ProfWeight, use those difficult cases in…

Rectangular data

Rectangular data are the staple of statistical and machine learning models.  Rectangular data are multivariate cross-sectional data (i.e. not time-series or repeated measure) in which each column is a variable (feature), and each row is a case or record.
