#### Social Network Analysis (SNA) in Medicine

In hospitals, “sentinel events” are events that carry with them a significant risk of unexpected death or harm.  It is estimated that ⅔ of such sentinel events result from communications failures during the handoff of a patient from one provider to another (e.g. during a…

#### Density

Density is a metric that describes how well-connected a network is.
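
As a rough illustration, one common definition of density for an undirected network is the number of ties actually present divided by the number of ties possible. The sketch below assumes that definition; the `density` function and the `handoffs` example are hypothetical names chosen for illustration.

```python
def density(n_nodes, edges):
    """Density of an undirected simple network: actual ties / possible ties."""
    if n_nodes < 2:
        return 0.0
    # Ignore self-loops and count each unordered pair once.
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    possible = n_nodes * (n_nodes - 1) / 2
    return len(unique_edges) / possible


# Hypothetical network: four providers, three handoff ties.
handoffs = [("A", "B"), ("B", "C"), ("C", "D")]
print(density(4, handoffs))  # 3 of 6 possible ties -> 0.5
```

A density of 1.0 would mean every pair of nodes is directly connected; values near 0 indicate a sparse network.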

#### Algorithms

We have an extensive statistical glossary and have been sending out a "word of the week" newsfeed for a number of years.  Take a look at the results

#### Gittins Index

Consider the multi-armed bandit problem where each arm has an unknown probability of paying either 0 or 1, and a specified payoff discount factor of x (i.e. for two successive payoffs, the second is valued at x% of the first, where x < 100%). The Gittins index is [...]

#### Cold Start Problem

There are various ways to recommend additional products to an online purchaser, and the most effective ones rely on prior purchase or rating history -

#### Autoregressive

Autoregressive refers to time series forecasting models (AR models) in which the independent variables (predictors) are prior values of the time series itself.
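
A minimal sketch of the simplest case, an AR(1) model in which the only predictor is the previous value, might look like the following. It assumes an ordinary least-squares fit; the function names and the toy series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]."""
    x = series[:-1]  # prior values act as the predictor
    y = series[1:]   # current values act as the outcome
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]


# Toy series generated exactly by y[t] = 2 + 0.5 * y[t-1], starting at 10.
series = [10.0]
for _ in range(9):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
print(round(c, 6), round(phi, 6))
print(forecast_next(series, c, phi))
```

Higher-order AR(p) models use the p most recent values as predictors in the same way.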

#### Industry Spotlight: Hospitals

Hospitals are a major employer of statisticians and analytics professionals, both in support of clinical research like the retinopathy study described earlier, and to improve hospital operations (outcomes, cost management, etc.). Here are a few quick facts about the hospital industry: US hospital revenue totals…

#### Healthcare Analytics: Exploration versus Confirmation

Perhaps the most active application of analytics and data mining is healthcare. This week we look at one success story, the use of machine learning to predict diabetic retinopathy, one story of disappointment, the use of genetic testing in a puzzling disease, and a basic…

#### Problem of the Week: Simpson’s Paradox – baseball

Question: A baseball team is comparing two of its hitters, Hernandez and Dimock. Hernandez hit .250 in 2017 and .275 in 2018. Dimock did worse in both years - .245 in 2017 and .270 in 2018. Overall, though, Dimock hit better across the two years,…
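
To see how such a reversal can arise, the sketch below uses hypothetical hit and at-bat counts chosen to reproduce the yearly averages in the question. Only the four averages come from the question; the counts are assumptions for illustration.

```python
# Hypothetical (hits, at_bats) per season; only the averages come from the question.
hernandez = {2017: (100, 400), 2018: (11, 40)}
dimock = {2017: (49, 200), 2018: (108, 400)}


def average(hits, at_bats):
    """Batting average for a single season."""
    return hits / at_bats


def overall(record):
    """Batting average pooled across all seasons."""
    total_hits = sum(h for h, _ in record.values())
    total_at_bats = sum(ab for _, ab in record.values())
    return total_hits / total_at_bats


for year in (2017, 2018):
    print(year, round(average(*hernandez[year]), 3), round(average(*dimock[year]), 3))
print("overall", round(overall(hernandez), 3), round(overall(dimock), 3))
```

With these counts, Dimock has most of his at-bats in the higher-scoring year while Hernandez has most of his in the lower-scoring year, so the pooled averages reverse the yearly comparison.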

#### Matching Algorithms

Some applications of machine learning and artificial intelligence are recognizably impressive - predicting future hospital readmission of discharged patients, for example, or diagnosing retinopathy. Others - self-driving cars, for example - seem almost magical. The matching problem, though, is one where your first reaction might…

#### Instructor Spotlight: Cliff Ragsdale

Cliff T. Ragsdale teaches several courses for the Institute in the area of operations research, based on his best-selling text “Spreadsheet Modeling and Decision Analysis.” One of Cliff’s special talents is making his subject, which can be quite challenging technically, widely accessible. His courses do…

#### Industry Spotlight: Consulting

When a new technology arrives, consulting companies can quickly add staff and expertise to build institutional capacity centered around the technology in ways companies focused on delivering their own products and services cannot.  Large consulting companies like Booz Allen and McKinsey, as well as smaller…

#### Industry Spotlight: Baseball (Sports) Statistics

The U.S. baseball season opens Thursday, March 28, and celebrates the 48th season of analytics in baseball, beginning with the founding of the Sabermetric Society in 1971 (the same year that Satchel Paige entered the Hall of Fame).  Analytics has come a long way in…

#### Industry Spotlight: Package Delivery Business

Nothing better illustrates the encroachment of data science and analytics on the older “economy of tangible things” than the business of delivering packages. The use of analytics in package delivery is not new. Companies like UPS and FedEx are longtime users of operations research methods…

#### Industry Spotlight: Credit Scoring

In the U.S., credit scoring is dominated by three companies - Experian, TransUnion and Equifax, employing roughly 30,000 people.  An important player in the scoring methodology is FICO, previously Fair Isaac Corporation, and the scores are typically called “FICO scores.”  Credit scoring is the oldest…

#### Industry Spotlight: Agriculture

Weeds are big business - the global herbicide market is over \$35 billion annually.  Weeds are also big government (think “invasive species”). California’s listing of weeds is called Encycloweedia, and the state publishes a quarterly newsletter called Noxious Times. Colorado publishes a similar periodical, Invader.…

#### Industry Spotlight: Precision Agriculture

The application of analytics to agriculture has given rise to what is called “precision agriculture,” a science that seeks to take advantage of and use detailed information that is local in time and place.  Tractors and farm equipment are being equipped with sensors and software…

#### Job Spotlight: Risk Analyst

Many jobs are centered around risk management.  If you’re looking through job postings, of course, you’ll see lots of jobs whose purpose is to make sure that nothing bad happens - the equivalent of locking the doors and closing the windows.  More interesting from a…

#### Industry Spotlight: Automotive

The auto industry serves as a perfect exemplar of three key eras of statistics and data science in service of industry: Total Quality Management (TQM) First in Japan, and later in the U.S., the auto industry became an enthusiastic adherent to the Total Quality Management…

#### Job Spotlight: Data Scientist

Data science is one of a host of similar terms.  “Artificial intelligence” has been around since the 1960’s and “data mining” for at least a couple of decades.  “Machine learning” came out of the computer science community, and “analytics,” “data analytics,” and “predictive analytics” came…

#### Course Spotlight: Survival Analysis

Convinced that he, like his father, would die in his 40’s, Winston Churchill lived his early life in a frenetic hurry.  He had participated in four wars on three continents by his mid-20’s, served in multiple ministerial positions by his 30’s, and published 12 books…

#### Instructor Spotlight: David Kleinbaum

"When I started teaching mandatory biostatistics classes in 1970 at UNC, I realized early on that a lot of kids didn't want to take a course they perceived as boring, so I kept things relaxed and fun."

#### Likert scale assessment surveys

Do you work with multiple choice tests, or Likert scale assessment surveys? Rasch methods help you construct linear measures from these forms of scored observations and analyze the results from such surveys and tests. "Practical Rasch Measurement - Core Topics" In this course, you will learn practical…

#### Historical Spotlight: Jacob Wolfowitz

World War II was a crucible of technological innovation, including advances in statistics. Jacob Wolfowitz, born a century ago (1920), looked at the problem of noisy radio transmissions. Coded radio transmissions were critical elements of military command and control, and they were plagued by the…

#### Certificate Graduate: Karolis Urbonas, Amazon

The Statistics.com courses have helped me a lot, pushing me to the limit and making me learn much more than I expected I could. The knowledge I gained I could immediately leverage in my job ... then eventually led to landing a job in my…

#### Certificate Graduate: Cristobal Bazan, United Nations Agency

Certificate Student Profile of Cristobal Bazan My courses help me look at more complex problems using different approaches to show more interesting aspects of conditions, beyond just tables and charts, more than just sampling or descriptive statistics. Cristobal Bazan United Nations Agency How do you…

#### Feature Engineering and Data Prep – Still Needed?

It is a truism of machine learning and predictive analytics that 80% of an analyst's time is consumed in cleaning and preparing the needed data. I saw an estimate by a Google engineer that 25% of the time was spent just looking for the right…

#### Problem of the Week: The Value of Bedrooms

Question: You work for an internet real-estate company, building statistical models to predict home price on the basis of square footage, number of bedrooms, number of bathrooms, property type (single family home, townhouse, multiplex), and age. Surprisingly, you find the coefficient for bedrooms is negative,…

#### Industry Spotlight – Pharma

The cost of bringing a new drug to market is over \$2 billion, by some estimates. This covers the R&D, clinical trial testing and regulatory approval costs of the drug that makes it through the whole process, and also the same costs of the 9…

#### Statistically Significant – But Not True

If you are looking for the Feature Engineering blog post, you can find it here: https://www.statistics.com/feature-engineering-data-prep-still-needed/ In 2015, at an Alzheimer's conference, Biogen researchers presented dramatic brain scans showing that the antibody aducanumab effectively cleared out plaque in the brain, plaque that was associated with…

#### Book Review: Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We REALLY Are

This week's book review is of Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are, Seth Stephens-Davidowitz's fascinating book about how social media data reveals all sorts of things about us that we barely know ourselves.…

#### Industry Spotlight – Precision Agriculture

The application of analytics to agriculture has given rise to what is called "precision agriculture", a science that seeks to take advantage of and use detailed information that is local in time and place. Tractors and farm equipment are being equipped with sensors and software…

#### Historical Spotlight: Ronald A. Fisher

In 1919, Ronald A. Fisher was appointed as chief statistician at the agricultural research station in Rothamsted, a post created for him. His work there resulted, in 1925, in the publication of his classic Statistical Methods for Research Workers. An important message of his book…

#### Instructor Spotlight: Prof. David Unwin

Prof. David Unwin has guided, developed and taught the spatial analysis curriculum at Statistics.com since 2005. David lives in central England, about an hour north of the storied Rothamsted agricultural research center. Until his retirement in 2002, he was Professor of Geography at Birkbeck College,…

#### Statistics in Agriculture: Encycloweedia

Weeds are big business - the global herbicide market is over \$35 billion annually. Weeds are also big government (think "invasive species"). California's listing of weeds is called Encycloweedia, and the state publishes a quarterly newsletter called Noxious Times. Colorado publishes a similar periodical, Invader.…

#### Tensor

A tensor is the multidimensional extension of a matrix (i.e. scalar > vector > matrix > tensor).
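
The progression can be sketched with plain nested lists, where each extra level of nesting adds one dimension. This is only an illustration: the `shape` helper is a hypothetical name, and it assumes non-empty, rectangular nesting.

```python
def shape(t):
    """Return the dimensions of a rectangular nested list (a scalar has shape ())."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]  # assumes every level is non-empty and rectangular
    return tuple(dims)


scalar = 5.0                                   # 0 dimensions
vector = [1.0, 2.0, 3.0]                       # 1 dimension
matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # 2 dimensions
tensor3 = [matrix, matrix]                     # 3 dimensions

for name, obj in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor3", tensor3)]:
    print(name, shape(obj))
```

A batch of color images, for instance, is often stored as a four-dimensional tensor (image, row, column, color channel).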

#### Problem of the Week: Missing Data

Question: You have a supervised learning task with 30 predictors, in which 5% of the observations are missing.  The missing data are randomly distributed across variables and records. If your strategy for coping with missing data is to drop records with missing data, what proportion…
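
One way to reason about it, assuming the question asks what share of records survive and that each of the 30 predictor values is missing independently with probability 5%: a record is kept only if all 30 of its values are present. The function name below is hypothetical.

```python
def complete_record_share(n_predictors, missing_rate):
    """Share of records with no missing values, assuming independent missingness."""
    return (1 - missing_rate) ** n_predictors


share = complete_record_share(30, 0.05)
print(round(share, 3))  # 0.215
```

Under these assumptions, dropping incomplete records would discard close to four out of five records, even though only 5% of the individual values are missing.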

#### Student Spotlight: Barry Eggleston

Barry Eggleston is a health research statistician who has worked on both clinical trials and observational studies, and is currently with RTI in North Carolina. In his early career, his work was solely designing and analyzing clinical trials using typical biostatistics methods ranging from t-test…

#### A Deep Dive into Deep Learning

On Wednesday, March 27, the 2018 Turing Award in computing was given to Yoshua Bengio, Geoffrey Hinton and Yann LeCun for their work on deep learning. Deep learning by complex neural networks lies behind the applications that are finally bringing artificial intelligence out of the…

#### Industry Spotlight: The IRS is Watching You

The IRS (U.S. Internal Revenue Service) has been using computers to choose tax returns for audit since 1962. Early on, the selection was rule-based, but the IRS turned to statistical modeling in 1969, using the oldest predictive analytics model in the toolbox - discriminant analysis.…

#### Book Review: Weapons of Math Destruction

Cathy O'Neil's Weapons of Math Destruction, when it was first published in 2016, sounded an early alarm about the big data algorithms and their potential for social evil. The cover is adorned with a robotic death's head and the subtitle reads "How Big Data Increases…

#### Historical Spotlight: Alan Turing

80 years ago, in 1939, Alan Turing began work on the code-breaking system that would eventually prove key in helping Britain survive the German submarine threat in the Atlantic. Last month, the Turing Award in computer science (sometimes referred to as the "Nobel Prize…

#### Confusing Terms in Data Science – A Look at Synonyms

To a statistician, a sample is a collection of observations (cases).  To a machine learner, it’s a single observation.  Modern data science has its origin in several different fields, which leads to potentially confusing  synonyms, like these:

#### Confusing Terms in Data Science – A Look at Homonyms and more Synonyms

To a statistician, a sample is a collection of observations (cases).  To a machine learner, it’s a single observation.  Modern data science has its origin in several different fields, which leads to potentially confusing homonyms like these:

#### Confusing Terms in Data Science – A Look at Synonyms, Homonyms and more

To a statistician, a sample is a collection of observations (cases). To a machine learner, it's a single observation. Modern data science has its origin in several different fields, which leads to potentially confusing homonyms and synonyms, like these: Homonyms (words with multiple meanings): Bias: To…

#### Industry Spotlight: Package Delivery

Nothing better illustrates the encroachment of data science and analytics on the older "economy of tangible things" than the business of delivering packages. The use of analytics in package delivery is not new. Companies like UPS and FedEx are longtime users of operations research methods…

#### Ethical Practice in Data Mining

Prior to the advent of internet-connected devices, the largest source of big data was public interaction on the internet. Social media users, as well as shoppers and searchers on the internet, make an implicit deal with the big companies that provide these services: users can…

#### Job Spotlight: Sports Statistician

The field of sports statistics is not exactly new; the American Statistical Association's section on Sports Statistics was formed in 1992. Three of Statistics.com's instructors have professional experience in sports statistics - Ben Baumer (SQL) served as statistician for the NY Mets, Stephanie Kovalchik (Meta…

#### Industry Spotlight: Baseball – Opening Day & Statistics in Sports

The U.S. baseball season opens Thursday, March 28, and celebrates the 48th season of analytics in baseball, beginning with the founding of the Sabermetric Society in 1971 (the same year that Satchel Paige entered the Hall of Fame). Analytics has come a long way in…

#### Jaccard’s coefficient

When variables have binary (yes/no) values, a couple of issues come up when measuring distance or similarity between records.  One of them is the "yacht owner" problem.
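
As a hedged sketch of the idea: when "yes" is rare (few people own yachts), two records can look similar simply because both say "no" almost everywhere. Jaccard's coefficient addresses this by ignoring the 0/0 matches. The function names and toy records below are hypothetical, and returning 0.0 when neither record has any 1's is one convention among several.

```python
def simple_matching(a, b):
    """Share of positions where the two binary records agree (1/1 or 0/0)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def jaccard(a, b):
    """Share of 1/1 matches among positions where at least one record has a 1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 0.0  # convention when neither has any 1's


# Two hypothetical customers, each owning exactly one (different) rare item.
customer_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
customer_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(simple_matching(customer_a, customer_b))  # 0.8, driven by shared 0's
print(jaccard(customer_a, customer_b))          # 0.0, no shared 1's
```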

#### Darwin’s Legacy in Statistics

Charles Darwin, the most famous grandson of the Enlightenment thinker Erasmus Darwin, published his ground-breaking theory of evolution, “The Origin of Species,” 160 years ago. Another grandson of Erasmus, Francis Galton, became one of the founding fathers of statistics (correlation, the “wisdom of the crowd,” regression…

#### Industry Spotlight: Customer Segmentation

Are you "young and rustic?" Or perhaps a "toolbelt traditionalist?" These are nicknames given to customer segments identified by market research firm Claritas, with its statistical clustering tool. Long before the advent of individualized product recommendations, businesses sought to segment customers into distinct groups on…

#### Industry Spotlight: CROs

CROs, or contract research organizations, are a \$40 billion industry, growing at close to 12% per year. They provide contract services to the pharmaceutical industry, including statistical design and analysis, laboratory services, administration of clinical trials, and monitoring of drugs once they are on the…

#### Handling the Noise – Boost It or Ignore It?

In most statistical modeling or machine learning prediction tasks, there will be cases that can be easily predicted based on their predictor values (signal), as well as cases where predictions are unclear (noise). Two statistical learning methods, boosting and ProfWeight, use those difficult cases in…

#### Problem of the Week: Probability

Your country is at war, and an enemy plane has crashed on your territory. It bears the number 60, and a spy has told you that the aircraft are numbered serially. Can you make a guess about the total number of aircraft the enemy has…
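
One classical approach to this kind of serial-number problem estimates the total from the largest number seen. The sketch below assumes numbering starts at 1 and that the observed aircraft are a random sample; the function name is hypothetical, and this is only one of several reasonable estimators.

```python
def estimate_total(serials):
    """Estimate the population size from observed serial numbers.

    Uses the classical estimator m + m/k - 1, where m is the largest
    serial number observed and k is the number of observations.
    """
    k = len(serials)
    m = max(serials)
    return m + m / k - 1


print(estimate_total([60]))  # 119.0
```

With a single observation, the estimate is roughly double the observed number; more observations tighten the estimate toward the largest serial number seen.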

#### Rectangular data

Rectangular data are the staple of statistical and machine learning models.  Rectangular data are multivariate cross-sectional data (i.e. not time-series or repeated measure) in which each column is a variable (feature), and each row is a case or record.

#### “Defiant” Supervision

How did the phrase "defiantly recommend", as in "I defiantly recommend this product," come into common usage on the internet? The answer is a good look inside the workings of supervised learning. Supervision, generally from humans, is instrumental in much of statistical and machine learning.…

#### Good to Great

In 1994, Jim Collins and Jerry Porras, former and current Stanford professors, published the best-seller Built to Last that described how "long-term sustained performance can be engineered into the DNA of an enterprise."  It sold over a million copies. Buoyed by that success, Collins and a…

#### Selection Bias

Selection bias is a sampling or data collection process that yields a biased, or unrepresentative, sample. It can occur in numerous situations; here are just a few:

#### Space Shuttle Explosion

In 1986, the U.S. space shuttle Challenger exploded about 73 seconds after launch. A later investigation found that the cause of the disaster was O-ring failure, due to cold temperatures. The temperature at launch was 39 degrees, colder than any prior launch. The cold caused the…

People in Alaska are extraordinarily generous - that's what a predictive model showed, when applied to a charitable organization's donor list. A closer examination revealed a flaw - while the original data was for all 50 states, the model's training data for Alaska included donors,…

#### Industry Spotlight – The Military

Abraham Wald, a persecuted Jewish mathematician who fled Austria just before World War II, led an analysis of Allied bombers returning from missions. Hitherto, the Air Force had focused on reinforcing areas that showed the most damage on return. Wald convinced them instead to focus…

#### Why Analytics Projects Fail – 5 Reasons

With the news full of so many successes in the fields of analytics, machine learning and artificial intelligence, it is easy to lose sight of the high failure rate of analytics projects. McKinsey just came out with a report that only 8% of big companies…

#### Political Analytics and Microtargeting

The statistics of targeting individual voters with specific messages, as opposed to messaging that went to whole groups, began in the U.S. over a decade ago with the Democrats. Political targeting is now an established business, or at least a discipline within the broader realm…

#### The Statistics of Persuasion

The Art of Persuasion is the title of more than one book in the self-help genre, books that have spawned blogs, podcasts, speaking gigs and more. But the science of persuasion is actually of more interest, because it produces useful rules that can be studied…

#### Historical Spotlight – ISOQOL

25 years ago the International Society of Quality of Life Research was founded with a mission to advance the science of quality of life and related patient-centered outcomes in health research, care and policy. While focusing on quality of life (QOL) in healthcare may seem…

#### Book Review: Thinking, Fast and Slow

Daniel Kahneman won a Nobel Prize in Economics for his work in behavioral economics, much of it with his colleague Amos Tversky, who died in 1996. Kahneman's 2011 classic, Thinking, Fast and Slow, is a superbly written non-technical summary of their fascinating research and its often…

#### Likert Scale

A "Likert scale" is used in self-report rating surveys to allow users to express an opinion or assessment of something on a gradient scale. For example, a response could range from "agree strongly" through "agree somewhat" and "disagree somewhat" on to "disagree strongly." Two key decisions the survey designer faces are

• How many gradients to allow, and

• Whether to include a neutral midpoint

#### Football Analytics

Preparing for the Super Bowl: Your team is at midfield, you have the ball, it's 4th down with 2 yards to go. Should you go for it? (Apologies in advance to our many readers, especially those outside the U.S., who are not aficionados of American football,…

#### Job Spotlight: Digital Marketer

A digital marketer handles a variety of tasks in online marketing - managing online advertising and search engine optimization (SEO), implementing tracking systems (e.g. to identify how a person came to a retailer), web development, preparing creatives, implementing tests, and, of course, analytics. There are…

#### Dummy Variable

A dummy variable is a binary (0/1) variable created to indicate whether a case belongs to a particular category.  Typically a dummy variable will be derived from a multi-category variable. For example, an insurance policy might be residential, commercial or automotive, and there would be three dummy variables created:
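
For the insurance example, the three dummy variables can be sketched as follows. The function name and the `is_` prefix are hypothetical choices made for illustration.

```python
def make_dummies(values, categories):
    """Turn a multi-category variable into one 0/1 dummy variable per category."""
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


policy_types = ["residential", "commercial", "automotive"]
policies = ["commercial", "residential", "automotive", "commercial"]

for row in make_dummies(policies, policy_types):
    print(row)
```

Each record gets a 1 in exactly one of the three columns. In regression models it is common to drop one of the dummies, since its value is fully determined by the others.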

#### Things are Getting Better

In the visualization below, which line do you think represents the UN's forecast for the number of children in the world in the year 2100? Hans Rosling, in his book Factfulness, presents this chart and notes that in a sample of Norwegian teachers, only 9%…

#### Artificial Lawyers

Can statistical and machine learning methods replace lawyers? A host of entrepreneurs think so, and so do the folks who run www.artificiallawyer.com. Text mining and predictive model products are available now to predict case staffing requirements and perform automated document discovery, and natural language algorithms conduct…

#### Entity Resolution and Identifying Bad Guys

Earlier, we described how Jen Golbeck (who teaches Network Analysis at Statistics.com) analyzed Facebook connections to identify fake accounts (the account holders' friends all had the same number of friends, which is highly improbable statistically). Network analysis and studying connections lie at the heart of…

#### Work and Heat

If you are working on New Year's Eve or New Year's Day, odds are it is from home, where you can (usually) control the temperature in the home. Which, from the standpoint of productivity, is a good thing. According to a study from Cornell, raising…

#### Curbstoning

Curbstoning, to an established auto dealer, is the practice of unlicensed car dealers selling cars from streetside, where the cars may be parked along the curb.  With a pretense of being an individual selling a car on his or her own, and with no fixed…

#### Snowball Sampling

Snowball sampling is a form of sampling in which the selection of new sample subjects is suggested by prior subjects. From a statistical perspective, the method is prone to high variance and bias, compared to random sampling. The characteristics of the initial subject may propagate through the sample to some degree, and a sample derived by starting with subject 1 may differ from that produced by starting with subject 2, even if the resulting sample in both cases contains both subject 1 and subject 2. However, …

#### The Statistics of Christmas Trees

A researcher shakes a sprig from a Christmas tree, and counts the number of needles that fall. He then repeats the process for countless other sprigs. The sprigs are from a variety of species, and the goal is to determine which species do the best…

#### The False Alarm Conundrum

False alarms are one of the most poorly understood problems in applied statistics and biostatistics. The fundamental problem is the wide application of a statistical or diagnostic test in search of something that is relatively rare. Consider the Apple Watch's new feature that detects atrial…

#### Conditional Probability Word of the Week

QUESTION:  The rate of residential insurance fraud is 10% (one out of ten claims is fraudulent).  A consultant has proposed a machine learning system to review claims and classify them as fraud or no-fraud.  The system is 90% effective in detecting the fraudulent claims, but only 80% effective in correctly classifying the non-fraud claims (it mistakenly labels one in five as "fraud").  If the system classifies a claim as fraudulent, what is the probability that it really is fraudulent?
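
The question can be worked through with Bayes' rule, using only the three rates given above. The function and parameter names below are hypothetical.

```python
def posterior_fraud(prior, sensitivity, specificity):
    """P(claim is fraudulent | system classifies it as fraud), via Bayes' rule."""
    true_pos = prior * sensitivity                # fraudulent and flagged
    false_pos = (1 - prior) * (1 - specificity)   # legitimate but flagged
    return true_pos / (true_pos + false_pos)


p = posterior_fraud(prior=0.10, sensitivity=0.90, specificity=0.80)
print(round(p, 3))  # 0.333
```

Out of every 100 claims, about 9 fraudulent ones are flagged (90% of 10) along with about 18 legitimate ones (20% of 90), so only 9 of the 27 flagged claims, or one in three, are really fraudulent.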

#### Instructor Spotlight – David Kleinbaum

David Kleinbaum developed several courses for Statistics.com, including Survival Analysis, Epidemiologic Statistics, and Designing Valid Statistical Studies. David retired a little over a year ago from Emory University, where he was a popular and effective teacher with the ability to distill and explain difficult statistical…

#### Book Review: ActivEpi

ActivEpi Web, by David Kleinbaum, is the text used in two Statistics.com courses (Epidemiology Statistics and Designing Valid Studies), but it is really a rich multimedia web-based presentation of epidemiological statistics, serving the role of a unique textbook format for an introductory course in the…

#### Churn

Churn is a term used in marketing to refer to the departure, over time, of customers.  Subscribers to a service may remain for a long time (the ideal customer), or they may leave for a variety of reasons (switching to a competitor, dissatisfaction, credit card expires, customer moves, etc.).  A customer who leaves, for whatever reason, "churns."

This weekend (12/8/2018) marked the 253rd anniversary of the birth of Eli Whitney, inventor of the cotton gin. And 20 years ago, Google received its first big infusions of capital from, among others, Jeff Bezos, the founder of Amazon. Both Eli Whitney and the Google…

#### Survival Analysis

Convinced that he, like his father, would die in his 40's, Winston Churchill lived his early life in a frenetic hurry. He had participated in four wars on three continents by his mid-20's, served in multiple ministerial positions by his 30's, and published 12 books…

A classic machine learning task is to predict something's class, usually binary - pictures as dogs or cats, insurance claims as fraud or not, etc. Often the goal is not a final classification, but an estimate of the probability of belonging to a class (propensity),…

#### ROC Curve

The Receiver Operating Characteristic (ROC) curve is a measure of how well a statistical or machine learning model (or a medical diagnostic procedure) can distinguish between two classes, say 1’s and 0’s: for example, fraudulent insurance claims (1’s) and non-fraudulent ones (0’s). It plots two quantities:
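
In the standard formulation, the two quantities are the true positive rate (the share of actual 1's correctly flagged) and the false positive rate (the share of actual 0's incorrectly flagged), computed at each possible cutoff on the model's score. A minimal sketch, with hypothetical labels and scores:

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per cutoff."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing is flagged
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical claims: 1 = fraudulent, 0 = not; scores are model propensities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

for fpr, tpr in roc_points(labels, scores):
    print(round(fpr, 2), round(tpr, 2))
```

A model that separates the classes well produces points that hug the upper-left corner; a model no better than chance tracks the diagonal.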

#### Triage and Artificial Intelligence

Predictim is a service that scans potential babysitters' social media and other online activity and issues them a score that parents can use to select babysitters. Jeff Chester, the executive director of the Center for Digital Democracy, commented: There's a mad rush to seize the…

#### Deming’s Funnel Problem

W. Edwards Deming's funnel problem is one of statistics' greatest hits. Deming was a noted statistician who took the statistical process control methods of Shewhart and expanded them into a holistic approach to manufacturing quality. Initially, his ideas were coolly received in the US and…

#### Industry Spotlight: the Auto Industry

The auto industry serves as a perfect exemplar of three key eras of statistics and data science in service of industry: Total Quality Management (TQM) First in Japan, and later in the U.S., the auto industry became an enthusiastic adherent to the Total Quality Management…

#### Analytics Professionals – Must They Be Good Communicators?

Most job ads in the technical arena list communication among the sought-after skills; it consistently outranks many programming and analytical skills. Is it for real, or is it just thrown in there by the HR Department on general principle? The founder of a leading analytics…

#### Prospective vs. Retrospective

A prospective study is one that identifies a scientific (usually medical) problem to be studied, specifies a study design protocol (e.g. what you’re measuring, who you’re measuring, how many subjects, etc.), and then gathers data in the future in accordance with the design. The definition…

#### The Evolution of Clinical Trials

Boiling oil versus egg yolks: One early clinical trial was accidental. In the 16th century, a common treatment for wounded soldiers was to pour boiling oil on their wounds. In 1537, the surgeon Ambroise Paré, attending French soldiers, ran out of oil one evening. He…

#### Random Selection for Harvard Admission?

An ethical algorithm... Ethics in algorithms is a popular topic now. Usually the conversation centers around the possible unintentional bias or harm that a statistical or machine learning algorithm could do when it is used to select, score, rate, or rank people. For example -…

#### GE Regresses to the Mean

Thirty years ago, GE became the brightest star in the firmament of statistical ideas in business when it adopted Six Sigma methods of quality improvement. Those methods had been introduced by Motorola, but Jack Welch's embrace of the same methods at GE, a diverse manufacturing…

In a couple of days, the Wall Street Journal will come out with its November survey of economists' forecasts. It's a particularly sensitive time, with elections in a few days and President Trump attacking the Federal Reserve for raising interest rates. It's a good time to…