Data Analytics Courses

Data analytics and data science are popular terms, and skills in these areas are in great demand. But what do these terms mean? Below is an overview and a listing of related courses, along with information about our certificate programs in data science and analytics. Data Prep … Continue reading “Data Analytics Courses”

Healthcare Analytics: Exploration versus Confirmation

Perhaps the most active application of analytics and data mining is healthcare. This week we look at one success story, the use of machine learning to predict diabetic retinopathy; one story of disappointment, the use of genetic testing in a puzzling disease; and a basic dichotomy in statistical analysis. In his famous 1977 book that … Continue reading “Healthcare Analytics: Exploration versus Confirmation”

Matching Algorithms

Some applications of machine learning and artificial intelligence are recognizably impressive – predicting future hospital readmission of discharged patients, for example, or diagnosing retinopathy. Others – self-driving cars, for example – seem almost magical. The matching problem, though, is one where your first reaction might be “What’s so hard about that?” For example, to take … Continue reading “Matching Algorithms”

Job Spotlight: Data Scientist

Data science is one of a host of similar terms. “Artificial intelligence” has been around since the 1960s and “data mining” for at least a couple of decades. “Machine learning” came out of the computer science community, and “analytics,” “data analytics,” and “predictive analytics” came out of the statistics and OR communities. Among all of … Continue reading “Job Spotlight: Data Scientist”

Industry Spotlight: Automotive

The auto industry serves as a perfect exemplar of three key eras of statistics and data science in service of industry. The first is Total Quality Management (TQM): first in Japan, and later in the U.S., the auto industry became an enthusiastic adherent of the Total Quality Management philosophy. Fundamentally, TQM is all about using data to improve … Continue reading “Industry Spotlight: Automotive”

Feature Engineering and Data Prep – Still Needed?

It is a truism of machine learning and predictive analytics that 80% of an analyst’s time is consumed in cleaning and preparing the needed data. I saw an estimate by a Google engineer that 25% of the time was spent just looking for the right data. A big part of this process is human-driven feature … Continue reading “Feature Engineering and Data Prep – Still Needed?”

Book Review: Weapons of Math Destruction

Cathy O’Neil’s Weapons of Math Destruction, when it was first published in 2016, sounded an early alarm about big data algorithms and their potential for social evil. The cover is adorned with a robotic death’s head and the subtitle reads “How Big Data Increases Inequality and Threatens Democracy.” O’Neil’s book begins with stories that … Continue reading “Book Review: Weapons of Math Destruction”

Confusing Terms in Data Science – A Look at Synonyms, Homonyms and more

To a statistician, a sample is a collection of observations (cases). To a machine learner, it’s a single observation. Modern data science has its origin in several different fields, which leads to potentially confusing homonyms and synonyms, like these. Homonyms (words with multiple meanings): Bias: To a layperson, bias refers to an opinion about something … Continue reading “Confusing Terms in Data Science – A Look at Synonyms, Homonyms and more”

Industry Spotlight: Package Delivery

Nothing better illustrates the encroachment of data science and analytics on the older “economy of tangible things” than the business of delivering packages. The use of analytics in package delivery is not new. Companies like UPS and FedEx are longtime users of operations research methods like optimization and simulation to route inter-city shipments, site new … Continue reading “Industry Spotlight: Package Delivery”

Ethical Practice in Data Mining

Prior to the advent of internet-connected devices, the largest source of big data was public interaction on the internet. Social media users, as well as shoppers and searchers on the internet, make an implicit deal with the big companies that provide these services: users can take advantage of powerful search, shopping and social interaction tools … Continue reading “Ethical Practice in Data Mining”

“Defiant” Supervision

How did the phrase “defiantly recommend”, as in “I defiantly recommend this product,” come into common usage on the internet? The answer is a good look inside the workings of supervised learning. Supervision, generally from humans, is instrumental in much of statistical and machine learning. Google’s precise search algorithms are not public, but the general … Continue reading ““Defiant” Supervision”

Alaskan Generosity

People in Alaska are extraordinarily generous – that’s what a predictive model showed, when applied to a charitable organization’s donor list. A closer examination revealed a flaw – while the original data was for all 50 states, the model’s training data for Alaska included donors, but excluded non-donors. The reason? The data was 99% non-donors, … Continue reading “Alaskan Generosity”
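
The excerpt breaks off before it explains how the training data was assembled, so the following is only a minimal sketch under stated assumptions: non-donors are thinned out everywhere except Alaska, where they are dropped entirely. The state codes, counts, and the `donor_rate` helper are hypothetical and are not taken from the original analysis.

```python
from collections import Counter

def donor_rate(records):
    """Return the fraction of records flagged as donors, per state."""
    totals, donors = Counter(), Counter()
    for state, is_donor in records:
        totals[state] += 1
        donors[state] += is_donor
    return {state: donors[state] / totals[state] for state in totals}

# Hypothetical population: 1% donors in each of two states.
population = (
    [("AK", 1)] * 10 + [("AK", 0)] * 990
    + [("OH", 1)] * 10 + [("OH", 0)] * 990
)

# Assumed (flawed) training sample: keep every donor, keep every 10th
# non-donor outside Alaska, and keep no Alaskan non-donors at all.
all_donors = [r for r in population if r[1] == 1]
sampled_non_donors = [r for r in population if r[1] == 0 and r[0] != "AK"][::10]
training = all_donors + sampled_non_donors

population_rates = donor_rate(population)
training_rates = donor_rate(training)

for state in sorted(population_rates):
    print(state, round(population_rates[state], 3), round(training_rates[state], 3))
```

In this toy setup every Alaskan record in the training sample is a donor, so any model fitted to it would see "Alaska" as a near-perfect predictor of giving, even though the underlying donation rate is the same in both states.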

Political Analytics and Microtargeting

The statistics of targeting individual voters with specific messages, as opposed to messaging that went to whole groups, began in the U.S. over a decade ago with the Democrats. Political targeting is now an established business, or at least a discipline within the broader realm of political consulting. By 2016, the Republicans had surged well … Continue reading “Political Analytics and Microtargeting”

The Statistics of Persuasion

The Art of Persuasion is the title of more than one book in the self-help genre, books that have spawned blogs, podcasts, speaking gigs and more. But the science of persuasion is actually of more interest, because it produces useful rules that can be studied and deployed. Marketers and politicians have long been enthusiastic users … Continue reading “The Statistics of Persuasion”

Job Spotlight: Digital Marketer

A digital marketer handles a variety of tasks in online marketing – managing online advertising and search engine optimization (SEO), implementing tracking systems (e.g. to identify how a person came to a retailer), web development, preparing creatives, implementing tests, and, of course, analytics. There are typically three types of employers: Marketing agencies that contract out … Continue reading “Job Spotlight: Digital Marketer”

Artificial Lawyers

Can statistical and machine learning methods replace lawyers? A host of entrepreneurs think so, and so do the folks who run www.artificiallawyer.com. Text mining and predictive model products are available now to predict case staffing requirements and perform automated document discovery, and natural language algorithms conduct legal research and case review. In 2017, a predictive algorithm … Continue reading “Artificial Lawyers”

Entity Resolution and Identifying Bad Guys

Earlier, we described how Jen Golbeck (who teaches Network Analysis at Statistics.com) analyzed Facebook connections to identify fake accounts (the account holder’s friends all had the same number of friends, which is highly improbable statistically). Network analysis and studying connections lie at the heart of entity resolution. To a sales and marketing person, entity resolution … Continue reading “Entity Resolution and Identifying Bad Guys”
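
The fake-account signal mentioned above (every friend of an account having the same number of friends) can be illustrated with a small check. This is a sketch only, not Golbeck's actual method: the `friends_of` adjacency dictionary, the account names, and the `friends_share_one_count` helper are all hypothetical.

```python
def friends_share_one_count(account, friends_of):
    """True if the account has at least two friends and all of them
    have exactly the same number of friends."""
    friends = friends_of.get(account, [])
    counts = {len(friends_of.get(friend, [])) for friend in friends}
    return len(friends) >= 2 and len(counts) == 1

# Hypothetical friendship graph as an adjacency dictionary.
friends_of = {
    "suspect": ["f1", "f2", "f3"],
    "f1": ["suspect", "x"],
    "f2": ["suspect", "y"],
    "f3": ["suspect", "z"],
    "normal": ["n1", "n2"],
    "n1": ["normal"],
    "n2": ["normal", "x", "y"],
}

flagged = [name for name in ("suspect", "normal")
           if friends_share_one_count(name, friends_of)]
print(flagged)
```

A real screen would treat this as one weak signal among many, since small accounts can share a friend count by chance.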

How Google Determines Which Ads you See

A classic machine learning task is to predict something’s class, usually binary – pictures as dogs or cats, insurance claims as fraud or not, etc. Often the goal is not a final classification, but an estimate of the probability of belonging to a class (propensity), so the cases can be ranked. A good example of … Continue reading “How Google Determines Which Ads you See”
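
Ranking by propensity rather than assigning a hard class can be sketched as follows. The claim names and raw model scores are made up for illustration, and the logistic function stands in for whatever model would produce the probabilities; nothing here describes Google's ad system.

```python
import math

def propensity(score):
    """Map a raw model score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical raw scores for four insurance claims (higher = more fraud-like).
raw_scores = {"claim_a": -2.0, "claim_b": 0.5, "claim_c": 3.0, "claim_d": -0.1}

# Rank cases from highest to lowest estimated propensity.
ranked = sorted(raw_scores, key=lambda name: propensity(raw_scores[name]), reverse=True)

for name in ranked:
    print(name, round(propensity(raw_scores[name]), 3))
```

With a ranking in hand, an investigator can work down the list as far as the budget allows instead of relying on a single yes/no cutoff.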

Triage and Artificial Intelligence

Predictim is a service that scans potential babysitters’ social media and other online activity and issues them a score that parents can use to select babysitters. Jeff Chester, the executive director of the Center for Digital Democracy, commented: There’s a mad rush to seize the power of AI to make all kinds of decisions without … Continue reading “Triage and Artificial Intelligence”