
July 28: Statistics in Practice


In this week’s brief we discuss outliers and anomalies, the unusual cases and events that often end up being the focus of attention. Our course spotlight is Anomaly Detection.

If you’re interested in this topic, you should also consider the Programming for Data Science certificate program that it is part of. Anomaly detection draws on a range of data science techniques, so you’ll get the most out of the Anomaly Detection course if you’ve had broader preparation.

See you in class!

Peter Bruce

Founder, Author, and Senior Scientist


When Outliers are Central

In the wake of the 2008 recession, the government reported that average wages for all workers declined $384 from 2008 to 2009. Later, it amended this report to say the drop was really $598, due to errors in some tax returns. In a workforce of more than 140 million workers, can you guess how many taxpayers were responsible for this 56% “error”? Hint – you can count them on one []
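
To see why a handful of returns can matter this much, it helps to work through the arithmetic behind the figures quoted above. The sketch below uses only the numbers in the paragraph; the worker count of exactly 140 million is an assumption (the text says "more than 140 million"), so the total is a rough lower bound rather than an official figure.

```python
# Figures quoted in the paragraph above.
reported_drop = 384      # dollars, originally reported decline in average wages
amended_drop = 598       # dollars, amended decline
workers = 140_000_000    # assumption: the text says "more than 140 million"

# Relative size of the correction compared with the original report.
relative_error = (amended_drop - reported_drop) / reported_drop

# Moving an average by d dollars across n workers means n * d dollars of
# total wages changed, no matter how few records carry that change.
implied_total_shift = (amended_drop - reported_drop) * workers

print(f"relative error: {relative_error:.0%}")
print(f"implied shift in total wages: ${implied_total_shift:,}")
```

Under these assumptions the correction corresponds to roughly $30 billion in total wages, which is why a mean computed over a very large population can still be dominated by a few extreme records.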

Word of the Week: Bag-of-Words

In text mining, the term “bag-of-words” refers to the approach in which you consider a document simply as a collection of disconnected words. The analysis that follows does not rely on making sense of phrases or sentences; it simply uses the matrix of word occurrence frequencies. The goal is classification or clustering of large numbers of documents using predictive models or clustering algorithms. Obviously, the bag-of-words technique cannot be used to interpret or make sense of a single document. However, in looking at numerous documents it can detect concepts (or topics) that involve multiple words.
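
The word-frequency matrix described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production tokenizer; the function names `tokenize` and `bag_of_words` and the three toy documents are hypothetical and made up for this example.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix

# Hypothetical toy documents, made up for illustration.
docs = [
    "The dog chased the cat",
    "The cat chased the dog",
    "Stocks fell as markets closed",
]
vocab, dtm = bag_of_words(docs)

for row in dtm:
    print(row)
```

The first two documents produce identical rows even though they mean different things, which illustrates the point that word order is discarded. The third row shares no words with the other two, which is the kind of signal a classifier or clustering algorithm would pick up across many documents.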

Course Spotlight

Our spotlight this week is on the Anomaly Detection course and the Programming for Data Science certificate program of which it is part. You can take the course on its own or as part of the 10-course certificate.

Nov 6 – Dec 4: Anomaly Detection

  • Use a supervised classification technique for anomaly detection, and understand the limits of supervised learning for this task
  • Apply a nearest-neighbor algorithm, and other unsupervised methods, for identifying anomalies in the absence of labels
  • Practice applying the various techniques to different problems in different domains
  • Assess which methods among a diverse set work best in a given situation
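
As a rough illustration of the nearest-neighbor idea mentioned in the list above, the sketch below scores each point by its distance to its k-th nearest neighbor, so isolated points get high scores without any labels. This is one common variant and not necessarily the method taught in the course; the function name `knn_scores`, the choice of `k=2`, and the toy data are assumptions made for the example.

```python
import math

def knn_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical 2-D data: five clustered points and one far away.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8), (5.0, 5.0)]
scores = knn_scores(data, k=2)

# The point with the largest score is the strongest anomaly candidate.
most_anomalous = max(range(len(data)), key=scores.__getitem__)
print(data[most_anomalous])
```

This brute-force version compares every pair of points, which is fine for small data sets but would need an index structure or a library implementation for larger ones.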


Certificate Spotlight

Programming for Data Science

The Programming for Data Science certificate is your ticket into the world of data science and analytics. In this 18-month program (more or less, depending on your schedule), you’ll learn Python programming or R programming (or both, if you are ambitious), and learn how to:

  • Read, understand, modify, and create basic functions
  • Manipulate data programmatically for data analytics and data mining
  • Extract data from a relational database using SQL, and merge it into a single file
  • Extract, clean, prepare, and mine data
  • Understand and implement predictive models: classification and prediction
  • Understand and implement unsupervised techniques such as clustering and recommender systems

A number of case-study projects are included, and you can assemble a portfolio of your work. We have different program flavors, depending on your experience and your preference for R or Python. Intro stats is a prerequisite, but if you need it we’ll provide that course free of charge. We offer rolling admissions year-round.


Digital Badges

Anomaly Detection

Digital badges provide employers and peers concrete evidence of what you have learned and the skills required to earn your credential. Each badge’s digital image holds verified metadata describing your qualifications and the mastery required to earn them.

Contact Us To Learn More

If you have any questions about our courses, certificates, and degree programs, or how they apply to you, your work, and your career, please get in touch. We’re here to help you succeed.