  • “The goal is to turn data into information, and information into insight.”
    – Carly Fiorina, former CEO, Hewlett-Packard Co.
  • “Data is the new science. Big data holds the answers.”
    – Pat Gelsinger, CEO, EMC
  • “Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world.”
    – Atul Butte, Stanford
  • “You can have data without information, but you cannot have information without data.”
    – Daniel Keys Moran, Programmer and Author
  • “Information is the oil of the 21st century, and analytics is the combustion engine.”
    – Peter Sondergaard, SVP, Gartner Research
  • “I keep saying that the sexy job in the next 10 years will be statisticians. And I’m not kidding.”
    – Hal Varian, Chief Economist, Google
  • "Data scientist is just a sexed up word for statistician."
    - Nate Silver, editor-in-chief of ESPN's FiveThirtyEight blog and a Special Correspondent for ABC News.
  • "Data scientists are statisticians because being a statistician is awesome and anyone who does cool things with data is a statistician."
    - Robert Rodriguez, President, American Statistical Association
  • "Data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others."
    - Mike Loukides, VP, O’Reilly Media
  • “Think analytically, rigorously, and systematically about a business problem and come up with a solution that leverages the available data.”
    - Michael O'Connell, Sr. Director of Analytics, TIBCO
  • "By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights."
    - Monica Rogati, VP for Data, Jawbone
  • "A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning. Data scientists not only are adept at working with data, but appreciate data itself as a first-class product."
    - Hilary Mason, Data Scientist, Accel, Scientist Emeritus, bitly, co-founder, HackNY
  • "A significant constraint on realizing value from Big Data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from Big Data."
    - 2011 McKinsey report (see resources below)
  • "We project a need for 1.5 million additional managers and analysts in the United States who can ask the right questions and consume the results of the analysis of Big Data effectively."
    - 2011 McKinsey report (see resources below)
  • "The data scientist was called, only half-jokingly, 'a caped superhero.'"
    - Ben Rooney, Wall Street Journal
  • "By 2018 the United States will experience a shortage of 190,000 skilled data scientists, and 1.5 million managers and analysts capable of reaping actionable insights from the big data deluge."
    - 2013 McKinsey report (see resources below)
  • "Autodidacts – the self-taught, uncredentialed, data-passionate people – will come to play a significant role in many organizations’ data science initiatives."
    - Neil Raden, CEO & Principal Analyst, Hired Brains Research
  • “Statistics are ubiquitous in life, and so should be statistical reasoning.”
    – Alan Blinder, former Federal Reserve vice chairman and Princeton academic
Looking to accelerate your career or help upgrade the skills of your team members?
  • Learn from top experts (our instructors have authored hundreds of books).
  • Choose from a full curriculum of courses in analytics and programming.
  • Have your questions answered directly by these experts on small private discussion forums.
  • Work with instructors from top universities and with industry experts who have experience at Google, eBay, Samsung, CoBrain, etc.
  • Work with real problems, real data and multiple software tools.
  • Receive individual feedback on your projects and exercises.

  • Videos
    • Short Papers and Videos at Statistics.com
    • PBS "Making Stuff Faster": this video discusses operations research and optimization. See Jack Levin from UPS at 23:56. The Traveling Salesman Problem (TSP) is explained first, followed later by a Monte Carlo (MCMC) approach to boarding planes.
    • Unconference on the Future of Statistics, hosted by Jeff Leek and Roger Peng, with the following speakers: Daniela Witten (Assistant Professor, Department of Biostatistics, University of Washington); Hongkai Ji (Assistant Professor, Department of Biostatistics, Johns Hopkins University); Joe Blitzstein (Professor of the Practice, Department of Statistics, Harvard University); Sinan Aral (Associate Professor, MIT Sloan School of Management); Hadley Wickham (Chief Scientist, RStudio); and Hilary Mason (Chief Data Scientist, Accel Partners).
    • DataGotham 2013

Data analytics and data science are popular terms, and skills in these areas are in great demand. But what do these terms mean? Below is an overview and a listing of related courses. We also offer a certificate program in data analytics.

Data Prep

It is a truism that most of the work in data mining is not in algorithm specification, application and interpretation.  It is in extracting, cleaning and preparing data.  Learn how to extract data from a relational database using SQL, and merge it into a single file in R, so that you can perform statistical operations.
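The extract-and-merge step can be sketched in a few lines. The paragraph above mentions R; this sketch uses Python's built-in sqlite3 and csv modules instead, purely for illustration. The tables (customers, orders), their columns, and the sample rows are hypothetical and not taken from any course material.

```python
import csv
import io
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 12.5);
""")

# Join the two tables and aggregate to one row per customer.
# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL total into 0.
query = """
    SELECT c.id, c.region,
           COUNT(o.id) AS n_orders,
           COALESCE(SUM(o.amount), 0.0) AS total_spent
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.region
    ORDER BY c.id
"""
cursor = conn.execute(query)
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()
conn.close()

# Write the merged result as a single flat CSV (in memory here; a file works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
flat_csv = buffer.getvalue()
print(flat_csv)
```

The resulting flat file has one row per customer, which is the shape most statistical routines expect.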

Predictive Modeling and Forecasting

In predictive modeling (also called predictive analytics) we seek to predict the value of a variable of interest (purchase/no purchase, fraudulent/not fraudulent, malignant/benign, amount of spending, etc.) by using "training" data where the value of this variable is known.  Once a statistical model is built with the training data ("trained"), it is then applied to data where the value is unknown.  Predictive modeling is also termed "supervised learning" and is covered in the following courses:

Applied Predictive Analytics incorporates a Kaggle-like predictive modeling contest in which participants build and submit models, which are then assessed against a hold-out data set in a course-long contest.
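The train-then-assess idea can be illustrated with a deliberately tiny model. This is a sketch under stated assumptions, not the method used in any course: the data are synthetic (a single made-up "amount" variable with a 0/1 label), and the "model" is just a threshold placed midway between the two class means, which assumes the positive class has the larger values.

```python
import random

def train_threshold(records):
    """Fit a one-variable rule: the midpoint between the two class means."""
    pos = [x for x, y in records if y == 1]
    neg = [x for x, y in records if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Predict class 1 when the value exceeds the learned threshold."""
    return 1 if x > threshold else 0

def accuracy(threshold, records):
    """Share of records whose predicted label matches the known label."""
    hits = sum(predict(threshold, x) == y for x, y in records)
    return hits / len(records)

# Synthetic labeled data (hypothetical): class 0 centered at 20, class 1 at 60.
rng = random.Random(0)
data = [(rng.gauss(20, 5), 0) for _ in range(100)]
data += [(rng.gauss(60, 5), 1) for _ in range(100)]
rng.shuffle(data)

# Train on one part of the data, then assess on records the model has not seen.
train, holdout = data[:150], data[150:]
threshold = train_threshold(train)
holdout_accuracy = accuracy(threshold, holdout)
print(f"threshold={threshold:.1f}, hold-out accuracy={holdout_accuracy:.2f}")
```

Scoring on the hold-out set rather than the training set gives a fairer estimate of how the model would perform on data where the outcome is unknown.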


Recommender Systems

The purpose of a recommender system is to identify, statistically, "what goes with what."  These systems lie behind the notices you see on web sites advising you that "customers who bought X also bought Y."  The general statistical terms for the methods used are affinity analysis and association rules; these are unsupervised methods.
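A minimal sketch of association rules for pairs of items follows. The baskets are made up, and the function name pair_rules and the min_support cutoff are illustrative assumptions. For each rule "X goes with Y" it reports support (share of baskets containing both), confidence (share of X baskets that also contain Y), and lift (confidence divided by the overall share of Y baskets).

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
    {"bread", "butter", "milk"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (x, y, support, confidence, lift) for rules x -> y over item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # skip pairs that occur together too rarely
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)

rules = pair_rules(baskets)
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than chance alone would predict. No outcome variable is involved, which is why these methods count as unsupervised.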


Clustering

In clustering, we seek to identify groups of customers, records, etc. that are similar to one another. "Clustering" is the general statistical technique; when we apply it to customers, it is the statistical component of customer segmentation. Clustering is an "unsupervised" data mining method: there is no known outcome that serves to train a model.
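As one concrete example of clustering, here is a minimal k-means sketch. K-means is only one of many clustering methods and is chosen here for brevity. The customer records (visits per month, average spend) are hypothetical, and the code requires Python 3.8+ for math.dist.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen records
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels

# Hypothetical customers: (visits per month, average spend).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]
centroids, labels = kmeans(customers, k=2)
print(centroids)
print(labels)
```

Note that no labels are supplied: the two segments emerge from the similarity of the records alone. In practice the variables would usually be rescaled first so that no single one dominates the distance.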

Text Analytics & Social Network Analysis

The most rapid data growth is not in numerical data but in text (Twitter feeds, the contents of Facebook pages, emails, etc.), which must be pre-processed to be usable.
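A minimal sketch of what such pre-processing can look like: lowercasing, stripping URLs, mentions, and hashtags, splitting into word tokens, and dropping common stopwords. The stopword list, the regular expressions, and the sample posts are illustrative assumptions; real projects typically use fuller stopword lists and more careful tokenizers.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "for", "my", "this"}

def preprocess(text):
    """Turn one raw post into a list of cleaned word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)       # drop @mentions and #hashtags
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)  # words, allowing one apostrophe
    return [t for t in tokens if t not in STOPWORDS]

# Hypothetical posts, for illustration only.
posts = [
    "Loving the new phone! http://example.com @store #happy",
    "The phone battery is great, and the camera is great too",
]
tokens_per_post = [preprocess(p) for p in posts]

# Term counts across all posts: a simple numeric form that later analysis can use.
term_counts = Counter(t for tokens in tokens_per_post for t in tokens)
print(tokens_per_post)
print(term_counts.most_common(3))
```

Once text is reduced to token counts like these, it can feed the same predictive and clustering methods used for numerical data.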

Tools to Use in Data Analytics


Graphical visualization techniques are important ways to explore data, gain insight, and deal with the complexity of big data.