Data Analytics Courses
Data analytics and data science are popular terms, and skills in these areas are in great demand. But what do these terms mean? Below is an overview and a listing of related courses. We also offer a certificate program in data analytics.
It is a truism that most of the work in data mining lies not in algorithm specification, application and interpretation, but in extracting, cleaning and preparing data. Learn how to extract data from a relational database using SQL, and merge it into a single file in R, so that you can perform statistical operations.
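As a rough illustration of the extract-and-merge step, the sketch below joins two tables with SQL and flattens them into one row per customer. It uses Python's built-in sqlite3 module rather than R, and the tables, column names and values are all hypothetical, invented only for this example.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# LEFT JOIN keeps customers who have no orders;
# COALESCE turns their missing total into 0.
query = """
    SELECT c.customer_id,
           c.region,
           COUNT(o.order_id)            AS n_orders,
           COALESCE(SUM(o.amount), 0.0) AS total_spent
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
    ORDER BY c.customer_id
"""
rows = conn.execute(query).fetchall()
conn.close()

# One flat record per customer, ready for statistical work.
for row in rows:
    print(row)
```

The result is a single table with one record per customer, which is the shape most statistical procedures expect.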
Predictive Modeling and Forecasting
In predictive modeling (also called predictive analytics) we seek to predict the value of a variable of interest (purchase/no purchase, fraudulent/not fraudulent, malignant/benign, amount of spending, etc.) by using "training" data where the value of this variable is known. Once a statistical model is built with the training data ("trained"), it is then applied to data where the value is unknown. Predictive modeling is also termed "supervised learning" and is covered in the following courses:
- Predictive Analytics 1
- Predictive Analytics 2
- Predictive Analytics 3
- Data Mining in R
- Statistical Analysis of Microarray Data in R
- Forecasting Analytics
- Applied Predictive Analytics
Applied Predictive Analytics incorporates a course-long, Kaggle-like predictive modeling contest in which participants build and submit models that are then assessed against a hold-out data set.
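The train-then-apply workflow, and the idea of scoring a model on a hold-out set, can be sketched in a few lines of Python. The nearest-centroid classifier used here is just one simple stand-in for a predictive model, not the method taught in any particular course, and the records (visits, minutes on site) are made up for illustration.

```python
from statistics import mean

def train_centroids(records):
    """Compute one centroid (mean feature vector) per known label."""
    by_label = {}
    for features, label in records:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in by_label.items()}

def predict(centroids, features):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Made-up training data: the outcome is known for every record.
training = [
    ((1, 2.0), "no purchase"), ((2, 1.5), "no purchase"), ((1, 3.0), "no purchase"),
    ((8, 12.0), "purchase"), ((9, 10.0), "purchase"), ((7, 11.5), "purchase"),
]
# Made-up hold-out data: outcomes are known but were not used for training.
holdout = [((2, 2.5), "no purchase"), ((8, 9.0), "purchase"), ((6, 10.0), "purchase")]

model = train_centroids(training)
correct = sum(predict(model, x) == y for x, y in holdout)
accuracy = correct / len(holdout)
print(f"hold-out accuracy: {accuracy:.2f}")

# Apply the trained model to a record whose outcome is unknown.
new_prediction = predict(model, (7, 10.5))
print(new_prediction)
```

Because the hold-out records play no part in training, their accuracy gives a fairer estimate of how the model will perform on genuinely new data.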
The purpose of a recommender system is to identify, statistically, "what goes with what." These systems lie behind the notices you see on web sites advising you that "customers who bought X also bought Y." The general statistical terms for the methods used are affinity analysis and association rules; these are unsupervised methods.
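A minimal sketch of how association rules are scored follows, assuming a handful of invented shopping baskets. Support is the share of baskets containing both items, and confidence is the share of baskets containing X that also contain Y. The thresholds are arbitrary illustrative choices, and real systems use more efficient algorithms than this brute-force pass over all item pairs.

```python
from itertools import permutations

# Invented transactions, for illustration only.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

def count(itemset):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)

def find_rules(min_support=0.4, min_confidence=0.7):
    """Return (x, y, support, confidence) for each rule 'x -> y' passing both thresholds."""
    items = sorted(set().union(*baskets))
    found = []
    for x, y in permutations(items, 2):
        both = count({x, y})
        support = both / len(baskets)
        if support < min_support:
            continue
        confidence = both / count({x})
        if confidence >= min_confidence:
            found.append((x, y, support, confidence))
    return found

rules = find_rules()
for x, y, support, confidence in rules:
    print(f"customers who bought {x} also bought {y} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

No outcome variable is involved anywhere in the calculation, which is what makes the method unsupervised.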
In clustering, we seek to identify groups of customers, records, etc. that are similar to one another. "Clustering" is the general statistical technique; when we apply it to customers it is the statistical component in customer segmentation. Clustering is an "unsupervised" data mining method - there is no known outcome that serves to train a model.
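One common clustering technique is k-means, sketched below in plain Python. The customer records (spend, visits) are invented, the number of clusters is assumed to be known in advance, and the starting centroids are simply the first k points, a naive choice made only to keep the example deterministic.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))

def kmeans(points, k, max_iterations=20):
    """Alternate between assigning points and recomputing centroids until stable."""
    centroids = list(points[:k])  # naive initialization: first k points
    for _ in range(max_iterations):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(col) / len(members)
                                           for col in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Invented customers: (spend, visits). No outcome column is needed.
customers = [(1.0, 1.0), (9.0, 8.0), (1.5, 2.0), (8.0, 9.0), (1.0, 1.5), (9.5, 8.5)]
centroids, labels = kmeans(customers, k=2)
print(labels)
print(centroids)
```

Each label identifies the segment a customer falls into; interpreting what the segments mean is left to the analyst.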
Text Analytics & Social Network Analysis
The most rapid data growth is not in numerical data, but in text - Twitter feeds, the contents of Facebook pages, emails, etc. - which must be pre-processed to be usable. Learn more:
- Text Mining
- Natural Language Processing
- Sentiment Analysis
- Social Network Analysis
- Social Network Analysis - Python
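To give a sense of what pre-processing text involves, here is a minimal Python sketch that lowercases messages, strips URLs and @mentions, splits the text into word tokens, and drops a few common words before counting terms. The messages and the tiny stopword list are made up for illustration; real projects typically rely on fuller stopword lists and dedicated libraries.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = {"the", "a", "is", "to", "and", "of", "it", "this", "i", "my"}

def preprocess(text):
    """Turn a raw message into a list of cleaned word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop @mentions
    tokens = re.findall(r"[a-z]+", text)       # keep alphabetic words only
    return [t for t in tokens if t not in STOPWORDS]

# Invented messages, for illustration only.
messages = [
    "Loving the new phone! Battery life is great :) http://example.com/review",
    "@support my phone battery died after 2 hours... not great",
]

term_counts = Counter(tok for m in messages for tok in preprocess(m))
print(preprocess(messages[0]))
print(term_counts.most_common(3))
```

Once text is reduced to term counts like these, it can be fed to the same numerical methods used elsewhere in data analytics.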
Tools to Use in Data Analytics
- Introduction to Python for Analytics
- Introduction to Analytics using Hadoop and R
- R Programming Intro 1
- R Programming Intro 2
Graphical visualization techniques are important ways to explore data, gain insight, and deal with the complexity of big data.