Predictive Analytics 1 - Machine Learning Tools - with R

taught by Inbal Yahav and Kuber Deokar


Aim of Course:

In this online course, “Predictive Analytics 1 - Machine Learning Tools - with R,” you will be introduced to the basic concepts of predictive analytics, also called predictive modeling, the most prevalent form of data mining. The course covers the two core paradigms that account for most business applications of predictive modeling: classification and prediction. In both cases, predictive modeling takes data in which a variable of interest (the target variable) is known and develops a model that relates this variable to a set of predictor variables, also called features. In classification, the target variable is categorical ("purchased something" vs. "has not purchased anything"). In prediction, the target variable is continuous ("dollars spent"). You will learn how to explore and visualize the data to get a preliminary idea of which variables are important and how they relate to one another. Three machine learning techniques will be used: k-nearest neighbors, classification and regression trees (CART), and Bayesian classifiers. You will then learn how to combine different models to obtain results that are better than those any individual model produces on its own. The course also covers partitioning, which divides the data into training data (used to build a model), validation data (used to assess the performance of different models or, in some cases, to fine-tune the model), and test data (used to estimate the performance of the final model). The course includes hands-on work with R, a free software environment for statistical computing.

SOFTWARE:  This course also offers:

  • An XLMiner section (data mining add-in for Excel)
  • Python section

Anticipated learning outcomes:

  • Visualize and explore data to better understand relationships among variables
  • Partition data to provide an assessment basis for predictive models
  • Choose and implement appropriate performance measures for predictive models
  • Specify and implement models with the following algorithms:
    • k-nearest-neighbor
    • Naive Bayes
    • Classification and Regression Trees
  • Understand how ensemble models improve predictions


This course may be taken individually (one-off) or as part of a certificate program.
Course Program:

WEEK 1: Preparation

  • What is supervised learning
  • Data partitioning and holdout samples
  • Choosing variables (features)
  • Handling missing data
  • Visualization and exploration
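
As an illustration of the data partitioning topic above, here is a minimal sketch of a random three-way split into training, validation, and test sets. It is written in plain Python rather than R and is not taken from the course materials; the function name `partition`, the 50/30/20 proportions, and the seed are assumptions chosen for the example.

```python
import random

def partition(records, train_frac=0.5, valid_frac=0.3, seed=1):
    """Shuffle records and split them into training, validation, and test sets."""
    shuffled = list(records)
    # A fixed seed makes the (hypothetical) split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]                    # used to build models
    valid = shuffled[n_train:n_train + n_valid]   # used to compare and tune models
    test = shuffled[n_train + n_valid:]           # held out to assess the final model
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

Every record lands in exactly one partition, so the test set stays untouched while models are being built and compared.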

WEEK 2: Classification and Prediction

  • Assessing classification models
    • Confusion matrix
    • Misclassification costs
    • Lift
  • Assessing prediction models
    • Common metrics
  • K-Nearest-Neighbors (KNN)
    • Measuring distance
    • Choosing k
    • Generating classifications and predictions
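
The Week 2 topics can be sketched together in a few lines: a k-nearest-neighbors classifier that measures Euclidean distance and takes a majority vote, followed by a confusion matrix tallied on a small validation set. This is an illustrative Python sketch, not the course's R code; the function names and the toy data are made up for the example, and `k=3` is an arbitrary choice.

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training records."""
    # train is a list of (features, label) pairs; distance is Euclidean.
    neighbors = sorted(train, key=lambda rec: math.dist(rec[0], new_point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

# Made-up toy data: two numeric predictors and a yes/no target.
train = [((1.0, 1.0), "no"), ((1.2, 0.8), "no"), ((0.9, 1.1), "no"),
         ((3.0, 3.2), "yes"), ((3.1, 2.9), "yes"), ((2.8, 3.0), "yes")]
valid_x = [(1.1, 1.0), (3.0, 3.0), (2.0, 2.4)]
valid_y = ["no", "yes", "no"]

predicted = [knn_classify(train, x, k=3) for x in valid_x]
cm = confusion_matrix(valid_y, predicted)
error_rate = sum(n for (a, p), n in cm.items() if a != p) / len(valid_y)
print(predicted, dict(cm), error_rate)
```

In practice the predictors would usually be standardized before distances are computed, and k would be chosen by comparing validation performance across several values.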

WEEK 3: Bayesian Classifiers; CART

  • Full Bayes classifier
  • Naive Bayes classifier
  • Classification and Regression Trees (CART)
    • Growing the tree
    • Avoiding overfit - pruning
    • Using trees for classifications and predictions
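
To make the naive Bayes idea concrete, the sketch below estimates class probabilities for categorical predictors by multiplying the class prior by each predictor's conditional probability within that class, then normalizing. It is a simplified Python illustration under stated assumptions: the function names and toy records are hypothetical, and no smoothing is applied, so a predictor value never seen within a class drives that class's score to zero.

```python
from collections import Counter, defaultdict

def train_naive_bayes(records):
    """records: list of (features_tuple, label) pairs with categorical features."""
    class_counts = Counter(label for _, label in records)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for features, label in records:
        for j, value in enumerate(features):
            value_counts[(j, label)][value] += 1
    return class_counts, value_counts

def predict_proba(model, features):
    """Return a dict of class -> estimated probability for one new record."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # class prior
        for j, value in enumerate(features):
            # Conditional probability of this value given the class.
            score *= value_counts[(j, label)][value] / n
        scores[label] = score
    norm = sum(scores.values())
    if norm == 0:  # every class scored zero: fall back to the priors
        return {label: n / total for label, n in class_counts.items()}
    return {label: s / norm for label, s in scores.items()}

# Made-up toy data: (prior purchase?, channel) -> outcome.
records = [(("yes", "web"), "buy"), (("yes", "email"), "buy"), (("no", "web"), "buy"),
           (("no", "email"), "skip"), (("no", "web"), "skip"), (("yes", "email"), "skip")]
model = train_naive_bayes(records)
proba = predict_proba(model, ("yes", "web"))
print(proba)
```

The "naive" part is the assumption that predictors are independent within each class, which is what allows the per-predictor probabilities to be multiplied.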

WEEK 4: Ensembles

  • Combine multiple algorithms
  • Improve results
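
One simple way to combine multiple models, shown here only as an illustration, is to take a majority vote across classifiers and an average across numeric predictors. The Python sketch below assumes each model has already produced one prediction per record; the function names and the prediction values are invented for the example and do not come from the course.

```python
from collections import Counter
from statistics import mean

def majority_vote(predictions):
    """predictions: one list of class labels per model; returns the vote per record."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def average_prediction(predictions):
    """predictions: one list of numeric predictions per model; returns the mean per record."""
    return [mean(values) for values in zip(*predictions)]

# Hypothetical class predictions from three models for four records.
model_labels = [["yes", "no", "yes", "no"],
                ["yes", "yes", "no", "no"],
                ["no", "yes", "yes", "no"]]
combined_labels = majority_vote(model_labels)

# Hypothetical numeric predictions from three models for two records.
model_values = [[10, 20], [14, 22], [12, 18]]
combined_values = average_prediction(model_values)
print(combined_labels, combined_values)
```

Using an odd number of two-class models avoids tied votes; with ties, `most_common` returns whichever label was counted first.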


Homework in this course consists of short-answer questions to test concepts, guided data analysis problems using software, and an end-of-course data modeling project. Note: There will be a mid-week discussion exercise in the first week of the course.

In addition to assigned readings, this course has supplemental video lectures and an end-of-course data modeling project.


Who Should Take This Course:
Marketing and IT managers, financial analysts and risk managers, accountants, data analysts, data scientists, forecasters.  This course is especially useful if you want to understand what predictive modeling might do for your organization, undertake pilots with minimum setup costs, manage predictive modeling projects, or work with consultants or technical experts involved with ongoing predictive modeling deployments.
Level: Introductory / Intermediate

You will benefit from some familiarity with regression, which is covered in Statistics 2.

You should be familiar with R.

Course Text:

The required text for this course is Data Mining for Business Analytics: Concepts, Techniques, and Applications in R by Shmueli, Patel, Yahav, Bruce and Lichtendahl. The same text is also used in the follow-on courses "Predictive Analytics 2 - Neural Nets and Regression - with R" and "Predictive Analytics 3 - Dimension Reduction, Clustering and Association Rules - with R."



This is a hands-on course, and participants will apply data mining algorithms to real data. The course uses R, a free software environment for statistical computing and graphics. R compiles and runs on a wide variety of UNIX platforms, Windows, and macOS.


  • May 10, 2019 to June 07, 2019
  • September 06, 2019 to October 04, 2019
  • January 17, 2020 to February 14, 2020
  • May 22, 2020 to June 19, 2020

Course Fee: $549

We have flexible policies for transferring to another course or withdrawing if necessary (a modest fee applies).

Group rates are available.

First-time students and academics may qualify for an introductory offer on select courses. If you have an academic affiliation, you may be eligible for a discount at checkout.


Add a $50 service fee if you require a prior invoice, need to submit a purchase order or voucher, pay by wire transfer or EFT, or need a prior payment refunded and reprocessed. Please use the printed registration form for these and other special orders.

Courses may fill up at any time and registrations are processed in the order in which they are received. Your registration will be confirmed for the first available course date, unless you specify otherwise.

The Institute for Statistics Education is certified to operate by the State Council of Higher Education for Virginia (SCHEV).

Contact Us
Have a question about a course before you register? Call us. We're here for you. (571) 281-8817 or ourcourses (at)