


Introduction to Data Mining

Dr. Anthony Babinec

Aim of Course:

This course will introduce you to the basic concepts in data mining. Data mining, the art and science of learning from data, covers a number of different procedures. This course covers the two core paradigms that account for most business applications of data mining: classification and prediction. In both cases, data mining takes data where a variable of interest is known and develops a model that relates this variable to a series of predictor variables. In classification, the variable of interest is categorical ("purchased something" vs. "has not purchased anything"). In prediction, the variable of interest is continuous ("dollars spent"). Four techniques will be used: k-nearest neighbors, classification and regression trees (CART), logistic regression and multiple linear regression. The course will also cover the use of partitioning to divide the data into training data (data used to build a model), validation data (data used to assess the performance of different models or, in some cases, to fine-tune the model) and test data (data used to predict the performance of the final model).
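As a rough illustration of the partitioning idea described above, the following Python sketch shuffles a set of records and splits it into training, validation and test sets. It is not taken from the course materials, which use XLMiner. The `partition` function, the 50/30/20 split and the made-up `records` are assumptions chosen only for the example.

```python
import random


def partition(rows, train_frac=0.5, valid_frac=0.3, seed=0):
    """Shuffle rows and split them into training, validation and test lists.

    The fractions are illustrative assumptions; whatever remains after the
    training and validation portions becomes the test set.
    """
    if not 0 < train_frac + valid_frac < 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(rows)                      # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)      # seeded for a repeatable split
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]                 # used to build a model
    valid = shuffled[n_train:n_train + n_valid]  # used to compare or tune models
    test = shuffled[n_train + n_valid:]        # held back to assess the final model
    return train, valid, test


# Hypothetical records: (dollars_spent, purchased) pairs invented for the sketch.
records = [(float(i), i % 2) for i in range(100)]
train, valid, test = partition(records)
print(len(train), len(valid), len(test))
```

Each record lands in exactly one of the three sets, so the model is never assessed on data that was used to build or tune it.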

Who Should Take This Course:

Analysts of business data, consultants, MBAs seeking to update their knowledge of quantitative techniques, managers who want to see what data mining can do, and anyone who wants a practical, hands-on grounding in basic data mining techniques.

For those enrolled in Professional Advancement Programs, this is a required or elective course in the following Programs:

  • Statistics in Business & Marketing - elective
  • Data Mining - required

Course Program:

The course is structured as follows:

SESSION 1: Introduction
  • Core ideas in data mining
  • Supervised and unsupervised learning
  • The steps in data mining
  • SEMMA
  • Preliminary steps
    • Sampling from a database
    • Pre-processing and cleaning the data
    • Partitioning the data
  • Building a model
    • An example with linear regression
  • K-nearest neighbor
SESSION 2: Classification
  • Judging the performance of classification algorithms
  • Classification trees
  • Logistic regression
  • Lift
SESSION 3: Prediction
  • Multiple linear regression
  • Regression trees
SESSION 4: Neural nets
  • Neural nets
  • Comparing different models

The Instructor:

Dr. Anthony Babinec, President of AB Analytics. For over two decades, Tony Babinec has specialized in the application of statistical and data mining methods to the solution of business problems. Tony has multiple degrees from the University of Chicago, where he studied Advanced Statistics and Survey Research. Before forming AB Analytics, Babinec was Director of Business Development and Director of Advanced Products Marketing at SPSS; he worked on the marketing of Clementine and introduced CHAID, neural nets and other advanced technologies to SPSS. He has presented at the AMA's Applied Research Methods Conference, the AMA's ART Forum, Henry Stewart Conferences, the Sawtooth Software Conference, Statistical Innovation's Statistical Modeling Week, and numerous professional meetings. He is on the Board of Directors of the Chicago Chapter of the American Statistical Association, where he has held various offices including President. He is on the Editorial Board of the Journal of Targeting, Measurement and Analysis for Marketing.

Organization of the Course:

The course takes place over the internet, at statistics.com. During each course week, you participate at times of your own choosing - there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor. The course is scheduled to take place over 4 weeks, and typically requires 10-15 hours per week. At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials and work through exercises. Discussion among participants is encouraged. The instructor will provide answers and comments.

Certificates and Grades:

You may be interested only in learning the material presented, and not be concerned with grades or certificates. Or you may be enrolled in a statistics.com Professional Advancement Program that requires demonstration of proficiency in the subject, in which case your work will be assessed for purposes of issuing a grade. Or you may require only a "Certificate of Course Completion," along with professional development credit in the form of Continuing Education Units (CEUs). As you begin the class, you will be asked to specify your category.

Credit:

This course offers continuing education units (CEUs). Those who successfully complete the course (generally this means marks of 50% or better on the homework) will be issued 5.0 CEUs and a certificate by statistics.com, upon request.

Dates:

Sep. 12 - Oct. 10, 2008

Participants gain access to the online materials on the first day of the course, and typically spend about 10-15 hours per week (at their convenience). You retain full access to course materials, including the discussion board, for two weeks after the course closing date.

Level:

Novice/Intermediate

Prerequisite:

The equivalent of Introduction to Statistics I: Inference for a Single Variable, and Introduction to Statistics II: Working with Bivariate Data (and, if necessary before these courses, Introduction to Statistics for Beginners or Survey of Statistics for Beginners). Participants should also be familiar with multiple linear regression.

Course Text:

The text is Data Mining for Business Intelligence by Shmueli, Patel and Bruce, published by Wiley. The text can be ordered directly from Wiley, which usually offers a 15% discount to statistics.com customers who order through the link on the course page. A six-month license to XLMiner comes bundled with the course text.

Software:

This is a hands-on course. Participants will apply data mining algorithms to real data, and interpret the results. Any software capable of handling the routines covered may be used. XLMiner, a data-mining add-in for Excel, will be illustrated in the course and its use and output will be explained. A six-month license to XLMiner comes bundled with the course text.

Registration:

Register Online - $449
Register Online (academic) - $349 (you must be affiliated with a college, university or high school)

Add a $50 service fee if you require a prior invoice, or if you need to submit a purchase order or voucher, pay by wire transfer or EFT, or refund and reprocess a prior payment. Please use the printed registration form for these and other special orders.

Note: Courses may fill up at any time and registrations are processed in the order in which they are received. Your registration will be confirmed for the first available course date, unless you specify otherwise.