
# Data Mining: Unsupervised Techniques

Instructor(s): Tony Babinec

Dates:

October 11, 2013 to November 08, 2013; October 10, 2014 to November 07, 2014


# Data Mining: Unsupervised Techniques, taught by Tony Babinec

Aim of Course:

Data mining, the art and science of learning from data, covers a number of different procedures. This course covers key unsupervised learning techniques: association rules, principal components analysis, and clustering. (Introduction to Predictive Modeling covers supervised techniques, which predict a record's class or the value of an outcome variable on the basis of a set of records with known outcomes.) The course also includes a session that integrates supervised and unsupervised learning techniques.

This is a hands-on course. Participants will have access to XLMiner, a comprehensive Excel-based data mining tool, and its use will be explained in the course. Participants will apply data mining algorithms to real data and interpret the results.

An online bulletin board enables you to interact with the instructor and your fellow students throughout the course and to submit your own findings for discussion. The course should take about 15 hours per week. Regular visits to the course discussion board are required, but you can arrange these at your own convenience. (Follow-up consultation is available after completion of the course for an additional fee.)

This course is a core requirement or elective in the following Program(s) in Analytics and Statistical Studies (PASS):

Course Program:

## SESSION 1: Principal Components Analysis

• The goal - dimensionality reduction
• The principal components
• Scale variance estimation
• Normalizing the data
• Principal components and orthogonal least squares
• Exercises
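
As a rough illustration of the Session 1 topics (this is not course material, and the course itself uses XLMiner rather than Python), the sketch below normalizes a small data set and extracts its principal components from the covariance matrix. The function name `pca`, the toy data, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np

def pca(data, normalize=True):
    """Return (eigenvalues, components, scores), sorted by descending variance."""
    x = np.asarray(data, dtype=float)
    # Center each variable on its mean.
    x = x - x.mean(axis=0)
    if normalize:
        # Normalizing: divide by the sample standard deviation so that
        # variables measured on large scales do not dominate the components.
        x = x / x.std(axis=0, ddof=1)
    # Covariance matrix of the (normalized) variables.
    cov = np.cov(x, rowvar=False)
    # eigh is suited to symmetric matrices; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project the records onto the principal components.
    scores = x @ eigvecs
    return eigvals, eigvecs, scores

# Hypothetical toy data: 8 records, 3 correlated variables.
data = [
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.0],
    [1.9, 2.2, 1.1],
    [3.1, 3.0, 1.6],
    [2.3, 2.7, 0.9],
    [2.0, 1.6, 1.3],
    [1.0, 1.1, 0.4],
]

eigvals, components, scores = pca(data)
# Share of total variance captured by each component.
explained = eigvals / eigvals.sum()
print(explained)
```

Dimensionality reduction then amounts to keeping only the first few columns of `scores`, chosen so that the cumulative share in `explained` is acceptably high.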

## SESSION 2: Clustering

• What is cluster analysis?
• Hierarchical methods
• Nearest neighbor (single linkage)
• Farthest neighbor (complete linkage)
• Group average (average linkage)
• Optimization and the k-means algorithm
• Similarity measures
• Other distance measures
• The curse of dimensionality
• Exercises
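
To make the Session 2 outline more concrete, here is a minimal sketch (not taken from the course materials) of the three linkage distances listed above and of a basic k-means loop. The helper names, the toy points, and the choice of starting centroids are hypothetical; Euclidean distance is assumed throughout.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def linkage_distance(cluster_a, cluster_b, method="single"):
    """Distance between two clusters under a given linkage rule."""
    pairwise = [euclidean(p, q) for p in cluster_a for q in cluster_b]
    if method == "single":      # nearest neighbor
        return min(pairwise)
    if method == "complete":    # farthest neighbor
        return max(pairwise)
    if method == "average":     # group average
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")

def k_means(points, centroids, max_iter=100):
    """Basic k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two visibly separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels, centroids)

group_a, group_b = points[:3], points[3:]
for method in ("single", "average", "complete"):
    print(method, linkage_distance(group_a, group_b, method))
```

Starting centroids are passed in explicitly so the result is reproducible; in practice they are usually chosen at random, and different starts can give different clusterings.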

## SESSION 3: Association Rules

• Discovering association rules in transaction databases
• Support and confidence
• The apriori algorithm
• Shortcomings
• Exercises
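
The Session 3 ideas of support, confidence, and level-wise search can be sketched in a few lines. This is an illustrative, simplified take on an Apriori-style search and not the course's implementation; the function names and the five-transaction basket data are made up for the example.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: build k-itemsets only from frequent (k-1)-itemsets."""
    items = sorted({i for t in transactions for i in t})
    current = [
        frozenset([i]) for i in items
        if support(transactions, [i]) >= min_support
    ]
    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(transactions, s)
        k += 1
        # Candidate generation: unions of frequent itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        previous = set(current)
        candidates = [
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        ]
        current = [c for c in candidates if support(transactions, c) >= min_support]
    return result

# Hypothetical transaction database (one set of items per basket).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

frequent = frequent_itemsets(transactions, min_support=0.6)
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(frequent)
print(rule_support, rule_confidence)
```

Here the rule "bread implies milk" holds in three of the five baskets, and in three of the four baskets that contain bread. The pruning step is what keeps the search manageable: an itemset cannot be frequent unless all of its subsets are.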

## SESSION 4: Integration of Supervised and Unsupervised Learning

• Clustering into customer segments
• Profiling of customer segments
• Classifying new records by segment

The final lesson is an integration of supervised and unsupervised techniques. To get the full benefit of this course, familiarity with supervised learning is needed, but those not requiring this integration can learn about clustering, association rules and principal components without having had a course in supervised learning.

HOMEWORK:

Homework in this course consists of short answer questions to test concepts, and guided data analysis problems using software.

Course Fee: \$499

Add a \$50 service fee if you require a prior invoice, need to submit a purchase order or voucher, pay by wire transfer or EFT, or need to refund and reprocess a prior payment. Please use this printed registration form for these and other special orders.

Courses may fill up at any time, and registrations are processed in the order in which they are received. Your registration will be confirmed for the first available course date unless you specify otherwise. Those registering for multiple courses, Statistics.com's PASS students, and those affiliated with other academic institutions may be entitled to tuition discounts.



Who Should Take This Course:

Marketers seeking to specify customer segments and identify associations among products purchased, environmental scientists seeking to cluster observations, analysts who need to identify the key variables out of many, MBAs seeking to update their knowledge of quantitative techniques, managers and scientists who want to see what data mining can do, and anyone who wants a practical, hands-on grounding in basic data mining techniques.

Level:

Intermediate/Introductory

Prerequisite:

If you are unclear as to whether you have mastered the above requirements, try these placement tests.

In addition, one lesson in the course uses supervised and unsupervised learning techniques in combination. Unless you do not need this portion, you should be familiar with supervised learning methods, such as those presented in Introduction to Predictive Modeling.

Organization of the Course:

This course takes place online at the Institute for 4 weeks. During each course week, you participate at times of your own choosing - there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor.

The course typically requires 15 hours per week. At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials, work through exercises, and submit answers. Discussion among participants is encouraged. The instructor will provide answers and comments, and at the end of the week, you will receive individual feedback on your homework answers.

Credit:

Students come to the Institute for a variety of reasons. As you begin the course, you will be asked to specify your category:

1. You may be interested only in learning the material presented, and not be concerned with grades or a record of completion.
2. You may be enrolled in PASS (Programs in Analytics and Statistical Studies), which requires demonstration of proficiency in the subject, in which case your work will be assessed for a grade.
3. You may require a "Record of Course Completion," along with professional development credit in the form of Continuing Education Units (CEUs). For those successfully completing the course, 5.0 CEUs and a record of course completion will be issued by the Institute upon request.

Course Text:

The required text for this course is Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, 2nd Edition, by Shmueli, Patel and Bruce, and it can be ordered from Wiley by clicking here. Wiley typically offers statistics.com customers a discount of up to 15% on this book (and all other statistics titles): enter the code aff15 in the Promotion Code field when prompted during checkout and click the Apply Discount button. (If you are located in Asia, the web procedure for your location may not accept this discount; in that case, try calling your regional Wiley representative.)

PLEASE ORDER YOUR COPY IN TIME FOR THE COURSE STARTING DATE.

Software:

This is a hands-on course. Participants will apply data mining algorithms to real data, and interpret the results. Course illustrations and homework assignments will use XLMiner, a data mining add-in for Excel. Teaching assistants will be able to offer feedback on assignments completed using XLMiner. Other data mining programs may be used by participants, but support will not be available. A six-month license to XLMiner comes bundled with the course text. For information on XLMiner or other software, click here.


### Students comment on our courses:

"Good value for the money. Thank you very much for a thought-provoking course."
J. Politch
Harvard
"Web forums are excellent."
S. Clark
GlaxoSmithKline
"The course was very good and well presented. The material in the notes was self-explanatory for a non-technical person, and the supplementary book provided good reading for the person who is interested in more technical details."
Gichangi
Dept. of Statistics, Univ. of Southern Denmark (doctoral student)
© statistics.com 2004-2011