Statistical and Machine Learning Methods for Analyzing Clusters and Detecting Anomalies
This course will teach you how to use various cluster analysis methods to identify possible clusters in multivariate data. Methods discussed include hierarchical clustering, k-means clustering, two-step clustering, and normal mixture models for continuous variables.
Overview
Clusters are clumps of data that are internally cohesive and separated from other clusters. In marketing, cluster analysis is the basis for identifying clusters of customer records, a process called market segmentation. An anomaly is a pattern in the data that does not conform to expected normal behavior. In one sense an anomaly is the flip side of a cluster: a data point, or a set of points, that is distant from any cluster. Anomaly detection is useful in a variety of fields, such as fraud surveillance and the monitoring of complex industrial processes. This is a hands-on course in which you will use statistical software to apply clustering algorithms to real data and interpret the results. The same cluster analysis can be used to identify anomalies. The course also covers the use of supervised learning algorithms to identify anomalies.
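To make the "flip side of a cluster" idea concrete, here is a minimal sketch in plain Python. It is not taken from the course materials: the data, the function names, and the distance threshold of 2.0 are all made up for illustration. It flags any point whose Euclidean distance from a cluster's centroid exceeds the threshold.

```python
import math

def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def flag_anomalies(points, cluster, threshold):
    """Return the points lying farther than `threshold` from the cluster centroid."""
    c = centroid(cluster)
    return [p for p in points if math.dist(p, c) > threshold]

# Hypothetical data: four cohesive points and one distant point.
cluster = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.0)]
candidates = cluster + [(5.0, 5.0)]

# The threshold is an arbitrary illustrative choice, not a recommended value.
anomalies = flag_anomalies(candidates, cluster, threshold=2.0)
print(anomalies)
```

In practice the threshold would depend on the scale and spread of the data, which is one reason normalization matters in cluster analysis.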
- Intermediate
- 4 Weeks
- Expert Instructor
- Tuition-Back Guarantee
- 100% Online
- TA Support
Learning Outcomes
After taking this course, you will be able to:
- Conduct hierarchical cluster analysis and k-means clustering to identify clusters in multivariate data
- Use normal mixture models for clustering of continuous variables
- Interpret/diagnose the output of different clustering procedures
- Apply normalization of data appropriately in cluster analysis
- Identify the assignment of cases to clusters
- Determine how to apply a supervised learning algorithm to a classification problem for anomaly detection
- Apply and assess a clustering algorithm for identifying anomalies in the absence of labels
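As a rough illustration of the normalization outcome listed above, the sketch below standardizes each variable to z-scores so that a variable measured in large units does not dominate distance calculations. The age and income values are invented, and z-scoring is only one of several possible normalization choices.

```python
import statistics

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its standard deviation.

    Assumes every column has nonzero spread; a constant column would divide by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]

# Hypothetical customer records: (age in years, income in dollars).
rows = [(25, 30000.0), (35, 60000.0), (45, 90000.0)]
scaled = standardize(rows)
for row in scaled:
    print(row)
```

After scaling, both variables contribute on a comparable footing to Euclidean distance, whereas on the raw data income differences would swamp age differences.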
Who Should Take This Course
- Marketing analysts who need to cluster customer data as part of a market segmentation strategy;
- Computational biologists (e.g. for taxonomy);
- Environmental scientists (e.g. for habitat studies);
- IT specialists (e.g. in modeling web traffic patterns);
- Military and national security analysts (e.g. in automated analysis of intercepted communications).
Course Syllabus
Week 1
Hierarchical Clustering
- Hierarchical clustering – dendrograms
- Divisive vs. agglomerative methods
- Distance metrics
- Different linkage methods
- Single linkage as anomaly detector
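The last topic above can be sketched in a few lines of plain Python. This is a naive illustration under assumed toy data, not the course's implementation: it runs agglomerative clustering with single linkage and records each merge. A single point that joins the rest only at the final, long-distance merge is a candidate anomaly.

```python
import math

def single_linkage(points):
    """Naive agglomerative clustering with single linkage.

    Returns the merge history as (distance, cluster_a, cluster_b) tuples,
    where each cluster is a tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, clusters[a], clusters[b]))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical data: a tight square of four points plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
merges = single_linkage(points)

last_distance, left, right = merges[-1]
# A singleton that joins only at the final merge is an anomaly candidate.
isolated = left if len(left) == 1 else right
print(isolated, round(last_distance, 2))
```

The merge history is the information a dendrogram displays: the heights of the joins are the recorded distances.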
Week 2
K-means Clustering
- K-means Clustering
- Choosing number of clusters
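One common heuristic for choosing the number of clusters is to run k-means for several values of k and look for the point where the within-cluster sum of squares stops dropping sharply (the "elbow"). The sketch below is an assumption-laden illustration in plain Python: the data are invented, and the farthest-point initialization is a simple deterministic choice made for reproducibility, not necessarily what any particular package does.

```python
import math

def init_centers(points, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; returns (centers, within-cluster sum of squares)."""
    centers = init_centers(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as its group's mean; keep the old center if a group is empty.
        centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    wss = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
    return centers, wss

# Hypothetical data with three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
wss_by_k = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

On this toy data the sum of squares falls steeply up to k = 3 and only slightly afterwards, which is the pattern the elbow heuristic looks for. Real data rarely show so clean a break.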
Week 3
Normal Mixture Model
- Finite mixture model
- Statistical models to identify constituent groups
- K-means cluster as a special case
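The sense in which k-means is a special case of a normal mixture model can be hinted at numerically. Under the assumptions of equal mixing weights and a shared spherical variance, the posterior membership probabilities of a mixture model approach the hard nearest-center assignment of k-means as the variance shrinks. The one-dimensional sketch below uses invented means and an invented data point.

```python
import math

def responsibilities(x, means, sigma):
    """Posterior probability of each component for point x.

    Assumes equal mixing weights and a shared variance sigma**2.
    """
    # Work with log-densities and subtract the maximum for numerical stability.
    logs = [-((x - m) ** 2) / (2 * sigma ** 2) for m in means]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

means = [0.0, 4.0]   # hypothetical component means
x = 1.5              # hypothetical observation, closer to the first mean

soft = responsibilities(x, means, sigma=2.0)   # wide components: soft membership
hard = responsibilities(x, means, sigma=0.1)   # narrow components: nearly hard assignment
print([round(r, 3) for r in soft])
print([round(r, 3) for r in hard])
```

With a wide variance the point is shared between the two components; with a narrow one it is assigned almost entirely to the nearest mean, which is the k-means assignment rule.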
Week 4
Practical Considerations
- Using subsets of variables
- Different data types
- Cluster quality and robustness
Class Dates
2024
Instructors: Mr. Anthony Babinec
2025
Instructors: Mr. Anthony Babinec
Prerequisites
We assume you are versed in statistics. This course also assumes knowledge of supervised learning and some familiarity with multivariate data, such as that provided in the following courses.
Predictive Analytics 1 – Machine Learning Tools
- Skill: Intermediate
- Credit Options: ACE, CAP, CEU
Predictive Analytics 2 – Neural Nets and Regression
- Skill: Intermediate
- Credit Options: ACE, CAP, CEU
The Statistics.com courses have helped me a lot, pushing me to the limit and making me learn much more than I expected I could. The knowledge I gained I could immediately leverage in my job … then eventually led to landing a job in my dream company – Amazon.
Karolis Urbonas
This program has been a life and work game changer for me. Within 2 weeks of taking this class, I was able to produce far more than I ever had before.
Susan Kamp
The material covered in the Analytics for Data Science Certificate will be indispensable in my work. I can’t wait to take other courses. Great work!
Stephen McAllister
I learned more in the past 6 weeks than I did taking a full semester of statistics in college, and 10 weeks of statistics in graduate school. Seriously.
Amir Aminimanizani
This is the best online course I have ever taken. Very well prepared. Covers a lot of real-life problems. Good job, thank you very much!
Elena Rose
The more courses I take at Statistics.com, the more appreciation I have for the smart approach, quality of instructors, assistants, admin and program. Well done!
Leonardo Nagata
This course greatly benefited me because I am interested in working in AI. It has given me solid foundational knowledge…After completing this last course, I feel I have gained valuable skills that will enhance my employability in Data Science, opening up diverse career opportunities.
Richard Jackson
Frequently Asked Questions
- What is your satisfaction guarantee and how does it work?
- Can I transfer or withdraw from a course?
- Who are the instructors at Statistics.com?
Visit our knowledge base and learn more.
Register For This Course
Statistical and Machine Learning Methods for Analyzing Clusters and Detecting Anomalies
Additional Information
Organization of Course
This course takes place online at The Institute for 4 weeks. During each course week, you participate at times of your own choosing – there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor.
At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials, work through exercises, and submit answers. Discussion among participants is encouraged. The instructor will provide answers and comments, and at the end of the week, you will receive individual feedback on your homework answers.
Time Requirements
This is a 4-week course requiring 10-15 hours per week of review and study, at times of your choosing.
Homework
Homework in this course consists of short-answer questions to test concepts and guided data analysis problems using software. In addition to assigned readings, the course has an end-of-course data modeling project.
Course Text
This course will use papers that will be made available electronically, and will also refer to sections from the book Cluster Analysis, 5th Edition, by Brian S. Everitt, Dr Sabine Landau, Dr Morven Leese, Dr Daniel Stahl.
Software
This is a hands-on course. Participants will apply clustering algorithms to real data and interpret the results, so software capable of doing cluster analysis is required. The model solutions for the assignments were developed in IBM SPSS Statistics and Latent Gold. We also provide solutions using R. Other possible choices include XLStat and Analytic Solver Data Mining.
Course Fee & Information
Enrollment
Courses may fill up at any time and registrations are processed in the order in which they are received. Your registration will be confirmed for the first available course date unless you specify otherwise.
Transfers and Withdrawals
We have flexible policies to transfer to another course or withdraw if necessary.
Group Rates
Contact us to get information on group rates.
Discounts
Academic affiliation? In most courses you are eligible for a discount at checkout.
New to Statistics.com? Click here for a special introductory discount code.
Invoice or Purchase Order
Add a $50 service fee if you require a prior invoice, need to submit a purchase order or voucher, pay by wire transfer or EFT, or need to refund and reprocess a prior payment.
Options for Credit and Recognition
This course is eligible for the following credit and recognition options:
No Credit
You may take this course without pursuing credit or a record of completion.
Mastery or Certificate Program Credit
If you are enrolled in a mastery or certificate program that requires demonstration of proficiency in this subject, your course work may be assessed for a grade.
CEUs and Proof of Completion
If you require a “Record of Course Completion” along with professional development credit in the form of Continuing Education Units (CEUs), The Institute will issue both upon your request once you have successfully completed the course.
INFORMS-CAP
This course is recognized by the Institute for Operations Research and the Management Sciences (INFORMS) as helpful preparation for the Certified Analytics Professional (CAP®) exam and can help CAP® analysts accrue Professional Development Units to maintain their certification.
ACE CREDIT | College Credit
This course has been evaluated by the American Council on Education (ACE) and is recommended for Graduate credit, 3 semester hours in statistics. Please note that the decision to accept specific credit recommendations is up to the academic institution accepting the credit.
Supplemental Information
Literacy, Accessibility, and Dyslexia
At Statistics.com, we aim to provide a learning environment suitable for everyone. To help you get the most out of your learning experience, we have researched and tested several assistance tools. For students with dyslexia, colorblindness, or reading difficulties, we recommend the following web browser add-ons and extensions:
Chrome
- Color Enhancer (for colorblindness)
- HelperBird (for colorblindness, dyslexia, and reading difficulties)
Firefox
- Mobile Dyslexic
- Color Vision Simulation (native accessibility feature)
- Instructions for other native accessibility features
Safari
- Navidys (for colorblindness, dyslexia, and reading difficulties)
- HelperBird for Safari (for colorblindness, dyslexia, and reading difficulties)