# Glossary of statistical terms

Hierarchical Cluster Analysis:

Hierarchical cluster analysis (or hierarchical clustering) is a general approach to cluster analysis, in which the goal is to group together objects or records that are "close" to one another. A key component of the analysis is the repeated calculation of distance measures between objects, and between clusters once objects begin to be grouped into clusters. The outcome is represented graphically as a dendrogram.

The input to a hierarchical cluster analysis of N objects is a set of object-to-object distances and a linkage function for computing the cluster-to-cluster distances.
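As a rough illustration of these two inputs, the sketch below builds a table of object-to-object distances and defines three common linkage functions on top of it. The Euclidean metric, the sample points, and all function names (`distance_matrix`, `single_linkage`, and so on) are assumptions made for the example, not part of the glossary definition.

```python
import math
from itertools import combinations


def distance_matrix(points):
    """Map each unordered index pair (i, j), i < j, to a Euclidean distance."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def pair_distance(dist, i, j):
    """Look up the distance between two distinct objects in either order."""
    return dist[(i, j)] if i < j else dist[(j, i)]


def single_linkage(dist, a, b):
    """Cluster-to-cluster distance = smallest object-to-object distance."""
    return min(pair_distance(dist, i, j) for i in a for j in b)


def complete_linkage(dist, a, b):
    """Cluster-to-cluster distance = largest object-to-object distance."""
    return max(pair_distance(dist, i, j) for i in a for j in b)


def average_linkage(dist, a, b):
    """Cluster-to-cluster distance = mean of all object-to-object distances."""
    total = sum(pair_distance(dist, i, j) for i in a for j in b)
    return total / (len(a) * len(b))


# Hypothetical data: four points at the corners of a 4-by-3 rectangle.
points = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0), (4.0, 3.0)]
dist = distance_matrix(points)

# Two disjoint clusters, given as lists of object indices.
cluster_a, cluster_b = [0, 1], [2, 3]
single = single_linkage(dist, cluster_a, cluster_b)
complete = complete_linkage(dist, cluster_a, cluster_b)
average = average_linkage(dist, cluster_a, cluster_b)
print(single, complete, average)
```

For this layout the object-to-object distances between the two clusters are 4, 5, 5 and 4, so the three linkage functions give 4, 5 and 4.5 respectively: the same distances lead to different cluster-to-cluster distances depending on the linkage chosen.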

The two main categories of methods for hierarchical cluster analysis are divisive methods and agglomerative methods. In practice, agglomerative methods are more widely used. An agglomerative method starts with each object in its own cluster; at each step, the pair of clusters with the smallest cluster-to-cluster distance is fused into a single cluster. The most common algorithms for hierarchical clustering differ mainly in the linkage function, that is, the method for calculating the cluster-to-cluster distance.
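The agglomerative fusion step can be sketched in a few lines of plain Python. This is a naive illustration under stated assumptions, not an efficient or reference implementation: the function name `agglomerate`, the one-dimensional sample data, and the idea of passing the linkage as a reducer (`min` for single linkage, `max` for complete linkage) are all choices made for this example.

```python
from itertools import combinations


def agglomerate(dist, n, linkage=min):
    """Naive agglomerative clustering of n objects.

    dist    -- function (i, j) -> distance between objects i and j
    linkage -- reducer over object-to-object distances between two clusters
               (min = single linkage, max = complete linkage)
    Returns the merges in order, each as (cluster_a, cluster_b, distance).
    """
    clusters = [(i,) for i in range(n)]  # start: every object is its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest cluster-to-cluster distance.
        # Ties go to the first pair encountered.
        for a, b in combinations(clusters, 2):
            d = linkage(dist(i, j) for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        # Fuse the two closest clusters into one.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append(best)
    return merges


# Hypothetical data: four objects on a line.
values = [0, 2, 5, 9]


def dist(i, j):
    return abs(values[i] - values[j])


single_merges = agglomerate(dist, len(values), linkage=min)
complete_merges = agglomerate(dist, len(values), linkage=max)

single_heights = [d for _, _, d in single_merges]
complete_heights = [d for _, _, d in complete_merges]
print(single_heights)
print(complete_heights)
```

With these values, both linkages first fuse the objects at 0 and 2. Single linkage then adds the object at 5 to that cluster, whereas complete linkage fuses the objects at 5 and 9 instead, so the merge distances (the heights that a dendrogram would display) differ between the two runs.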




