Glossary of statistical terms
Hierarchical Cluster Analysis:
Hierarchical cluster analysis (or hierarchical clustering) is a general approach to cluster analysis, in which the goal is to group together objects or records that are "close" to one another. A key component of the analysis is the repeated calculation of distance measures between objects and, once objects begin to be grouped, between clusters. The outcome is represented graphically as a dendrogram.
The input to a hierarchical cluster analysis of N objects is a set of object-to-object distances together with a linkage function for computing cluster-to-cluster distances.
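The object-to-object distances can be held in an N x N matrix. The minimal sketch below assumes the objects are points in Euclidean space; the function name `distance_matrix` is hypothetical, and any other dissimilarity measure could be substituted.

```python
import math

def distance_matrix(points):
    """Return the symmetric N x N matrix of pairwise distances.

    Assumption: Euclidean distance between points given as coordinate tuples.
    """
    n = len(points)
    d = [[0.0] * n for _ in range(n)]  # diagonal stays 0: distance to itself
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(points[i], points[j])  # Python 3.8+
            d[i][j] = d[j][i] = dist  # fill both halves symmetrically
    return d

print(distance_matrix([(0, 0), (3, 4), (6, 8)]))
```

Only the upper triangle is computed, since distances are symmetric.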
The two main categories of methods for hierarchical cluster analysis are divisive methods, which start with all objects in a single cluster and repeatedly split clusters, and agglomerative methods, which start with each object in its own cluster and repeatedly merge clusters. Agglomerative methods are more widely used in practice. At each step of an agglomerative method, the pair of clusters with the smallest cluster-to-cluster distance is fused into a single cluster.
The most common algorithms for hierarchical clustering, such as single, complete, and average linkage, differ mainly in the linkage function, that is, the method used to calculate the cluster-to-cluster distance.
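A naive agglomerative procedure can be sketched as follows. This is an illustration suited only to small N, not an optimized implementation; the names `agglomerate`, `single_linkage`, `complete_linkage`, and `average_linkage` are hypothetical, and breaking ties by taking the first pair found is an assumption. The returned merge history (which clusters were fused, and at what distance) is the information a dendrogram displays.

```python
def single_linkage(d, a, b):
    # Distance between the closest pair of members.
    return min(d[i][j] for i in a for j in b)

def complete_linkage(d, a, b):
    # Distance between the farthest pair of members.
    return max(d[i][j] for i in a for j in b)

def average_linkage(d, a, b):
    # Mean of all member-to-member distances.
    return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

def agglomerate(d, linkage):
    """Agglomerative clustering on a precomputed distance matrix d.

    Returns a list of (cluster_a, cluster_b, distance) merges.
    """
    clusters = [[i] for i in range(len(d))]  # each object starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Scan every pair of current clusters; keep the first minimum found.
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                dist = linkage(d, clusters[x], clusters[y])
                if best is None or dist < best[0]:
                    best = (dist, x, y)
        dist, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((sorted(a), sorted(b), dist))
        del clusters[y]       # y > x, so index x is unaffected
        clusters[x] = a + b   # fuse the pair into a single cluster
    return merges

# Four objects at positions 0, 1, 4, 5 on a line.
d = [[0, 1, 4, 5],
     [1, 0, 3, 4],
     [4, 3, 0, 1],
     [5, 4, 1, 0]]
print(agglomerate(d, single_linkage))
```

Swapping `single_linkage` for `complete_linkage` or `average_linkage` changes only the distance at which the final two clusters merge in this example, which illustrates that the algorithms differ mainly in the linkage function. For real data, SciPy's `scipy.cluster.hierarchy.linkage` provides optimized implementations of these and other methods.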
Want to learn more about this topic?
Statistics.com offers over 100 courses in statistics, from introductory to advanced level. Most are 4 weeks long and take place online in a series of weekly lessons and assignments, requiring about 15 hours/week. Participate at your convenience; there are no set times when you must be online. Ask questions and exchange comments with the instructor and other students on a private discussion board throughout the course.
This course covers the two core paradigms that account for most business applications of predictive modeling: classification and prediction. The course includes hands-on work with XLMiner, a data-mining add-in for Excel.
This course covers key unsupervised learning techniques: association rules, principal components analysis, and clustering. It also covers the integration of supervised and unsupervised learning techniques.
In this online course, “Cluster Analysis,” you will learn how to use various cluster analysis methods to identify possible clusters in multivariate data. Methods discussed include hierarchical clustering, k-means clustering, two-step clustering, and normal mixture models for continuous variables.
Rule induction is an important component of data mining, and this course covers two main styles of generating rules.
This course will acquaint you with the process of analyzing microarray data. You will learn how to preprocess the data, shortlist the differentially expressed genes, carry out principal component analysis to reduce dimensionality and detect interesting gene expression patterns, and cluster genes and samples. Illustrations of the statistical issues involved at the various stages of the analysis will use real data sets from DNA microarray experiments; background will be provided on the use of Bioconductor.
This course will introduce the essential techniques of text mining, understood here as the extension of data mining's standard predictive methods to unstructured text.