Glossary of statistical terms
The k-means clustering method is used in non-hierarchical cluster analysis. The goal is to divide the whole set of objects into a predefined number (k) of clusters. The criterion for such a subdivision is normally minimal dispersion inside clusters, e.g. the minimal sum of squared distances from each object to the mean vector (centroid) of its cluster. A direct, rigorous solution to this problem requires testing an impractically large number of data subdivisions. K-means clustering is a fast heuristic method that provides a reasonably good solution, although not necessarily an optimal one.
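As an illustration of the heuristic described above, here is a minimal sketch in plain Python. It assumes the common Lloyd-style iteration (assign each object to the nearest centroid, then recompute each centroid as the mean of its members), which this glossary entry does not itself specify. The function names, the random initialisation, and the toy data are hypothetical choices made for the example, not part of XLMiner or any particular package.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_vector(points):
    """Component-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Lloyd-style heuristic: alternate assignment and centroid update.

    Returns (centroids, labels, within_cluster_sum_of_squares).
    """
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen objects as initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean vector of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster became empty
                centroids[j] = mean_vector(members)
    # Dispersion criterion: sum of squared distances to the assigned centroid.
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss


# Toy data (hypothetical): two well-separated groups of three objects each.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels, wcss = k_means(data, k=2)
```

Because the result depends on the initial centroids, a typical precaution is to run the procedure from several random starts (different `seed` values here) and keep the solution with the smallest within-cluster sum of squares.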
For more details, see the chapter in the XLMiner help.
Want to learn more about this topic?
Statistics.com offers over 100 courses in statistics, from introductory to advanced level. Most are 4 weeks long and take place online in a series of weekly lessons and assignments, requiring about 15 hours/week. Participate at your convenience; there are no set times when you must be online. Ask questions and exchange comments with the instructor and other students on a private discussion board throughout the course.
This course covers key unsupervised learning techniques - association rules, principal components analysis, and clustering. The course will include an integration of supervised and unsupervised learning techniques.
In this online course, “Cluster Analysis,” you will learn how to use various cluster analysis methods to identify possible clusters in multivariate data. Methods discussed include hierarchical clustering, k-means clustering, two-step clustering, and normal mixture models for continuous variables.
This course covers key multivariate procedures such as multivariate analysis of variance (MANOVA), principal components, factor analysis and classification.
This course will acquaint you with the process of analyzing microarray data. You will learn how to preprocess the data, shortlist the differentially expressed genes, carry out principal component analysis to reduce the dimensionality and detect interesting gene expression patterns, and cluster genes and samples. Illustrations of the statistical issues involved at the various stages of the analysis will use real data sets from DNA microarray experiments; background will be provided on the use of Bioconductor.
This course will introduce the essential techniques of text mining, understood here as the extension of data mining's standard predictive methods to unstructured text.