Historical Spotlight: Iris Dataset

Can you identify this wildflower, photographed in a Massachusetts field?

Can you also identify its significance in the history of statistics?

[Photo: flower]

This is the Blue Flag Iris, also called the Versicolor Iris (Iris versicolor), and it is one of three iris species that make up the Iris dataset, famous in statistics. The dataset consists of five values for each of 150 flowers (50 per species): the species, the length and width of the sepals (the outer, petal-like parts that enclose the bud), and the length and width of the petals.

Ronald Fisher used these data in his exposition of linear discriminant analysis, “The use of multiple measurements in taxonomic problems,” published in the September 1936 issue of the Annals of Eugenics.

Discriminant analysis is considered the first multivariate classification method. Its use is now limited, since more modern statistical and machine learning methods have come into play, but it has links to principal components analysis, which is more widely used, and it still has a role as a computationally efficient method of feature selection.
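
To make the idea concrete, here is a minimal sketch of a two-class linear discriminant in the spirit of Fisher's method, written in plain Python for two measurements per flower. It finds the direction that best separates the class means relative to the within-class scatter, then classifies a new flower by which side of the midpoint its projection falls on. The function names are hypothetical, the midpoint threshold assumes both classes are equally likely, and the measurements are invented for illustration; they are not Fisher's data.

```python
def dot(u, v):
    """Dot product of two 2-element vectors."""
    return u[0] * v[0] + u[1] * v[1]


def mean_vector(rows):
    """Column-wise mean of a list of (x, y) pairs."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the class mean (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_fisher_discriminant(class_a, class_b):
    """Return (w, threshold) with w = Sw^-1 (mean_b - mean_a).

    Sw is the pooled within-class scatter. The threshold is the midpoint
    of the two projected class means (an equal-priors assumption).
    """
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, mean_a), scatter_matrix(class_b, mean_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]

    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]

    diff = [mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    threshold = 0.5 * (dot(w, mean_a) + dot(w, mean_b))
    return w, threshold


def predict(w, threshold, x):
    """Label a point "b" if its projection exceeds the threshold, else "a"."""
    return "b" if dot(w, x) > threshold else "a"


# Invented (petal length, petal width) pairs in cm, for illustration only.
CLASS_A = [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2), (1.6, 0.4)]
CLASS_B = [(4.5, 1.4), (4.7, 1.5), (4.2, 1.3), (4.9, 1.6)]

if __name__ == "__main__":
    w, threshold = fit_fisher_discriminant(CLASS_A, CLASS_B)
    print(predict(w, threshold, (1.5, 0.25)))
    print(predict(w, threshold, (4.6, 1.45)))
```

Fitting requires only class means and a small matrix inverse, which is why the method is so cheap to compute. With more than two measurements or more than two species, a linear algebra library would replace the hand-written 2x2 inverse.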