Purity

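In classification, purity measures the extent to which a group of records shares the same class. It is also called class purity or homogeneity; sometimes its opposite, impurity, is measured instead. Gini impurity, for example, is calculated for the two-class case as p(1 - p), where p is the proportion of records belonging to class 1. The lower the Gini impurity, the purer the group: it is 0 when every record belongs to the same class and reaches its maximum of 0.25 when the two classes are evenly mixed (p = 0.5). Many references use the general multi-class form 1 - Σ p_k², which for two classes equals 2p(1 - p); the constant factor does not change which group or split counts as purer. Measuring impurity is particularly important in decision tree algorithms, which split data so as to maximize the purity of the resulting partitions.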
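The calculation can be illustrated with a minimal Python sketch. It assumes class labels arrive as a plain list with class 1 coded as `1`; the function names `gini_two_class`, `gini_general`, and `split_impurity` are hypothetical, and weighting each partition by its share of records is the usual way a tree compares candidate splits, shown here as an illustration rather than any particular library's implementation.

```python
from collections import Counter

def gini_two_class(labels, positive=1):
    """Two-class Gini impurity p(1 - p), where p is the share of `positive` labels."""
    if not labels:
        raise ValueError("labels must be non-empty")
    p = sum(1 for y in labels if y == positive) / len(labels)
    return p * (1 - p)

def gini_general(labels):
    """Multi-class form 1 - sum(p_k^2); equals 2p(1 - p) when there are two classes."""
    if not labels:
        raise ValueError("labels must be non-empty")
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right, impurity=gini_two_class):
    """Size-weighted impurity of a two-way split (hypothetical helper).

    A decision tree would prefer the candidate split with the lowest value.
    Empty partitions contribute nothing and are skipped.
    """
    n = len(left) + len(right)
    return sum(len(part) / n * impurity(part) for part in (left, right) if part)

parent = [1, 1, 1, 0, 0, 0]
print(gini_two_class(parent))                # 0.25 (evenly mixed, maximally impure)
print(split_impurity([1, 1, 1], [0, 0, 0]))  # 0.0 (both partitions pure)
print(split_impurity([1, 1, 0], [1, 0, 0]))  # about 0.222 (still mixed)
```

Of the two candidate splits of the same six records, the first separates the classes completely and drives the weighted impurity to zero, so a purity-maximizing tree would choose it.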