Skip to content


The term “factor” has different meanings in statistics that can be confusing because they conflict.   In statistical programming languages like R, factor acts as an adjective, used synonymously with categorical – a factor variable is the same thing as a categorical variable.  These factor variables have levels, which are the same thing as categories (aContinue reading “Factor”


In classification, purity measures the extent to which a group of records share the same class.  It is also termed class purity or homogeneity, and sometimes impurity is measured instead.  The measure Gini impurity, for example, is calculated for a two-class case as p(1-p), where p = the proportion of records belonging to class 1. Continue reading “Purity”

Predictor P-Values in Predictive Modeling

Not So Useful Predictor p-values in linear models are a guide to the statistical significance of a predictor coefficient value – they measure the probability that a randomly shuffled model could have produced a coefficient as great as the fitted value.  They are of limited utility in predictive modeling applications for various reasons: Software typicallyContinue reading “Predictor P-Values in Predictive Modeling”