## Ridge Regression

Ridge regression is a method of penalizing the coefficients in a regression model, shrinking them toward zero to produce a simpler, less overfit model than ordinary least squares would yield. (Unlike the lasso, ridge shrinkage does not eliminate predictors from the model entirely.) The term “ridge” was applied by Arthur Hoerl in 1970, who saw similarities to the ridges of quadratic response functions.
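As a sketch of the idea (not from the original text), the ridge solution can be written in closed form by adding a penalty λ to the diagonal of X′X before solving the normal equations. A minimal NumPy illustration, with the toy data and λ value chosen arbitrarily:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: solve (X'X + lam*I) beta = X'y.
    lam = 0 reduces to ordinary least squares."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Invented toy data: 50 records, 3 predictors, known true coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=50)

b_ols = ridge_fit(X, y, 0.0)     # ordinary least squares
b_ridge = ridge_fit(X, y, 10.0)  # penalized fit

# Shrinkage: the ridge coefficient vector has smaller norm than OLS,
# but no coefficient is forced exactly to zero
print(np.linalg.norm(b_ridge) < np.linalg.norm(b_ols))
```

Larger λ means more shrinkage; λ is usually chosen by cross-validation rather than fixed in advance.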

## Factor

The term “factor” has different, conflicting meanings in statistics, which can be confusing. In statistical programming languages like R, “factor” acts as an adjective, used synonymously with “categorical”: a factor variable is the same thing as a categorical variable. Factor variables have levels, which are the same thing as categories.
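As a hedged illustration of the programming sense of the term (the sample values are invented, and this is plain Python rather than R), a factor is essentially a categorical variable stored as integer codes plus a set of levels:

```python
# A categorical ("factor") variable: the distinct values are its levels
colors = ["red", "green", "red", "blue", "green", "red"]

# Levels, analogous to R's levels(factor(colors)); R also sorts by default
levels = sorted(set(colors))

# Integer codes, which is how factors are stored internally
# (R uses 1-based codes; this sketch uses 0-based)
codes = [levels.index(c) for c in colors]

print(levels)  # ['blue', 'green', 'red']
print(codes)   # [2, 1, 2, 0, 1, 2]
```

The point of the representation is that the data carry both the observed values and the full set of permitted categories, even categories with zero observations.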

## Purity

In classification, purity measures the extent to which a group of records share the same class. It is also termed class purity or homogeneity; sometimes impurity is measured instead. Gini impurity, for example, is calculated for a two-class case as p(1-p), where p is the proportion of records belonging to class 1.
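A minimal sketch of the two-class formula from the text (the example group of records is invented):

```python
def gini_impurity(p):
    """Two-class Gini impurity: p*(1-p).
    Zero for a pure group (p = 0 or 1), maximal (0.25) at p = 0.5."""
    return p * (1 - p)

def purity_from_labels(labels):
    """Purity of a group: the proportion belonging to the majority class."""
    p = sum(labels) / len(labels)  # proportion of records in class 1
    return max(p, 1 - p)

labels = [1, 1, 1, 0]              # 3 of 4 records in class 1, so p = 0.75
print(gini_impurity(0.75))         # 0.1875
print(purity_from_labels(labels))  # 0.75
```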

## Predictor P-Values in Predictive Modeling

Not so useful: predictor p-values in linear models are a guide to the statistical significance of a predictor’s coefficient. They estimate the probability that a model fit to randomly shuffled data could have produced a coefficient as large in magnitude as the fitted value. They are of limited utility in predictive modeling applications for various reasons.
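The shuffling interpretation above can be made concrete as a permutation test. This is a rough sketch, not a standard library routine: the data, the slope helper, and the permutation count are all assumptions for illustration.

```python
import numpy as np

def shuffled_coef_pvalue(x, y, n_perm=2000, seed=0):
    """Permutation p-value for a simple-regression slope: the share of
    label-shuffled fits whose |slope| matches or exceeds the observed |slope|."""
    rng = np.random.default_rng(seed)
    observed = abs(np.polyfit(x, y, 1)[0])  # polyfit returns slope first
    hits = sum(
        abs(np.polyfit(x, rng.permutation(y), 1)[0]) >= observed
        for _ in range(n_perm)
    )
    return hits / n_perm

# Invented data: one predictor with a real relationship, one without
rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 0.8 * x + rng.normal(size=40)  # real linear relationship
noise = rng.normal(size=40)        # no relationship to x

pv_real = shuffled_coef_pvalue(x, y)      # should be near zero
pv_noise = shuffled_coef_pvalue(x, noise)
print(pv_real, pv_noise)
```

A small p-value here says only that the coefficient is unlikely under shuffling; it says nothing about whether the predictor improves out-of-sample predictions, which is the point of the entry above.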