Skip to content

SPLINE

In many cases, you can see that a single linear relationship is not the most appropriate way to model the data. A better approach is to fit the predictor-outcome relationship more locally, as shown in these data, where a linear relationship is clearly inappropriate (credit to Andrew Gelman):

The resulting fit is a spline. The red line is a highly local fit, the purple line a more “regional” fit.

The idea of a “spline” is not new; it comes from the art of boat building where strips of wood needed to be shaped into curves whose mathematical definition is constantly changing over the range of the curve. Pliable wood strips were held in position by “ducks” to achieve the desired curvature:

You can read more in

  • Practical Statistics for Data Science, by Peter Bruce and Andrew Bruce, O’Reilly
  • An Introduction to Statistical Learning, by James, Witten, Hastie, and Tibshirani

While the single linear structure of linear regression renders it unsuitable for some data situations, where it is appropriate it has some real advantages. Learn more about our online course Regression Analysis.