“out-of-bag,” as in “out-of-bag error”

July 3, 2018 | Word of the Week

Bagging (bootstrap aggregating, the “bag” in “out-of-bag”) is most often used with classification and regression trees: multiple trees are grown on bootstrap samples, that is, resamples of the records drawn with replacement, and the resulting predictions are averaged (or, for classification, combined by vote). This ensemble approach typically outperforms a single tree. Often, random subsets of the variables are drawn as well as random resamples of the records; this variant is termed a random forest.
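To make the resampling step concrete, here is a minimal sketch of one bootstrap cycle. It assumes a dataset of 1,000 records and six variables; the variable names, the subset size of 2, and the seed are all hypothetical choices made for illustration only.

```python
import random

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability

n_records = 1000
# Hypothetical variable names, for illustration only.
variables = ["age", "income", "tenure", "region", "usage", "plan"]

# Records: draw n_records indices WITH replacement (one bootstrap sample).
in_bag = [rng.randrange(n_records) for _ in range(n_records)]

# Records that were never drawn are "out of bag" for this sample.
out_of_bag = sorted(set(range(n_records)) - set(in_bag))

# Variables: draw a random subset WITHOUT replacement (size 2 is an assumption).
chosen_variables = rng.sample(variables, 2)

oob_fraction = len(out_of_bag) / n_records
print(f"distinct in-bag records: {len(set(in_bag))}")
print(f"out-of-bag fraction: {oob_fraction:.3f}")  # tends toward 1/e (about 0.368) for large samples
print(f"variables used for this tree: {chosen_variables}")
```

Because sampling is with replacement, some records appear more than once and roughly a third are left out of any given sample; those left-out records are what "out-of-bag" refers to.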

The out-of-bag error, or OOB error, is the prediction error on the records left out of each bootstrap sample (the out-of-bag records), which the tree grown on that sample has not seen. It can be used to tune parameters of the model. For example, tree depth is of primary interest with classification and regression trees: how far should the tree be grown? Too shallow, and you lose predictive power. Too deep, and you overfit the data, which increases the error in predicting new data. The OOB error for trees of different depths can be calculated in each bootstrap cycle, and the minimum-error depth noted. An introduction to these trees is part of our Predictive Analytics courses.
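The depth-tuning idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a reference implementation: the data are synthetic (a noisy sine curve), the tree is a deliberately tiny one-variable regression tree written for this example, and the OOB error is taken as the mean squared error on the records left out of each bootstrap sample, averaged over cycles. Libraries may compute OOB error differently, for example by aggregating predictions per record across trees.

```python
import math
import random

def fit_tree(xs, ys, depth):
    """Fit a tiny 1-D regression tree: a leaf is a mean, a node is (threshold, left, right)."""
    if depth == 0 or len(set(xs)) < 2:
        return sum(ys) / len(ys)
    best = None
    for t in sorted(set(xs))[1:]:  # candidate splits: x < t versus x >= t
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t)
    t = best[1]
    lo = [(x, y) for x, y in zip(xs, ys) if x < t]
    hi = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return (t,
            fit_tree([p[0] for p in lo], [p[1] for p in lo], depth - 1),
            fit_tree([p[0] for p in hi], [p[1] for p in hi], depth - 1))

def predict(tree, x):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x < t else right
    return tree

def mean_oob_error_by_depth(xs, ys, depths, n_boot, rng):
    """Average, over bootstrap cycles, the MSE on out-of-bag records for each depth."""
    n = len(xs)
    totals = {d: 0.0 for d in depths}
    cycles = 0
    for _ in range(n_boot):
        bag = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        oob = sorted(set(range(n)) - set(bag))       # records not drawn
        if not oob:
            continue  # vanishingly unlikely, but avoids dividing by zero
        cycles += 1
        bx, by = [xs[i] for i in bag], [ys[i] for i in bag]
        for d in depths:
            tree = fit_tree(bx, by, d)
            totals[d] += sum((ys[i] - predict(tree, xs[i])) ** 2 for i in oob) / len(oob)
    return {d: totals[d] / cycles for d in depths}

rng = random.Random(42)  # arbitrary seed for repeatability
xs = [rng.uniform(0, 6) for _ in range(60)]          # synthetic predictor
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]   # synthetic noisy response
depths = [1, 2, 3, 4, 6]

oob_error = mean_oob_error_by_depth(xs, ys, depths, n_boot=20, rng=rng)
best_depth = min(oob_error, key=oob_error.get)
for d in depths:
    print(f"depth {d}: mean OOB MSE = {oob_error[d]:.4f}")
print("minimum-error depth:", best_depth)
```

Since the out-of-bag records play the role of held-out data, no separate validation set is needed to compare depths in this sketch.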