Intervals (confidence, prediction and tolerance) - Statistics.com: Data Science, Analytics & Statistics Courses

All students of statistics encounter confidence intervals. A confidence interval tells you, roughly, the range within which you can be, say, 95% confident that the true value of the quantity being estimated lies. This is not the precise technical definition, but it is how people use the intervals. Confidence intervals apply to some statistic – say, the mean – calculated from a sample. You would use a confidence interval to communicate the degree of uncertainty about some numerical estimate based on a sample.
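As a rough illustration, a confidence interval for a mean can be sketched in a few lines of Python. This is a minimal sketch, not a definitive method: it assumes a reasonably large sample so that a normal quantile can stand in for the t quantile usually used with small samples. The function name `mean_confidence_interval` and the simulated data are made up for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the population mean.

    Assumption: large-sample normal approximation (z quantile, not t).
    """
    n = len(sample)
    m = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * s / sqrt(n)  # standard error shrinks as n grows
    return m - half_width, m + half_width


# Simulated data, purely for illustration (true mean 10, true sd 2).
rng = random.Random(0)
for n in (50, 500, 5000):
    sample = [rng.gauss(10, 2) for _ in range(n)]
    low, high = mean_confidence_interval(sample)
    print(f"n={n}: 95% CI for the mean = ({low:.2f}, {high:.2f})")
```

The interval's half-width is proportional to 1/sqrt(n), so the printed intervals tighten as the sample grows.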

A prediction interval, by contrast, is about an individual data point, not a sample statistic. It expresses the degree of uncertainty around a specific prediction from a model, say a linear regression. It is stated in the form “on average we can expect, say, 95% of our predicted values to fall in this interval.” A prediction interval will, naturally, be wider than a confidence interval: it has to cover the scatter of individual values as well as the uncertainty in the estimate, and that scatter does not go away with bigger samples (whereas a confidence interval gets narrower and narrower as you take bigger samples).
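The contrast with a confidence interval can be sketched for the simplest possible "model," one that predicts every new value with the sample mean. This is an illustrative simplification rather than the regression case mentioned above; it assumes roughly normal data and again uses a normal quantile in place of the t quantile. The name `prediction_interval` and the simulated data are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def prediction_interval(sample, level=0.95):
    """Approximate prediction interval for one new observation.

    Assumptions: roughly normal data, mean-only model, z quantile (not t).
    """
    n = len(sample)
    m = mean(sample)
    s = stdev(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # The "1" covers the scatter of a single new value;
    # the "1 / n" covers uncertainty in the estimated mean.
    half_width = z * s * sqrt(1 + 1 / n)
    return m - half_width, m + half_width


# Simulated data, purely for illustration (true mean 10, true sd 2).
rng = random.Random(0)
for n in (50, 500, 5000):
    sample = [rng.gauss(10, 2) for _ in range(n)]
    low, high = prediction_interval(sample)
    print(f"n={n}: 95% prediction interval = ({low:.2f}, {high:.2f})")
```

Because of the constant term under the square root, the half-width settles near z times the standard deviation instead of shrinking toward zero as n grows.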

A tolerance interval, like a prediction interval, is also about a single data point. It differs from a prediction interval in that we add a second quantification of uncertainty. In a prediction interval, the statement “on average we can expect, say, 95% of our predicted values to fall in this interval” implies that roughly half the time more than 95% of the predictions will fall in the interval, and roughly half the time fewer than 95% of the predictions will fall in the interval. A tolerance interval quantifies that first part of the statement – i.e., it says, for example, “90% of the time 95% of the predictions will fall in the interval.” A tolerance interval in which that first uncertainty value is set to 50% is roughly equivalent to a prediction interval.
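One common way to compute a two-sided tolerance interval for normal data is the form mean ± k·s, with the factor k taken from Howe's approximation. The sketch below follows that approach under stated assumptions: the data are roughly normal, and the chi-square quantile is itself approximated with the Wilson-Hilferty formula so that only the standard library is needed. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square p-quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2 / (9 * df)
    return df * (1 - c + z * sqrt(c)) ** 3


def tolerance_factor(n, content=0.95, confidence=0.90):
    """Howe's approximate two-sided tolerance factor k for normal data.

    content:    share of the population the interval should capture
    confidence: how often (over repeated samples) it should do so
    """
    df = n - 1
    z = NormalDist().inv_cdf(0.5 + content / 2)
    chi2_low = chi2_quantile_wh(1 - confidence, df)  # lower-tail quantile
    return z * sqrt(df * (1 + 1 / n) / chi2_low)


def tolerance_interval(sample, content=0.95, confidence=0.90):
    """Interval mean ± k * s using the approximate factor above."""
    k = tolerance_factor(len(sample), content, confidence)
    m, s = mean(sample), stdev(sample)
    return m - k * s, m + k * s


# Compare the multiplier at 90% and 50% confidence for a sample of 30.
k90 = tolerance_factor(30, content=0.95, confidence=0.90)
k50 = tolerance_factor(30, content=0.95, confidence=0.50)
print(f"k at 90% confidence: {k90:.3f}")
print(f"k at 50% confidence: {k50:.3f}")
```

Raising the confidence level raises k and so widens the interval. At 50% confidence the factor drops to a value close to the multiplier used for a 95% prediction interval, which is the near-equivalence described above.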

A tolerance interval is not to be confused with manufacturing “tolerances,” which are statements about intervals within which we hope, expect or require some measurement to fall. They may have some relation to prior data (i.e. the organization would not want them to be totally unrelated to reality), but they are not calculated from ongoing process data.

Tom Ryan’s comprehensive text Modern Engineering Statistics covers these intervals in some detail. Tom developed and taught a number of courses with us at the Institute; he passed away in December 2016.