An often-overlooked basic part of learning new things is vocabulary: if you don’t fully understand the meaning of terms, you are handicapped. Worse, if you think you do understand, but that understanding is wrong, you are deprived of the ability to identify the gap in your understanding. This can happen in data science, where different communities (statisticians, IT engineers, computer scientists) may have different meanings for the same word. A couple of weeks ago, we looked at multiple meanings of the term bias. This week we look at two more: inference and confidence.
In machine learning, inference refers to the process of operationalizing a trained model by applying it to new data and making predictions. This process is also called scoring, and the output (the prediction) is the score. Inference is the final phase of modeling and does not incorporate the earlier processes of training different models, assessing their performance, selecting the best model and tuning its parameters.
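To make the machine learning sense concrete, here is a minimal sketch of scoring in Python. The logistic model, its coefficients, and the field names (`visits`, `basket_size`) are all hypothetical; they stand in for whatever an earlier training phase produced. The point is that inference only applies a fixed model to new records.

```python
import math

# Hypothetical coefficients, standing in for the result of an earlier
# training phase. Nothing in this script fits or tunes them.
INTERCEPT = -4.0
COEFFICIENTS = {"visits": 0.5, "basket_size": 0.25}


def score(record):
    """Apply the already-trained model to one new record.

    Returns the estimated probability that the record belongs to class A.
    """
    z = INTERCEPT + sum(COEFFICIENTS[name] * record[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


# New, previously unseen records (hypothetical values).
new_records = [
    {"visits": 2, "basket_size": 4},
    {"visits": 10, "basket_size": 8},
]

scores = [score(r) for r in new_records]
predictions = ["A" if s >= 0.5 else "not A" for s in scores]

for record, s, label in zip(new_records, scores, predictions):
    print(record, round(s, 3), label)
```

The 0.5 cutoff used to turn a score into a class label is also an assumption; in practice the cutoff is a choice made for the application.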
In statistics, inference is something different and more complex: it is the process of (1) estimating quantities of interest in a population by using a sample from that population, then (2) quantifying the uncertainty around these population estimates, specifically uncertainty caused by random variation. The variation can occur in sampling, or in assignment of subjects to a treatment, or both. Inferential statistics is a full branch of statistics, incorporating an initial phase of making estimates on the basis of samples, and a second phase of quantifying possible uncertainty from random variation, using confidence intervals and p-values. Using resampling (bootstrapping and permutation), quantifying uncertainty is fairly straightforward. Classical (pre-computer) statistics has built a complex and (to those new to statistics) intimidating structure, much of which remains in place due to the force of inertia.
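As an illustration of how resampling quantifies uncertainty from random assignment, the following sketch runs a simple permutation test in Python. The outcome values, the group sizes, and the function name `permutation_p_value` are hypothetical, and the two-sided difference in means is just one reasonable choice of test statistic.

```python
import random
from statistics import mean

# Hypothetical outcomes for subjects randomly assigned to two groups.
treatment = [27, 31, 29, 35, 33, 30]
control = [24, 28, 25, 27, 26, 29]


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Group labels are shuffled repeatedly to see how often chance alone
    produces a difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


observed_diff, p_value = permutation_p_value(treatment, control)
print("observed difference in means:", round(observed_diff, 2))
print("estimated p-value:", p_value)
```

A small p-value says that random assignment alone would rarely produce a difference this large.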
In machine learning, the term confidence often refers to the estimated probability of an event or item of interest. For example, the output of some predictive learning algorithms might say “the confidence [i.e. estimated probability] is 80% that this record belongs to class A.” But confidence is also often used to refer to a conditional prevalence. For example, in association rules for transactions (used in affinity analysis and recommender systems), confidence for a rule like “if A is purchased, so is B” quantifies the proportion of transactions with A that also include B. One of several metrics that measure the power of transaction rules, it is based on actual counts of items purchased, though you will see the terminology of probability (P(B|A), the probability of B given A) in software output.
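The association-rule sense of confidence is just a ratio of counts, as the short Python sketch below shows. The transactions and the helper name `rule_confidence` are hypothetical, made up for illustration.

```python
# Hypothetical transactions; each set holds the items in one basket.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]


def rule_confidence(transactions, antecedent, consequent):
    """Confidence of the rule 'if antecedent is purchased, so is consequent'.

    Computed as (transactions containing both) / (transactions containing
    the antecedent), i.e. a conditional prevalence based on counts.
    """
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        raise ValueError("antecedent never occurs in the transactions")
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)


confidence_a_to_b = rule_confidence(transactions, {"A"}, {"B"})
print("confidence(A -> B):", confidence_a_to_b)  # 3 of the 4 baskets with A also hold B: 0.75
```

Here A appears in four baskets and three of those also contain B, so the confidence of “if A, then B” is 3/4.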
In statistics, the term confidence is used primarily in relation to the concept of a confidence interval, which is a range that encloses a measurement or estimate. It reflects the uncertainty in the estimate due to sampling error. For example, after a random survey of Twitter users, you might say that their average age, which was 34 in the survey, lies between 31 and 37, with 90% confidence. Technically, this means that 90% of the samples drawn from a population that is well represented by the original sample and has a mean of 34 will have a sample mean that lies between 31 and 37. This convoluted definition is of limited practical value, so most people interpret the result as “the probability is 90% that the average age of Twitter users is between 31 and 37.” To paraphrase George Box (“all models are wrong, but some are useful”), this interpretation is not strictly correct but it is useful.
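One way to see where such an interval comes from is a bootstrap percentile interval, sketched below in Python. The ages are hypothetical (chosen so the sample mean is 34), the function name `bootstrap_interval` is an assumption, and the percentile method is only one of several ways to build a confidence interval.

```python
import random
from statistics import mean

# Hypothetical survey ages; their mean is 34.
ages = [19, 22, 24, 25, 27, 28, 29, 30, 31, 33,
        34, 35, 36, 38, 40, 41, 44, 45, 47, 52]


def bootstrap_interval(sample, level=0.90, n_resamples=5_000, seed=0):
    """Percentile bootstrap interval for the mean.

    Repeatedly resamples the data with replacement, records each
    resample's mean, and keeps the central `level` share of those means.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = sorted(
        mean(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lower = means[round(tail * n_resamples)]
    upper = means[round((1 - tail) * n_resamples) - 1]
    return lower, upper


lower, upper = bootstrap_interval(ages)
print("sample mean:", mean(ages))
print("90% bootstrap interval:", (lower, upper))
```

The resampling step treats the original sample as a stand-in for the population, which mirrors the “population that is well represented by the original sample” in the technical definition.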