The most striking one is “sample”: to statisticians it means a set of data drawn from a larger population.

To computer scientists and machine learners it often means a **single** observation or record in a dataset. Other synonyms for a single record include *example*, *instance*, and *pattern* (machine learning), *case* (statistics), and *row* (database technology).

*Outliers* are also called *anomalies*.

An *outcome variable* is also called a *dependent variable*, *response variable*, or a *target variable*.

In predictive modeling, a subset of the data is typically withheld from the model-fitting process, and the fitted model is then tested on the withheld data. Those withheld data are called the *holdout data*, the *validation data*, or the *test data*. In some applications two subsets are withheld: one for comparing and tuning models, and one used only at the very end, to assess bias in the ultimately chosen model (not to do further tuning). In the original SAS terminology, this final set is the *test* data.
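A minimal sketch of such a three-way split, using illustrative fractions (the function name and proportions are assumptions, not a standard API):

```python
import random

def train_validation_test_split(records, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle the records and split them into three disjoint subsets.

    The model is fit on the training data, tuned against the
    validation (holdout) data, and assessed once at the very end
    on the test data.
    """
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = rows[:n_test]
    validation = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, validation, test

train, validation, test = train_validation_test_split(range(100))
print(len(train), len(validation), len(test))  # 60 20 20
```

In practice a library routine (for example scikit-learn's `train_test_split`, applied twice) would typically be used instead of hand-rolled slicing.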

In predictive modeling a classification model predicts what category a record belongs to; in biostatistics a diagnostic test performs a similar role. In biostatistics, *sensitivity* is the proportion of 1’s (cases of interest) correctly identified by the test. The same metric is called *recall* in predictive modeling.
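The shared definition can be made concrete in a few lines; the data below are made-up labels for illustration:

```python
def sensitivity(actual, predicted):
    """Sensitivity (recall): the proportion of actual 1's that the
    test/model correctly identifies as 1."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    actual_pos = sum(1 for a in actual if a == 1)
    return true_pos / actual_pos

actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
print(sensitivity(actual, predicted))  # 0.75
```

Three of the four actual 1’s are caught, so sensitivity (recall) is 3/4 = 0.75, regardless of which field’s name you use for it.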

The term *decision trees*, by contrast, is used in two very different ways. In predictive modeling, decision trees learned from data establish rules over the predictor variables that can be used to predict unknown outcome variables. In decision analysis, a decision-maker constructs a branching decision tree to lay out the different possible outcomes of events, together with their probabilities and costs or benefits, in order to identify the decision path with the maximum expected value.
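The decision-analysis usage reduces, at a single branch, to an expected-value calculation. A minimal sketch with hypothetical payoffs and probabilities (the scenario and numbers are invented for illustration):

```python
def expected_value(outcomes):
    """Expected value of one decision path: the probability-weighted
    sum of payoffs over its possible outcomes."""
    return sum(prob * payoff for prob, payoff in outcomes)

# Hypothetical choice: launch a product or not (illustrative numbers).
launch = [(0.6, 100_000), (0.4, -50_000)]   # 60% chance of gain, 40% chance of loss
no_launch = [(1.0, 0)]

decisions = {"launch": expected_value(launch), "no launch": expected_value(no_launch)}
best = max(decisions, key=decisions.get)
print(best, decisions[best])  # launch 40000.0
```

Here 0.6 × 100,000 + 0.4 × (−50,000) = 40,000, so the “launch” path has the higher expected value; a predictive-modeling decision tree, by contrast, would be learned from data rather than written down by the decision-maker.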