**Homonyms** (words with multiple meanings):

**Bias:** To a lay person, bias refers to an opinion about something that is formed in advance of the specific facts. As consideration of ethical issues in data science grows, this meaning has crept into discussion of the fairness or social worth of machine learning algorithms. But the term has a narrower definition in statistics: it refers to the tendency of an estimation procedure, or a model, to arrive at estimates or predictions that are, on average, off target.
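
The statistical sense can be made concrete with a small sketch. The population `[1, 2, 3]` and the sample size of 2 are made up for illustration, and the helper name `variance` is hypothetical. The code enumerates every possible sample (drawn with replacement) and averages two variance estimators over them: the one that divides by n lands below the true variance on average, which is what "biased" means here, while the one that divides by n - 1 is on target.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population and sample size, chosen only for illustration.
population = [1, 2, 3]
n = 2

def variance(values, ddof):
    """Variance with divisor len(values) - ddof, computed in exact fractions."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / (len(values) - ddof)

# The quantity being estimated: the variance of the whole population.
true_var = variance(population, ddof=0)

# Every equally likely sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Average each estimator over all possible samples (its expected value).
mean_biased = sum(variance(s, ddof=0) for s in samples) / len(samples)
mean_unbiased = sum(variance(s, ddof=1) for s in samples) / len(samples)

print(true_var, mean_biased, mean_unbiased)
```

Exact fractions are used instead of floats so that the comparison with the true variance is not blurred by rounding.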

**Confidence:** To a statistician, confidence expresses how reliable an estimate based on a sample is (we are 95% confident that the average blood sugar in the group lies between X and Y, based on a sample of N patients). To a machine learner, confidence can refer to a metric used in association rules (“what goes with what in market basket transactions”), one of several measures of the strength of a rule.
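
The association-rule sense is usually computed as the share of transactions containing the rule's antecedent that also contain its consequent. A minimal sketch, assuming a made-up list of market baskets and a hypothetical helper named `rule_confidence`:

```python
# Hypothetical market-basket data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def rule_confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Among transactions that contain every antecedent item, return the
    fraction that also contain every consequent item.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        raise ValueError("antecedent never occurs; confidence is undefined")
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)

bread_to_milk = rule_confidence(transactions, {"bread"}, {"milk"})
butter_to_bread = rule_confidence(transactions, {"butter"}, {"bread"})
print(bread_to_milk, butter_to_bread)
```

Note that this number describes the rule within the observed transactions; it says nothing about sampling uncertainty, which is what the statistician's confidence addresses.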

**Decision Trees:** To statisticians and machine learners, “decision trees,” also called “classification and regression trees” (CART), is a term for a class of algorithms that progressively partition data into chunks that are more and more homogeneous with respect to the outcome variable. The result is a branching set of rules applied to predictor variables to predict the outcome. To an operations research specialist, “decision trees” are a representation of progressive decisions and possible outcomes, with probabilities, plus costs/benefits, attached to the outcomes. The path ending in the highest expected value then guides decisions.
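
The operations research sense can be sketched in a few lines. The decisions, probabilities, and payoffs below are invented for illustration, and `expected_value` is a hypothetical helper; the sketch covers a single decision node only, not a multi-stage tree.

```python
# Each decision leads to a list of (probability, payoff) outcomes.
# All numbers are hypothetical.
decisions = {
    "launch now": [(0.5, 200.0), (0.5, -100.0)],
    "wait a year": [(1.0, 20.0)],
}

def expected_value(outcomes):
    """Probability-weighted average payoff of one decision's outcomes."""
    return sum(prob * payoff for prob, payoff in outcomes)

# Score every decision, then pick the branch with the highest expected value.
scores = {name: expected_value(outcomes) for name, outcomes in decisions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Nothing here is learned from data: the probabilities and payoffs are inputs supplied by the analyst, which is the key contrast with the CART sense of the term.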

**Graph:** To a lay person, a graph usually means a visual representation of data, which statisticians more often refer to as plots and charts. To a computer scientist, a graph is a data structure made up of entities and the links between them. Speaking of graphs, Wikipedia has an interesting Venn-style diagram of homonyms, synonyms, homographs and their cousins (right).
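
As a rough illustration of the computer science sense, a graph can be stored as an adjacency list: a mapping from each entity to the set of entities it is linked to. The names and links below are made up, and this is one common representation among several.

```python
from collections import defaultdict

# Hypothetical undirected links between entities.
edges = [("Ann", "Bob"), ("Bob", "Cara"), ("Ann", "Cara"), ("Cara", "Dev")]

# Adjacency list: entity -> set of directly linked entities.
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)  # undirected, so record the link in both directions

neighbors_of_cara = sorted(graph["Cara"])
print(neighbors_of_cara)
```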

**Normalize:** In statistics and machine learning, to normalize a variable is to rescale it, so that it is on the same scale as other variables to be used in a model. A common approach is to subtract the mean, so that the variable is centered around 0, and then divide by the standard deviation, so that it has a consistent scale with other variables normalized the same way. In database management, normalization refers to the process of organizing relational databases and their tables so that the data are not redundant and relations among tables are consistent.
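
The statistical sense of rescaling by the mean and standard deviation can be sketched as follows. The values are made up, and the choice of the population standard deviation (`statistics.pstdev`) rather than the sample version (`statistics.stdev`) is an assumption made here for simplicity; either is used in practice.

```python
import statistics

# Hypothetical raw values for one variable.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation (assumed choice)

# Center on 0 by subtracting the mean, then rescale by the standard deviation.
normalized = [(v - mean) / std for v in values]
print(normalized)
```

After this step the variable has mean 0 and standard deviation 1, so it can be compared with other variables rescaled the same way.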

**Sample:** In statistics, a sample is a collection of observations or records. In computer science and machine learning, sample often refers to a single record.