#### Week #40 – Natural Language

A natural language is what most people outside the field of computer science think of as just a language (Spanish, English, etc.). The term...

#### Week #39 – White Hat Bias

White Hat Bias is bias leading to distortion in, or selective presentation of, data that is considered by investigators or reviewers to be acceptable because it is in the service of righteous goals.

#### Week #38 – Edge

An edge is a link between two people or entities in a network that can be

#### Week #37 – Stratified Sampling

Stratified sampling is a method of random sampling.
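
As a rough illustration, the sketch below splits the records into strata and draws a simple random sample within each one. The helper name `stratified_sample`, the `stratum_of` callback, and the choice of proportional allocation (the same fraction from every stratum) are assumptions made for this example, not part of the definition above.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw the same fraction of records, at random, from every stratum."""
    rng = random.Random(seed)
    # Group the records by stratum label.
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    sample = []
    for members in strata.values():
        # Proportional allocation (an assumption); keep at least one per stratum.
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 80 records in stratum "a", 20 in stratum "b".
population = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
chosen = stratified_sample(population, lambda rec: rec[0], fraction=0.1)
```

Each stratum contributes in proportion to its size, so a small stratum cannot be missed by chance.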

#### Week #36 – Conditional Probability

Probabilities quoted without specifying the sample space can be ambiguous when the sample space is not self-evident.

#### Week #35 – Continuous vs. Discrete Distributions

A discrete distribution is one in which the data can only take on certain values, for example integers.  A continuous distribution is one in which data can take on any value within a specified range (which may be infinite).

#### Week #34 – Central Limit Theorem

The central limit theorem states that the sampling distribution of the mean approaches Normality as the sample size increases, regardless of the shape of the population distribution from which the sample is drawn (provided that distribution has finite variance).
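
A small simulation can make this concrete. The sketch below assumes an Exponential(1) population, which is strongly right-skewed, and tracks the skewness of the sample means as the sample size grows. The function names, the number of repetitions, and the seed are illustrative choices.

```python
import random
import statistics

def sample_means(sample_size, reps=2000, seed=0):
    """Draw `reps` samples of `sample_size` from Exponential(1); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(reps)
    ]

def skewness(values):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

# Spread and skewness of the sample means for increasing sample sizes.
for n in (1, 5, 30, 200):
    means = sample_means(n)
    print(n, round(statistics.stdev(means), 3), round(skewness(means), 3))
```

The skewness should shrink toward zero and the spread should narrow as `n` grows, consistent with the means becoming approximately Normal.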

#### Week #33 – Classification and Regression Trees (CART)

Classification and regression trees (CART) are a set of techniques for classification and prediction.

#### Week #32 – CHAID

CHAID stands for Chi-squared Automatic Interaction Detector. It is a method for building classification trees and regression trees from a training sample comprising already-classified objects.

#### Week #31 – Census

In a census survey, all units from the population of interest are analyzed. A related concept is the sample survey, in which only a subset of the population is taken.

#### Week #30 – Discriminant analysis

Discriminant analysis is a method of distinguishing between classes of objects.  The objects are typically represented as rows in a matrix.

#### Week #29 – Training data

Also called the training sample, training set, or calibration sample. The context is predictive modeling (also called supervised data mining), where you have data with multiple predictor variables and a single known outcome or target variable.

#### Week #28 – Bias

A general statistical term meaning a systematic (not random) deviation of an estimate from the true value.

#### Week #27 – Backward Elimination

One of several computer-based iterative procedures for selecting variables to use in a model.  The process begins...

#### Week #26 – Statistical Significance

Outcomes of an experiment or repeated events are statistically significant if they differ from what chance variation might produce.

#### Week #25 – Family-wise Type I Error

In multiple comparison procedures, family-wise type I error is the probability that, even if all samples come from the same population, you will wrongly conclude

#### Week #24 – Cohort study

A cohort study is a longitudinal study that identifies a group of subjects sharing some attributes (a "cohort") then

#### Week #23 – Coefficient of variation

The coefficient of variation is the standard deviation of a data set, divided by the mean of the same data set.
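
A minimal sketch of the calculation follows. It assumes the sample standard deviation (`statistics.stdev`); the population version (`statistics.pstdev`) would be an equally reasonable reading of the definition. The function name is illustrative.

```python
import statistics

def coefficient_of_variation(data):
    """Standard deviation divided by the mean of the same data set."""
    mean = statistics.fmean(data)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.stdev(data) / mean

heights_cm = [150.0, 160.0, 170.0, 180.0]
heights_mm = [10 * h for h in heights_cm]
cv_cm = coefficient_of_variation(heights_cm)
cv_mm = coefficient_of_variation(heights_mm)
```

Because the units cancel, `cv_cm` and `cv_mm` agree, which is why the coefficient of variation is useful for comparing variability across differently scaled measurements.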

#### Week #22 – Coefficient of Determination

In regression analysis, the coefficient of determination is a measure of goodness-of-fit (i.e. how well or tightly the data fit the estimated model).  The coefficient is

#### Week #21 – Consistent Estimator

An estimator is a measure or metric intended to be calculated from a sample drawn from a larger population...

#### Week #20 – Collinearity

In regression analysis, collinearity of two variables means that strong correlation exists between them, making it difficult or impossible to estimate their individual regression coefficients reliably.

#### Week #19 – Cohort study

A cohort study is a longitudinal study that identifies a population or large group (a "cohort") then draws a sample from the population at various points in time and records data for the sample.

#### Week #18 – Centroid

The centroid is a measure of center in multi-dimensional space.
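
One common way to compute it is as the coordinate-wise mean of the points. The sketch below assumes the points are given as equal-length tuples; the function name is illustrative.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    # zip(*points) groups the first coordinates together, then the second, and so on.
    return tuple(sum(coords) / n for coords in zip(*points))

# Corners of a square with side length 2.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
center = centroid(square)
```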

#### Week #17 – Bootstrapping

Bootstrapping is sampling with replacement from observed data to estimate the variability in a statistic of interest. See also permutation tests, a related form of resampling. A common application

#### Week #16 – Binomial Distribution

A Binomial distribution is used to describe an experiment, event, or process for which the probability of success is the same for each trial and each trial has only two possible outcomes.
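
The probability of exactly `k` successes in `n` independent trials can be sketched as follows. The function name and the coin-flip example are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 4 fair coin flips.
two_heads = binomial_pmf(2, 4, 0.5)
```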

#### Week #15 – Uplift or Persuasion Modeling

A combination of treatment comparisons (e.g. send a sales solicitation, or send nothing) and predictive modeling to determine which cases or subjects respond (e.g. purchase or not) to which treatments.

#### Week #14 – Network Analytics

Network analytics is the science of describing and, especially, visualizing the connections among objects.

#### Week #13 – Multiplicity issues

Multiplicity issues arise in a number of contexts, but they generally boil down to the same thing:  repeated looks at a data set in different ways, until something "statistically significant" emerges.

#### Week #12 – Support vector machines

Support vector machines are used in data mining (predictive modeling, to be specific) for classification of records, by learning from training data.

#### Week #11 – Attribute

In data analysis or data mining, an attribute is a characteristic or feature that is measured for each observation (record) and can vary from one observation to another.  It might

#### Week #10 – Negative Binomial

The negative binomial distribution is the probability distribution of the number of Bernoulli (yes/no) trials required to obtain r successes.
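
Under this "number of trials" definition, the probability that the r-th success arrives on trial `n` can be sketched as below. The function name is illustrative; note that some software instead counts failures before the r-th success.

```python
import math

def negative_binomial_pmf(n, r, p):
    """P(the r-th success occurs on trial n), with success probability p per trial."""
    if n < r:
        return 0.0  # r successes need at least r trials
    # The first n - 1 trials hold exactly r - 1 successes; trial n is a success.
    return math.comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

# With r = 1 this reduces to the geometric distribution.
first_success_on_third = negative_binomial_pmf(3, 1, 0.5)
```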

#### Week #9 – Random Walk

A random walk is a process of random steps, motions, or transitions.  It might be in one dimension (movement along a line), in two dimensions (movements in a plane), or in three dimensions or more.
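
The one-dimensional case is easy to simulate. The sketch below assumes a simple symmetric walk (a step of +1 or -1 with equal probability), which is only one of many possible step rules; the function name and seed are illustrative.

```python
import random

def random_walk_1d(n_steps, seed=0):
    """Simulate a simple symmetric random walk on the integers, starting at 0."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(n_steps):
        position += rng.choice((-1, 1))  # step left or right with equal probability
        path.append(position)
    return path

path = random_walk_1d(100)
```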

#### Week #8 – Cover Time

Cover time is the expected number of steps in a random walk required

#### Week #7 – Cross-Validation

Cross-validation is a general computer-intensive approach used in estimating the accuracy of statistical models.
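
One widely used variant is k-fold cross-validation. The sketch below assumes a deliberately trivial model (predict the mean of the training values) so that the fold logic stays visible. The function name, the mean-only model, and the squared-error metric are assumptions made for illustration.

```python
import random
import statistics

def k_fold_mse(y, k=5, seed=0):
    """Estimate the out-of-sample mean squared error of a mean-only model."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k non-overlapping folds.
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [y[i] for i in indices if i not in held]
        prediction = statistics.fmean(train)  # "fit" the model on the other folds
        fold_errors.append(
            statistics.fmean((y[i] - prediction) ** 2 for i in held_out)
        )
    return statistics.fmean(fold_errors)

estimated_error = k_fold_mse([float(v) for v in range(20)])
```

Each observation is held out exactly once, so every error is measured on data the model did not see when it was fitted.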

#### Week #6 – Distance Matrix

A distance matrix (also called a dissimilarity matrix) describes pairwise distinction between M objects.
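
For M points, the result is an M x M table. The sketch below assumes Euclidean distance, though any dissimilarity measure could be substituted; the function name is illustrative.

```python
import math

def distance_matrix(points):
    """Return the M x M matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

points = [(0, 0), (3, 4), (6, 8)]
d = distance_matrix(points)
```

The matrix is symmetric with zeros on the diagonal, since each object is at distance 0 from itself.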

#### Week #5 – Differencing of a Time Series

Differencing of a time series in discrete time is the transformation of the series to a new time series where the values are the differences between consecutive values of the original series.
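
A minimal sketch of first-order differencing, with an illustrative function name and made-up series:

```python
def difference(series):
    """Return the differences between consecutive values of `series`."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

linear_trend = [3, 5, 7, 9]
differenced = difference(linear_trend)
```

The differenced series is one value shorter than the original, and a linear trend becomes a constant.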

#### Week #4 – Dichotomous

Dichotomous (outcome or variable) means "having only two possible values", e.g.

#### Week #2 – Density Function

A probability density function is a curve used

#### Week #1 – Data Partitioning

In predictive modeling, data partitioning is the division of the data available for analysis into two or three non-overlapping

#### 2013 – The International Year of Statistics

Promoting better understanding of statistics throughout the world.

#### Congratulations to Michelle Everson!

New Editor of Journal of Statistics Education

#### Airline passenger screening can be random

Read Peter's Letter to the Editor in Saturday's Washington Post.

#### Churn Trigger

Last year's popular story out of the Predictive Analytics World conference series was Andrew Pole's presentation of Target's methodology for predicting which customers were pregnant.

#### Randomized Trials on online learning

Evidence shows that there is no significant difference between taking an online introductory statistics course and a traditional in-person class.

#### Congratulations to Thomas Lumley!

Newly elected American Statistical Association (ASA) Fellow, and recognized for his outstanding professional contributions to and leadership in the field of statistical science.

#### Immigration

Arizona's immigration law goes before the Supreme Court this week...

#### Revisiting Catastrophe Modeling Assistant

I saw this job posting a while ago, and, in my next life,

#### Congratulations to David Unwin – Honors of the Association of American Geographers

David Unwin, Emeritus Chair in Geography, Birkbeck College, University of London (and instructor at Statistics.com!) will be awarded the Association of American Geographers (AAG) Ronald F. Abler Distinguished Service Honors at the upcoming annual meeting next week.

#### Julian Simon birthday

February 12 was the 80th anniversary of the birth of Julian Simon, an early pioneer in resampling methods.

#### Statistics for Future Presidents

Steve Pierson, Director of Science Policy at ASA, wrote an interesting blog post wondering how statistics for future presidents (or policymakers more generally) would compare with the recommended statistical skills and concepts for others. Take a look and let him know!

#### Congratulations to David Unwin on a New Edited Volume

Teaching Geographic Information Science and Technology in Higher Education, 2012 (Wiley)

#### The Data Scientist

The stories of the prospective Facebook IPO and prior IPOs from LinkedIn, Pandora, and Groupon all involve "data scientists". Read an interview with Monica Rogati, Senior Data Scientist at LinkedIn, to see the connection.

#### Congratulations to Michelle Everson for winning the 2011 Waller Education Award.

Dr. Michelle Everson is recognized for her outstanding contributions to and innovation in the teaching of elementary statistics.

#### Popular Mistakes in Data Mining

John Elder's presentations on common data mining mistakes are a must-see if you have any experience or plans in the data mining arena.

#### Coffee causes cancer?

"Any claim coming from an observational study is most likely to be wrong." Thus begins "Deming, data and observational studies," just published in "Significance Magazine" (Sept. 2011).

#### The sacrifice bunt

I was watching a Washington Nationals game on TV a couple of days ago, and the concept of "expected value" ...

#### Epidemiologist joke

A neurosurgeon, pathologist and epidemiologist are each told to examine a can of sardines on a table in a closed room, and present a report.

#### What do teenagers want?

What do teenagers want? More importantly for the music industry, what music will they buy?

#### The Power of Round

Advertisers shy away from round numbers, believing that \$99 appears significantly cheaper than \$100...

#### Bees on the attack

What does Matt Asher's article "Attack of the Hair Trigger Bees" have to do with global warming? Matt Asher runs statisticsblog.com ...

#### The First Gallup Poll

The first Gallup Poll was published in October, 1935. In America Speaks,

#### Catastrophe Modeling Assistant

Thinking about careers that use statistics? The job title "catastrophe modeling assistant" caught my eye recently in a job announcement. ...

#### Random Monkeys

One of my gifts this holiday season was "The Drunkard's Walk: How Randomness Rules Our Lives,"