2018 - Page 3 of 3 - Statistics.com: Data Science, Analytics & Statistics Courses

“Money and Brains” and “Furs and Station Wagons”

“Money and Brains” and “Furs and Station Wagons” were evocative shorthand names for customer segments that the marketing company Claritas came up with more than half a century ago. These names, which made the work of marketers and salespeople easier, described segments of customers identified through statistical cluster analysis. Cluster analysis is also used in market…

Course Spotlight: Text Mining

The term text mining is sometimes used with two different meanings in computational statistics: (1) using predictive modeling to label many documents (e.g. legal docs might be “relevant” or “not relevant”) – this is what we call text mining; (2) using grammar and syntax to parse the meaning of individual documents – we use the term natural…

CONVOLUTION and TENSOR

Today’s Words of the Week are convolution and tensor, key components of deep learning.
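As a rough illustration of the first term: a one-dimensional discrete convolution slides a small kernel along a sequence and, at each position, sums the elementwise products. The sketch below uses plain Python; the function name `convolve_1d` and the sample values are made up for illustration. It flips the kernel, as the mathematical definition requires, whereas convolution layers in deep learning libraries typically skip the flip (strictly speaking, that is cross-correlation).

```python
def convolve_1d(signal, kernel):
    """'Valid' discrete convolution: keep only positions where the
    kernel overlaps the signal completely."""
    k = len(kernel)
    flipped = kernel[::-1]  # the mathematical definition reverses the kernel
    return [
        sum(s * w for s, w in zip(signal[i:i + k], flipped))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical example: a simple difference kernel applied to a ramp.
signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]
result = convolve_1d(signal, kernel)
print(result)  # [2, 2, 2]
```

The constant output reflects the constant slope of the input: this kernel measures the change across two steps.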

BENFORD’S LAW

Benford’s law describes an expected distribution of the first digit in many naturally-occurring datasets.
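Under Benford’s law, the expected share of values whose first digit is d is log10(1 + 1/d), so about 30% of values start with 1 and fewer than 5% start with 9. The sketch below computes those expected shares and compares them with the first digits of the powers of 2, a sequence commonly used to illustrate the law. The function names and the choice of the first 100 powers are assumptions made for illustration.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_shares(values):
    """Observed share of each leading digit among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    n = len(values)
    return {d: counts[d] / n for d in range(1, 10)}


expected = benford_expected()
observed = first_digit_shares([2 ** k for k in range(100)])

for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

Large gaps between observed and expected shares are sometimes treated as a flag for closer inspection of the data, not as proof of a problem.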

CONTINGENCY TABLES

Contingency tables are tables of counts of events or things, cross-tabulated by row and column.
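A minimal sketch of building such a table from raw records, using only the Python standard library. The records (a treatment/control group crossed with an outcome) are invented for illustration, and `contingency_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical raw data: one (group, outcome) pair per subject.
records = [
    ("treatment", "improved"),
    ("treatment", "improved"),
    ("treatment", "not improved"),
    ("control", "improved"),
    ("control", "not improved"),
    ("control", "not improved"),
]


def contingency_table(pairs):
    """Cross-tabulate (row, column) pairs into nested dicts of counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


table = contingency_table(records)
for row, cells in table.items():
    print(row, cells)
```

Each cell holds the number of records that fall into that row category and that column category at the same time.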

HYPERPARAMETER

The term hyperparameter is used in machine learning, where it refers, loosely speaking, to user-set parameters, and in Bayesian statistics, where it refers to parameters of the prior distribution.

SAMPLE

Why sample? A while ago, sample would not have been a candidate for Word of the Week, its meaning being pretty obvious to anyone with a passing acquaintance with statistics. I select it today because of some output I saw from a decision tree in Python.

SPLINE

The easiest way to think of a spline is to first think of linear regression – a single linear relationship between an outcome variable and various predictor variables.

NLP

To some, NLP = natural language processing, a form of text analytics arising from the field of computational linguistics.

OVERFIT

As applied to statistical models, “overfit” means the model fits the training data too closely: it is fitting noise, not signal. For example, the complex polynomial curve in the figure fits the data with no error, but you would not want to rely on it to predict accurately for new data.
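The same effect can be reproduced numerically. The sketch below assumes a made-up dataset: six training points generated from the line y = 2x plus fixed noise values, and four new points taken directly from y = 2x (noise-free, to keep the comparison simple). It compares a degree-5 polynomial that passes through every training point (Lagrange interpolation) with an ordinary least-squares line. All names and numbers are illustrative.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the polynomial that passes through all (xs, ys) points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(preds, actual):
    """Mean squared error."""
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual)


# Training data: signal y = 2x plus fixed, made-up noise.
x_train = [0, 1, 2, 3, 4, 5]
noise = [0.5, -0.7, 0.6, -0.4, 0.8, -0.6]
y_train = [2 * x + e for x, e in zip(x_train, noise)]

# New data drawn from the underlying signal (assumed noise-free here).
x_new = [0.5, 2.5, 4.5, 6.0]
y_new = [2 * x for x in x_new]

slope, intercept = fit_line(x_train, y_train)


def line(x):
    return slope * x + intercept


def poly(x):
    return lagrange_predict(x_train, y_train, x)


poly_train_mse = mse([poly(x) for x in x_train], y_train)
line_train_mse = mse([line(x) for x in x_train], y_train)
poly_new_mse = mse([poly(x) for x in x_new], y_new)
line_new_mse = mse([line(x) for x in x_new], y_new)

print(f"training MSE: polynomial {poly_train_mse:.4f}, line {line_train_mse:.4f}")
print(f"new-data MSE: polynomial {poly_new_mse:.4f}, line {line_new_mse:.4f}")
```

The polynomial has zero error on the training points because it has chased the noise, and it pays for that on the new points, most sharply when extrapolating just past the training range. The straight line has a nonzero training error but predicts the new points far better.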