#### Deep Learning

Deep Learning refers to complex multi-layer neural nets. They are especially suitable for image and voice recognition, and for unsupervised tasks with complex, unstructured data.


Prospective vs. Retrospective: A prospective study is one that identifies a scientific (usually medical) problem to be studied, specifies a study design protocol (e.g. what you're measuring, who you're measuring, how many subjects, etc.), and then gathers data in the future in accordance with the…

The estimated or predicted values in a regression or other predictive model are termed the y-hat values. "Y" because y is the outcome or dependent variable in the model equation, and a "hat" symbol (circumflex) placed over the variable name is the statistical designation of…

Azure is the Microsoft Cloud Computing Platform and Services. ML stands for Machine Learning, and is one of the services. Like other cloud computing services, you purchase it on a metered basis - as of 2015, there was a per-prediction charge, and a compute time…

Categorical variables: Categorical variables are non-numeric "category" variables, e.g. color. Ordered categorical variables are category variables that have a quantitative dimension that can be ordered but is not on a regular scale. Doctors rate pain on a scale of 1 to 10 -…

Bimodal: Bimodal literally means "two modes" and is typically used to describe distributions of values that have two centers. For example, the distribution of heights in a sample of adults might have two peaks, one for women and one for men.

HDFS: HDFS is the Hadoop Distributed File System. It is designed to accommodate parallel processing on clusters of commodity hardware, and to be fault tolerant.

Netflix Prize: The Netflix prize was a famous early application of crowdsourcing to predictive modeling. In 2006, Netflix published customer movie rating data and challenged analysts to come up with a predictive model that would improve Netflix's prediction of what your rating would be for…

Prediction vs. Explanation: With the advent of Big Data and data mining, statistical methods like regression and CART have been repurposed to use as tools in predictive modeling. When statistical models are used as a tool of research, the goal is to explain relationships in…

A-B Test: An A-B test is a classic statistical design in which individuals or subjects are randomly split into two groups and some intervention or treatment is applied - one group gets treatment A, the other treatment B. Typically one of the treatments will be…

RMSE: RMSE is root mean squared error. In predicting a numerical outcome with a statistical model, predicted values rarely match actual outcomes exactly. The difference between predicted and actual is the error (or residual). To calculate RMSE, square each error, take the average,…
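
The RMSE calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the glossary entry: the function name `rmse` and the sample numbers are assumptions chosen for the example.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each error, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical data: the errors are 1, -1, 2, 0, so the mean squared error is 1.5.
example_rmse = rmse([3, 5, 8, 10], [2, 6, 6, 10])
```

Because the errors are squared before averaging, one large miss raises RMSE more than several small ones.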

Label: A label is a category into which a record falls, usually in the context of predictive modeling. Label, class and category are different names for discrete values of a target (outcome) variable. "Label" typically has the added connotation that the label is something applied…

Strip transect: A strip transect is a small subsection of a geographically-defined study area, typically chosen randomly. For example, Manly (Introduction to Ecological Sampling, CRC) discusses using randomly selected strips 3 meters wide and 20 meters long which are carefully examined and the number of…

Spark: Spark is a second generation computing environment that sits on top of a Hadoop system, supporting the workflows that leverage a distributed file system. It improves on the performance of the initial Hadoop computational paradigm, MapReduce, via fast functional programming capabilities and the use…

Bandits: Bandits refers to a class of algorithms in which users or subjects make repeated choices among, or decisions in reaction to, multiple alternatives. For example, a web retailer might have a set of N ways of presenting an offer. The task of the algorithm…

Multiple looks: In a classic statistical experiment, treatment(s) and placebo are applied to randomly assigned subjects, and, at the end of the experiment, outcomes are compared. With multiple looks, the investigator does not wait until the end of the experiment -- outcomes are compared…

Pruning the tree: Classification and regression trees, applied to data with known values for an outcome variable, derive models with rules like "If taxable income <$80,000, if no Schedule C income, if standard deduction taken, then no-audit." Pruning is the process of truncating the…

Features vs. Variables: The predictors in a predictive model are sometimes given different terms by different disciplines. Traditional statisticians think in terms of variables. The machine learning community calls them features (also attributes or inputs). There is a subtle difference in meaning. In predictive modeling,…

Prior and posterior: Bayesian statistics typically incorporates new information (e.g. from a diagnostic test, or a recently drawn sample) to answer a question of the form "What is the probability that..." The answer to this question is referred to as the "posterior" probability, arrived at…

Curb-stoning: In survey research, curb-stoning refers to the deliberate fabrication of survey interview data by the interviewer. Often this is done to avoid the work of actually conducting the surveys. Statistical methods have been developed that can help to identify data that is the product…

Quasi-experiment: In social science research, particularly in the qualitative literature on program evaluation, the term "quasi-experiment" refers to studies that do not involve the application of treatments via random assignment of subjects. They are also called observational studies. A quasi-experiment (or observational study) does involve…

Bag-of-words: Bag-of-words is a simplified natural language processing concept. Text documents are parsed and output as collections of words (i.e. stripped of punctuation, etc.). In the bag-of-words concept, the resulting collection of words is considered for further analytics without regard to order, grammar, etc. (but…
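
As a rough illustration of the bag-of-words idea, the sketch below counts words in a sentence while discarding punctuation and order. The function name `bag_of_words`, the tokenizing pattern, and the sample sentence are assumptions made for this example; real text pipelines usually tokenize more carefully.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, drop punctuation, and count words, ignoring order."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)

# Hypothetical input sentence.
bag = bag_of_words("The cat sat. The dog sat, too!")
```

Two texts containing the same words in a different order produce the same bag, which is exactly the information the bag-of-words approach chooses to ignore.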

Stemming: In processing unstructured text, stemming is the process of converting multiple forms of the same word into one stem, to simplify the task of analyzing the processed text. For example, in the previous sentence, "processing," "process," and "processed" would all be converted to the…

Structured vs. unstructured data: Structured data is data that is in a form that can be used to develop statistical or machine learning models (typically a matrix where rows are records and columns are variables or features). Or data that is in a form that…

Feature engineering: In predictive modeling, a key step is to turn available data (which may come from varied sources and be messy) into an orderly matrix of rows (records to be predicted) and columns (predictor variables or features). The feature engineering process involves review of…

Naive Bayes classifier: A full Bayesian classifier is a supervised learning technique that assigns a class to a record by finding other records with attributes just like it has, and finding the most prevalent class among them. Naive Bayes (NB) recognizes that finding exact matches…

Node: A node is an entity in a network. In a social network, it would be a person. In a digital network, it would be a computer or device. Nodes can be of different types in the same network - a criminal network might contain…

k-Nearest neighbor: K-nearest-neighbor (K-NN) is a machine learning predictive algorithm that relies on calculation of distances between pairs of records. The algorithm is used in classification problems where training data are available with known target values. The algorithm takes each record and assigns…

A NoSQL database is distinguished mainly by what it is not - it is not a structured relational database format that links multiple separate tables. NoSQL stands for "not only SQL," meaning that SQL, or structured query language is not needed to extract and organize…

Predictive modeling is the process of using a statistical or machine learning model to predict the value of a target variable (e.g. default or no-default) on the basis of a series of predictor variables (e.g. income, house value, outstanding debt, etc.). Many of the techniques…

Directed vs. Undirected Network: In a directed network, connections between nodes are directional. For example, in a Twitter network, Smith might follow Jones but that does not mean that Jones follows Smith. Each directional relationship would have an edge to represent it, typically with an…

Regularization: Regularization refers to a wide variety of techniques used to bring structure to statistical models in the face of data size, complexity and sparseness. Advances in digital processing, storage and retrieval have led to huge and growing data sets ("Big Data"). Regularization is…

SQL: SQL stands for structured query language, a high level language for querying relational databases, extracting information. For example, SQL provides the syntax rules that can translate a query like this into a form that can be submitted to the database: "Find all sales of…

Markov Chain Monte Carlo (MCMC): A Markov chain is a probability system that governs transition among states or through successive events. For example, in the American game of baseball, the probability of reaching base differs depending on the "count" -- the number of balls and…

MapReduce: In computer science, MapReduce is a procedure that prepares data for parallel processing on multiple computers. The "map" function sorts the data, and the "reduce" function generates frequencies of items. The combined overall system manages the parceling out of the data to multiple processors,…
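
The map and reduce roles can be imitated in a single Python process with a word count. This is only a sketch of the pattern, not the Hadoop API: the function names `map_phase`, `shuffle`, and `reduce_phase` and the input chunks are hypothetical, and a real system would run each chunk on a separate processor.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word, 1) for word in chunk.lower().split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key to get item frequencies."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical input: each string stands for the data handed to one worker.
chunks = ["big data big", "data tools"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(pairs))
```

Since each chunk is mapped independently, the map step is the part that can be spread across many machines.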

Hadoop: As data processing requirements grew beyond the capacities of even large computers, distributed computing systems were developed to spread the load to multiple computers. Hadoop is a distributed computing system with two key features: (1) it is open source, and (2) it can use…

Curse of Dimensionality: The curse of dimensionality is the affliction caused by adding variables to multivariate data models. As variables are added, the data space becomes increasingly sparse, and classification and prediction models fail because the available data are insufficient to provide a useful model…

Data Product: A data product is a product or service whose value is derived from using algorithmic methods on data, and which in turn produces data to be used in the same product, or tangential data products. For example, at large web-based retail organizations like…

Feature: This term is used synonymously with attribute and variable; it is actually an independent variable (see dependent and independent variables). The term feature comes from the machine learning community, often in the phrase "feature selection" (which see).

Statistical distance: Statistical distance is a measure calculated between two records that are typically part of a larger dataset, where rows are records and columns are variables. To calculate Euclidean distance, one possible distance metric, the steps are: 1. [Typically done, but not always] Convert all…
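
A minimal Python sketch of Euclidean distance between two records follows. It assumes both records are numeric and have the same variables in the same order, and it leaves out any preliminary rescaling of the variables; the function name and the sample records are illustrative.

```python
import math

def euclidean_distance(record_a, record_b):
    """Square root of the summed squared differences across variables."""
    if len(record_a) != len(record_b):
        raise ValueError("records must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(record_a, record_b)))

# Hypothetical records: the differences are -3, -4 and 0, so the distance is 5.
distance = euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0])
```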

Decision Trees: In the machine learning community, a decision tree is a branching set of rules used to classify a record, or predict a continuous value for a record. For example, one path in a tree modeling customer churn (abandonment of subscription) might look like…

Feature Selection: In predictive modeling, feature selection, also called variable selection, is the process (usually automated) of sorting through variables to retain variables that are likely to be informative in prediction, and discard or combine those that are redundant. “Features” is a term used by…

Bagging: In predictive modeling, bagging is an ensemble method that uses bootstrap replicates of the original training data to fit predictive models. For each record, the predictions from all available models are then averaged for the final prediction. For a classification problem, a majority vote…

Decile Lift: In predictive modeling, the goal is to make predictions about outcomes on a case-by-case basis: an insurance claim will be fraudulent or not, a tax return will be correct or in error, a subscriber will terminate a subscription or not, a customer will…

Boosting: In predictive modeling, boosting is an iterative ensemble method that starts out by applying a classification algorithm and generating classifications. The classifications are then assessed, and a second round of model-fitting occurs in which the records classified incorrectly in the first round are given…

In predictive modeling, ensemble methods refer to the practice of taking multiple models and averaging their predictions. In the case of classification models, the average can be that of a probability score attached to the classification. Models can differ with respect to algorithms used (e.g.…

Bayes' Theorem: Bayes' theorem is a formula for revising a priori probabilities after receiving new information. The revised probabilities are called posterior probabilities. For example, consider the probability that you will develop a specific cancer in the next year. An estimate of this probability based…
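
The revision from prior to posterior probability can be sketched as below. The function name `posterior` and all the numbers (a 1% prior, a test that is positive for 90% of true cases and 5% of non-cases) are hypothetical values chosen for illustration, not figures from the glossary entry.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem for a hypothesis H and its complement.

    P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]
    """
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / evidence

# Hypothetical inputs: prior 0.01, P(positive | H) 0.90, P(positive | not H) 0.05.
p_after_positive_test = posterior(0.01, 0.90, 0.05)
```

With these assumed inputs the posterior works out to 2/13, roughly 0.15: the new information raises the probability well above the 1% prior, yet it remains far from certainty because the condition is rare.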

Collinearity: In regression analysis, collinearity of two variables means that strong correlation exists between them, making it difficult or impossible to estimate their individual regression coefficients reliably. The extreme case of collinearity, where the variables are perfectly correlated, is called singularity. See also:…

Complete Statistic: A sufficient statistic T is called a complete statistic if no function of it has zero expected value for all distributions concerned unless this function itself is zero for all possible distributions concerned (except possibly a set of measure zero). The property of…

Covariate: In design of experiments, a covariate is an independent variable not manipulated by the experimenter but still affecting the response. See Variables (in design of experiments) for an explanatory example.

Cross-tabulation Tables: A cross-tabulation table represents the joint frequency distribution of two discrete variables. Rows and columns correspond to the possible values of the first and the second variables, the cells contain frequencies (numbers) of occurrence of the corresponding pairs of values of the 1st…

Cumulative Frequency Distribution: A cumulative frequency distribution is a summary of a set of data showing the frequency (or number) of items less than or equal to the upper class limit of each class. This definition holds for quantitative data and for categorical (qualitative) data…

Cumulative Relative Frequency Distribution: A cumulative relative frequency distribution is a tabular summary of a set of data showing the relative frequency of items less than or equal to the upper class limit of each class. Relative frequency is the fraction or…
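
Both cumulative summaries can be built from ordinary class frequencies with a running total. The sketch below uses hypothetical class limits and counts; the variable names are illustrative.

```python
from itertools import accumulate

# Hypothetical classes, identified by their upper class limits, with counts.
class_upper_limits = [10, 20, 30, 40]
frequencies = [3, 7, 5, 2]

# Running total: number of items at or below each upper class limit.
cumulative = list(accumulate(frequencies))

# Divide by the total count to get the cumulative relative frequencies.
total = sum(frequencies)
cumulative_relative = [count / total for count in cumulative]
```

The last cumulative frequency always equals the total number of items, so the last cumulative relative frequency is 1.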

Event: In probability theory, an event is an outcome or defined collection of outcomes of a random experiment. Since the collection of all possible outcomes to a random experiment is called the sample space, another definition of event is any subset of a sample space.…

Expected Value: The expected value of a random variable is nothing but the arithmetic mean. For a discrete random variable, the expected value is the weighted average of the possible values of the random variable, the weights being the probabilities that those values will occur.…

Fair Game: A game of chance is said to be fair if each player's expected payoff is zero. A game in which I roll a die and receive 12 for a 1 or 2 and lose 6 otherwise (3-6) is a fair game.
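
The die example can be checked by computing the expected payoff directly. This sketch assumes a fair six-sided die and uses exact fractions to avoid rounding; the variable names are illustrative.

```python
from fractions import Fraction

# Payoff for each face of the die: +12 on a 1 or 2, -6 on a 3, 4, 5 or 6.
payoffs = {face: (12 if face <= 2 else -6) for face in range(1, 7)}

# Each face has probability 1/6, so weight every payoff by that probability.
expected_payoff = sum(Fraction(1, 6) * payoff for payoff in payoffs.values())
```

The two winning faces contribute 2 × 12 / 6 = 4 and the four losing faces contribute 4 × (-6) / 6 = -4, so the expected payoff is zero and the game is fair.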

Filter: A filter is an algorithm for processing a time series or random process. There are two major classes of problems solved by filters: 1. To estimate the current value of a time series (X(t), t = 1,2, ...), which is not directly…

Finite Sample Space: If a sample space contains a finite number of elements, then the sample space is said to be a finite sample space. The sample space for the experiment of a toss of a coin is a finite sample space. It has only…

Frequency Distribution: A frequency distribution is a tabular summary of a set of data showing the frequency (or number) of items in each of several non-overlapping classes (or bins). This definition is applicable to both quantitative and categorical (qualitative) data. For quantitative data, the classes…

Frequency Interpretation of Probability: The frequency interpretation of probability is the most widely held of several ways of interpreting the meaning of the concept of "probability". According to this interpretation the probability of an event is the proportion of times the said event occurs when…

Granger Causation: Granger causation is a definition of causal relation between vectors in vector time series. Let us define H_t as the history up to and including the discrete time t, and denote by Y_t the random vector Y at time t. Granger…

Hazard Function: In medical statistics, the hazard function is a relationship between a proportion and time. The proportion (also called the hazard rate) is the proportion of subjects who die in an increment of time starting at time "t" from among those who have survived…

Inferential Statistics: Inferential statistics is the body of statistical techniques that deal with the question "How reliable is the conclusion or estimate that we derive from a set of data?" The two main techniques are confidence intervals and hypothesis tests.

Interval Scale: An interval scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, but where "0" on the scale does not represent the absence of the thing being measured.…

Jackknife: The jackknife is a general non-parametric method for estimating the bias and variance of a statistic (which is usually an estimator) using only the sample itself. The jackknife is considered the predecessor of the bootstrapping techniques. With a sample of size N,…

Level of a Factor: In design of experiments, levels of a factor are the values it takes on. The values are not necessarily numbers - they may be at a nominal scale, ordinal scale, etc. See Variables (in design of experiments) for an explanatory example.…

Likelihood Function: The likelihood function is a fundamental concept in statistical inference. It indicates how likely a particular population is to produce an observed sample. Let P(X; T) be the distribution of a random vector X, where T is the vector of parameters of the distribution.…

Local Independence: The local independence postulate plays a central role in latent variable models. Local independence means that all the manifest variables are independent random variables if the latent variables are controlled (fixed). Technically, the local independence may be described…

Manifest Variable: In latent variable models, a manifest variable (or indicator) is an observable variable - i.e. a variable that can be measured directly. A manifest variable can be continuous or categorical. The opposite concept is the latent variable. See also latent variable…

Marginal Density: If X and Y are continuous random variables, and f(x,y) is the joint density of X and Y, then the marginal density of X, g(x), is given by $g(x) = \int_{-\infty}^{\infty} f(x,y)\,dy$.

Marginal Distribution: If X and Y are discrete random variables and f(x,y) is their joint probability distribution, the marginal distribution of X, g(x), is given by $g(x) = \sum_{y} f(x,y)$.
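
For the discrete case, the marginal distribution of X is obtained by summing the joint probabilities over every value of Y. The sketch below uses a hypothetical joint distribution stored as a dictionary; the function name `marginal_x` is an assumption for this example.

```python
from collections import defaultdict

# Hypothetical joint distribution f(x, y); the probabilities sum to 1.
joint = {
    (0, 0): 0.1, (0, 1): 0.2, (0, 2): 0.1,
    (1, 0): 0.3, (1, 1): 0.2, (1, 2): 0.1,
}

def marginal_x(joint_distribution):
    """Sum f(x, y) over all y for each x to get g(x)."""
    g = defaultdict(float)
    for (x, _y), probability in joint_distribution.items():
        g[x] += probability
    return dict(g)

g = marginal_x(joint)
```

The continuous case follows the same pattern, with the sum over y replaced by an integral of the joint density.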

Meta-analysis: Meta-analysis takes the results of two or more studies of the same research question and combines them into a single analysis. The purpose of meta-analysis is to gain greater accuracy and statistical power by taking advantage of the large sample size resulting from the…

Minimax Decision Rule: A minimax decision rule has the smallest possible maximum risk. All other decision rules will have a higher maximum risk.

Multicollinearity: In regression analysis, multicollinearity refers to a situation of collinearity of independent variables, often involving more than two independent variables, or more than one pair of collinear variables. Multicollinearity means redundancy in the set of variables. This can render ineffective the numerical methods…

Multivariate: Multivariate analysis involves more than one variable of interest.

Neural Network: A neural network (NN) is a network of many simple processors ("units"), each possibly having a small amount of local memory. The units are connected by communication channels ("connections") which usually carry numeric (as opposed to symbolic) data, encoded by any of various…

Nominal Scale: A nominal scale is really a list of categories to which objects can be classified. For example, people who receive a mail order offer might be classified as "no response," "purchase and pay," "purchase but return the product," and "purchase and neither pay…

Normality: Normality is a property of a random variable that is distributed according to the normal distribution. Normality plays a central role in both theoretical and practical statistics: a great number of theoretical statistical methods rest on the assumption that the data, or test…

Ordinal Scale: An ordinal scale is a measurement scale that assigns values to objects based on their ranking with respect to one another. For example, a doctor might use a scale of 0-10 to indicate degree of improvement in some condition, from 0 (no improvement)…

Outlier: Sometimes a set of data will have one or more items with unusually large or unusually small values. Such extreme values are called outliers. Outliers often arise from some mistakes in data-gathering or data-recording procedures. It is good practice to inspect a data set…

Parameter: A parameter is a numerical value that describes one of the characteristics of a probability distribution or population. For example, a binomial distribution is completely specified if the number of trials and probability of success are known. Here, the number of trials and the…

Transformation: Transformation is the conversion of a data set into a transformed data set by the application of a function. The statistical purpose of transformation is to produce a transformed data set that better conforms to the requirements of a statistical procedure. A typical use…

Truncation: Truncation, generally speaking, means to shorten. In statistics it can mean the process of limiting consideration or analysis to data that meet certain criteria (for example, the patients still alive at a certain point). Or it can refer to a data distribution where values…

Univariate: Univariate analysis involves a single variable of interest.

Variables (in design of experiments): Many statistical methods rest on a statistical model which states a relationship Y = f(X1,...,XN) between a dependent variable (Y) and independent variable(s) X1,...,XN. In designed experiments, the dependent variable is often named "response", independent variables manipulated by the experimenter…

Z score: An observation's z-score tells you the number of standard deviations it lies away from the population mean (and in which direction). The calculation is as follows: $z = (x - m)/s$, where x is the observation itself, m is the mean…
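
A short Python sketch of the z-score calculation follows. It assumes the data set is treated as the whole population, so the population standard deviation (`statistics.pstdev`) is used; the function name `z_score` and the data are illustrative.

```python
import statistics

def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies from the mean (signed)."""
    return (x - mean) / standard_deviation

# Hypothetical data treated as a full population: mean 5, standard deviation 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]
m = statistics.mean(values)
s = statistics.pstdev(values)

z_high = z_score(9, m, s)  # two standard deviations above the mean
z_low = z_score(2, m, s)   # one and a half standard deviations below the mean
```

The sign carries the direction: positive values lie above the mean and negative values below it.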

Average Deviation: The average deviation or the average absolute deviation is a measure of dispersion. It is the average of absolute deviations of the individual values from the median or from the mean.
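
The two variants of the average deviation, taken from the mean or from the median, can be compared with a small sketch. The function name `average_deviation` and the data are assumptions made for this example.

```python
import statistics

def average_deviation(values, center):
    """Average of the absolute deviations of the values from a chosen center."""
    return sum(abs(v - center) for v in values) / len(values)

# Hypothetical data with one large value: mean 4, median 3.
values = [1, 2, 3, 4, 10]
from_mean = average_deviation(values, statistics.mean(values))
from_median = average_deviation(values, statistics.median(values))
```

For this data the deviation from the mean is 2.4 and from the median is 2.2. The median-based figure is never larger, because the median minimizes the sum of absolute deviations.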

Coefficient of variation: The coefficient of variation is the standard deviation of a data set, divided by the mean of the same data set.
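
A minimal sketch of the calculation, assuming the data set is treated as a population (so `statistics.pstdev` is used; a sample standard deviation would give a slightly larger value) and that the mean is not zero. The function name and data are illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean of the same data set."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean

# Hypothetical data: standard deviation 2, mean 5.
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```

Because it is a ratio, the coefficient of variation has no units, which makes it convenient for comparing the spread of data sets measured on different scales.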

Column icon plots: See sequential icon plots.

Correlation Coefficient: The correlation coefficient indicates the degree of linear relationship between two variables. The correlation coefficient always lies between -1 and +1. -1 indicates perfect linear negative relationship between two variables, +1 indicates perfect positive linear relationship and 0 indicates lack of any linear…
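
The sketch below computes the Pearson correlation coefficient, the usual measure of linear relationship, from first principles. The function name `pearson_r` and the data are illustrative, and the code assumes neither variable is constant.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

# Hypothetical data: y rises in exact proportion to x, then falls in proportion.
xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])
```

The two cases give the extreme values +1 and -1; data with no linear trend would give a value near 0.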

Correlation Matrix: A correlation matrix describes correlation among M variables. It is a square symmetrical MxM matrix with the (ij)th element equal to the correlation coefficient r_ij between the (i)th and the (j)th variable. The diagonal elements (correlations of variables with themselves) are always equal…

Correspondence Plot: A correspondence plot represents the results of correspondence analysis (CA). For each category (possible value of a variable), its scores derived by CA for the first two dimensions are depicted as a point on the x-y plane. An interesting feature of the correspondence…