Ordinary Least Squares Regression

Ordinary Least Squares Regression: Ordinary least squares regression is a special (and the most common) kind of linear regression. It is based on the least squares method of finding regression parameters. Technically, the aim of ordinary least squares regression is to find out…

Orthogonal Least Squares

Orthogonal Least Squares: In ordinary least squares, we try to minimize the sum of the vertical squared distances between the observed points and the fitted line. In orthogonal least squares, we try to fit a line which minimizes the sum of the squared distances between…
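The distinction between the two criteria can be illustrated with a minimal Python sketch (illustrative only; the function names are not from any particular library). For a point and a line y = m*x + b, ordinary least squares charges the squared vertical gap, while orthogonal least squares charges the squared perpendicular gap, which is always no larger:

```python
# Sketch: vertical vs. perpendicular distance from a point to the line y = m*x + b.
# Ordinary least squares sums squared vertical distances; orthogonal (total)
# least squares sums squared perpendicular distances.
import math

def vertical_distance(x, y, m, b):
    """Vertical distance from (x, y) to the line y = m*x + b."""
    return abs(y - (m * x + b))

def perpendicular_distance(x, y, m, b):
    """Perpendicular (orthogonal) distance from (x, y) to y = m*x + b."""
    return abs(y - (m * x + b)) / math.sqrt(1 + m * m)

# Point (0, 1) and the line y = x: the perpendicular distance is shorter.
v = vertical_distance(0.0, 1.0, 1.0, 0.0)
p = perpendicular_distance(0.0, 1.0, 1.0, 0.0)
```

For any non-horizontal line the perpendicular distance is strictly smaller than the vertical one, which is why the two criteria generally yield different fitted lines.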

Precision

Precision: Precision is the degree of accuracy with which a parameter is estimated by an estimator. Precision is usually measured by the standard deviation of the estimator, which is known as the standard error. For example, the sample mean is used to estimate the population…

Regression

Regression: See regression analysis.

Regression Analysis

Regression Analysis: Regression analysis provides a "best-fit" mathematical equation for the relationship between the dependent variable (response) and independent variable(s) (covariates). There are two major classes of regression - parametric and non-parametric. Parametric regression requires choice of the regression equation with one or a greater…

Residuals

Residuals: Residuals are differences between the observed values and the values predicted by some model. Analysis of residuals allows you to estimate the adequacy of a model for particular data; it is widely used in regression analysis.
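A minimal Python sketch of the definition (the numbers below are purely illustrative, not from any real data set):

```python
# Sketch: residuals are observed values minus model-predicted values.
observed  = [2.1, 3.9, 6.2, 7.8]
predicted = [2.0, 4.0, 6.0, 8.0]   # from some fitted model (illustrative values)

residuals = [obs - pred for obs, pred in zip(observed, predicted)]
# Residuals that are small and show no systematic pattern suggest an adequate model.
```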

Resistance

Resistance: Resistance, used with respect to sample estimators, refers to the sensitivity of the estimator to extreme observations. Estimators that do not change much with the addition or deletion of extreme observations are said to be resistant. The median is a resistant estimator…

Backward Elimination

Backward Elimination: Backward elimination is one of several computer-based iterative variable-selection procedures. It begins with a model containing all the independent variables of interest. Then, at each step the variable with the smallest F-statistic is deleted (if the F is not higher than the chosen cutoff…

Simple Linear Regression

Simple Linear Regression: Simple linear regression is aimed at finding the "best-fit" values of two parameters, A and B, in the following regression equation: Yi = A Xi + B + Ei,  i = 1, …, N, where Yi, Xi, and Ei are the values of the…
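The least-squares estimates of A and B have a well-known closed form, sketched here in plain Python (the function name is illustrative, not a library API):

```python
# Sketch: closed-form least-squares estimates of A (slope) and B (intercept)
# in Yi = A*Xi + B + Ei, minimizing the sum of squared residuals Ei.
def fit_simple_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # A = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Data lying exactly on y = 2x + 1 recovers A = 2, B = 1.
a, b = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
```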

Uplift or Persuasion Modeling

Uplift or Persuasion Modeling: A combination of treatment comparisons (e.g. send a sales solicitation, or send nothing) and predictive modeling to determine which cases or subjects respond (e.g. purchase or not) to which treatments. Here are the steps, in conceptual terms, for a typical uplift…

Step-wise Regression

Step-wise Regression: Step-wise regression is one of several computer-based iterative variable-selection procedures. Variables are added one-by-one based on their contribution to R-squared, but first, at each step we determine whether any of the variables (already included in the model) can be removed. If none of…

Sufficient Statistic

Sufficient Statistic: Suppose X is a random vector with probability distribution (or density) P(X | V), where V is a vector of parameters, and Xo is a realization of X. A statistic T(X) is called a sufficient statistic if the conditional probability (density) P(X |…

Variable-Selection Procedures

Variable-Selection Procedures: In regression analysis, variable-selection procedures are aimed at selecting a reduced set of the independent variables - the ones providing the best fit to the model. The criterion for selecting is usually the following F-statistic: F(x1,...,xp; xp+1) = [SSE(x1,...,xp) - SSE(x1,...,xp, xp+1)] / SSE(x1,...,xp…

Alpha Spending Function

Alpha Spending Function: In the interim monitoring of clinical trials, multiple looks are taken at the accruing results. In such circumstances, akin to multiple testing, the alpha-value at each look must be adjusted in order to preserve the overall Type-1 Error. Alpha spending functions (the…

Attribute

Attribute: In data analysis or data mining, an attribute is a characteristic or feature that is measured for each observation (record) and can vary from one observation to another. It might be measured in continuous values (e.g. time spent on a web site), or in categorical…

Categorical Data

Categorical Data: Categorical data reflect the classification of objects into different categories. For example, people who receive a mail order offer might be classified as "no response," "purchase and pay," "purchase but return the product," and "purchase and neither pay nor return."

Cross sectional study

Cross sectional study: Cross sectional studies are those that record data from a sample of subjects at a given point in time. See also cross sectional data, longitudinal study.

Cross-sectional Analysis

Cross-sectional Analysis: Cross-sectional analysis is concerned with statistical inference from cross-sectional data.

Cross-sectional Data

Cross-sectional Data: Cross-sectional data refer to observations of many different individuals (subjects, objects) at a given time, each observation belonging to a different individual. A simple example of cross-sectional data is the gross annual income for each of 1000 randomly chosen households in New York…

Cohort data

Cohort data: Cohort data record multiple observations over time for a set of individuals or units tied together by some event (say, born in the same year). See also longitudinal data and panel data.

Crossover Design

Crossover Design: In randomized trials, a crossover design is one in which each subject receives each treatment, in succession. For example, subject 1 first receives treatment A, then treatment B, then treatment C. Subject 2 might receive treatment B, then treatment A, then treatment C.…

Effect

Effect: In design of experiments, the effect of a factor is an additive term of the model, reflecting the contribution of the factor to the response. See Variables (in design of experiments) for an explanatory example.

Effect Size

Effect Size: In a study or experiment with two groups (usually control and treatment), the investigator typically has in mind the magnitude of the difference between the two groups that he or she wants to be able to detect in a hypothesis test. This magnitude,…

Interim Monitoring

Interim Monitoring: In clinical trials of medical treatments or devices, a traditional fixed sample design establishes a fixed number of subjects or outcomes that must be observed. In a trial that uses interim monitoring, the sample size is not fixed in advance. Rather, periodic looks…

Latin Square

Latin Square: A Latin square is a square array in which every letter or symbol appears exactly once in each row and in each column. For example:

B C D A
C D A B
D A B C
A B C D
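The defining property - each symbol exactly once per row and per column - can be checked mechanically, as in this illustrative Python sketch:

```python
# Sketch: verify the Latin-square property -- every symbol appears exactly
# once in each row and each column.
square = [
    ["B", "C", "D", "A"],
    ["C", "D", "A", "B"],
    ["D", "A", "B", "C"],
    ["A", "B", "C", "D"],
]

def is_latin_square(sq):
    symbols = set(sq[0])
    # A row or column with n distinct symbols out of n cells has each exactly once.
    rows_ok = all(set(row) == symbols and len(row) == len(symbols) for row in sq)
    cols_ok = all(set(col) == symbols for col in zip(*sq))
    return rows_ok and cols_ok

ok = is_latin_square(square)
```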

Longitudinal Analysis

Longitudinal Analysis: Longitudinal analysis is concerned with statistical inference from longitudinal data.

Longitudinal Data

Longitudinal Data: Longitudinal data refer to observations of given units made over time. A simple example of longitudinal data is the gross annual income of, say, 1000 households from New York City for the years 1991-2000. See also: cross-sectional data, panel data, Cohort…

Longitudinal study

Longitudinal study: Longitudinal studies are those that record data for subjects or variables over time. If a longitudinal study uses the same subjects at each point where data are recorded, it is a panel study. If a longitudinal study samples from the same group…

Paired Replicates Data

Paired Replicates Data: Paired replicates data are the simplest form of repeated measures data, in which only two measurements are made for each experimental unit. Consider, for example, a study of 2 drugs - A and B - to determine whether they reduce arterial…

Panel Data

Panel Data: A panel data set contains observations on a number of units (e.g. subjects, objects) belonging to different clusters (panels) over time. A simple example of panel data is the values of the gross annual income for each of 1000 households in New York…

Panel study

Panel study: A panel study is a longitudinal study that selects a group of subjects and then records data for each member of the group at various points in time. See also panel data.

Parallel Design

Parallel Design: In randomized trials, a parallel design is one in which subjects are randomly assigned to treatments, and the treatment groups then proceed in parallel. Conducted properly, they provide assurance that any difference between treatments is in fact due to treatment effects (or random…

Repeated Measures Data

Repeated Measures Data: Repeated measures (or repeated measurements) data are usually obtained from multiple measurements of a response variable. Such multiple measurements are carried out for each experimental unit over time (as in a longitudinal study ) or under multiple conditions. An essential statistical peculiarity…

Response

Response: In design of experiments, response is a dependent variable. Its values are measured for all subjects, and the question of primary interest is how factors affect the response. See Variables (in design of experiments) for an explanatory example.

Sample Size Calculations

Sample Size Calculations: Sample size calculations typically arise in significance testing, in the following context: how big a sample size do I need to identify a significant difference of a certain size? The analyst must specify three things: 1) How big a difference is being…

Self-Controlled Design

Self-Controlled Design: In randomized trials, a self-controlled design is one in which results are measured in each subject before and after treatment. Both parallel designs and crossover designs can also include a self-controlled feature.

Sequential Analysis

Sequential Analysis: In sequential analysis, decisions about sample size and the type of data to be collected are made and modified as the study proceeds, incorporating information learned at earlier stages. One major application of sequential analysis is in clinical trials in medicine, where successful…

Stratified Sampling

Stratified Sampling: Stratified sampling is a method of random sampling. In stratified sampling, the population is first divided into homogeneous groups, also called strata. Then, elements from each stratum are selected at random in one of two ways: (i) the number of elements…
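One common variant, proportional allocation, draws from each stratum in proportion to its size. A minimal Python sketch (the strata, sizes, and function name here are illustrative assumptions, not from the glossary):

```python
# Sketch of proportional stratified sampling: draw from each stratum
# in proportion to its share of the population.
import random

def stratified_sample(strata, total_n, seed=0):
    """strata: dict mapping stratum name -> list of population elements."""
    rng = random.Random(seed)
    pop_size = sum(len(v) for v in strata.values())
    sample = {}
    for name, elements in strata.items():
        k = round(total_n * len(elements) / pop_size)  # proportional allocation
        sample[name] = rng.sample(elements, k)
    return sample

# Illustrative population: 600 "urban" and 400 "rural" units, sample of 100.
strata = {"urban": list(range(600)), "rural": list(range(400))}
s = stratified_sample(strata, total_n=100)
```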

Systematic Sampling

Systematic Sampling: Systematic sampling is a method of random sampling. The elements to be sampled are selected at a uniform interval that is measured in time, order, or space.
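In the simplest ordered case this amounts to taking every k-th element after a random starting point, as in this illustrative Python sketch:

```python
# Sketch: systematic sampling over an ordered population -- pick a random
# start in [0, k), then take every k-th element.
import random

def systematic_sample(population, k, seed=0):
    start = random.Random(seed).randrange(k)  # random starting point
    return population[start::k]

# A population of 100 units with interval k = 10 yields a sample of 10.
sample = systematic_sample(list(range(100)), k=10)
```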

Time-series data

Time-series data: See longitudinal data.

Finite Mixture Models

Finite Mixture Models: Outside social research, the term "finite mixture models" is often used as a synonym for "latent class models" in latent class analysis.

Fixed Effects

Fixed Effects: The term "fixed effects" (as contrasted with "random effects") is related to how particular coefficients in a model are treated - as fixed or random values. Which approach to choose depends on both the nature of the data and the objective of the…

General Linear Model

General Linear Model: General (or generalized) linear models (GLM), in contrast to linear models, allow you to describe both additive and non-additive relationships between a dependent variable and N independent variables. The independent variables in GLM may be continuous as well as discrete. (The dependent…

General Linear Model for a Latin Square

General Linear Model for a Latin Square: In design of experiments, a Latin square is a three-factor experiment in which each combination of values for any pair of factors occurs only once. Consider the following Latin square, B C D A C D…

Hierarchical Linear Modeling

Hierarchical Linear Modeling: Hierarchical linear modeling is an approach to analysis of hierarchical (nested) data - i.e. data represented by categories, sub-categories, ..., individual units (e.g. school -> classroom -> student). At the first stage, we choose a linear model (level 1 model) and fit…

Hierarchical Loglinear Models

Hierarchical Loglinear Models: Hierarchical loglinear models are loglinear models for categorical data in which inclusion of any interaction term implies inclusion of all lower-order terms involving the same variables. For example, a model containing the three-way interaction among variables A, B, and C must also contain the A-B, A-C, and B-C interactions and the main effects of A, B, and C.

Interaction effect

Interaction effect: An interaction effect refers to the role of a variable in an estimated model, and its effect on the dependent variable. A variable that has an interaction effect will have a different effect on the dependent variable, depending on the level of some…

Latent Structure Models

Latent Structure Models: Latent structure models is a generic term for a broad set of categories of statistical models. This set includes factor analysis models, covariance structure models, latent profile analysis models, latent trait analysis models, latent class analysis models, and some others. Each category…

Latent Variable Models

Latent Variable Models: Latent variable models are a broad subclass of latent structure models. They postulate some relationship between the statistical properties of observable variables (or "manifest variables", or "indicators") and latent variables. A special kind of statistical analysis corresponds to each kind of…

Linear Model

Linear Model: A linear model specifies a linear relationship between a dependent variable and n independent variables: y = a0 + a1 x1 + a2 x2 + … + an xn, where y is the dependent variable, {xi} are independent variables, {ai} are parameters of the…

Logit and Probit Models

Logit and Probit Models: Logit and probit models postulate some relation (usually a linear relation) between nonlinear functions of the observed probabilities and unknown parameters of the model. Logit and probit here are nonlinear functions of probability. See also: Logit Models, Probit Models…

Logit Models

Logit Models: Logit models postulate some relation between the logit of observed probabilities (not the probabilities themselves), and unknown parameters of the model. For example, logit models used in logistic regression postulate a linear relation between the logit and parameters of the model. The major…

Loglinear models

Loglinear models: Loglinear models are models that postulate a linear relationship between the independent variables and the logarithm of the dependent variable, for example: log(y) = a0 + a1 x1 + a2 x2 + … + aN xN, where y is the dependent variable; xi, i=1,...,N…

Mixed Models

Mixed Models: In mixed effects models (or mixed random and fixed effects models) some coefficients are treated as fixed effects and some as random effects. See fixed effects for detailed explanations of the concepts "random effects" and "fixed effects".

Probit Models

Probit Models: Probit models postulate some relation between the probit of the observed probability and unknown parameters of the model. The most common example is the model probit(p) = a + b x, which is equivalent to p = Φ(a + b x), where…
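Taking Φ to be the standard normal cumulative distribution function (the usual reading of the probit model), the model probability can be computed with only the standard library, via the error function. A sketch under that assumption:

```python
# Sketch: the probit model p = Phi(a + b*x), with Phi the standard normal
# CDF, computed from the error function in the math module.
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_model_probability(x, a, b):
    return phi(a + b * x)

# At a + b*x = 0 the model probability is Phi(0) = 0.5.
p = probit_model_probability(0.0, a=0.0, b=1.0)
```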

Proportional Hazard Model

Proportional Hazard Model: Proportional hazard model is a generic term for models (particularly survival models in medicine) that have the form L(t | x1, x2, …, xn) = h(t) exp(b1 x1 + … + bn xn), where L is the hazard function or hazard rate, {xi}…

Random Effects

Random Effects: The term "random effects" (as contrasted with "fixed effects") is related to how particular coefficients in a model are treated - as fixed or random values. Which approach to choose depends on both the nature of the data and the objective of the…

Web Analytics

Web Analytics: Statistical or machine learning methods applied to web data such as page views, hits, clicks, and conversions (sales), generally with a view to learning what web presentations are most effective in achieving the organizational goal (usually sales). This goal might be to sell…

Network Analytics

Network Analytics: Network analytics is the science of describing and, especially, visualizing the connections among objects. The objects might be human, biological or physical. Graphical representation is a crucial part of the process; Wayne Zachary's classic 1977 network diagram of a karate club reveals the…

Structural Equation Modeling

Structural Equation Modeling: Structural equation modeling includes a broad range of multivariate analysis methods aimed at finding interrelations among the variables in linear models by examining variances and covariances of the variables. Path analysis, for example, is a method of structural equation modeling. Structural…

Vector Autoregressive Models

Vector Autoregressive Models: Vector autoregressive models describe statistical properties of vector time series. Vector autoregressive models generalize the models used in ordinary autoregression. Consider a vector time series: V(1), V(2), ... In general, vector autoregressive models assume some functional relation between…

Analysis of Commonality

Analysis of Commonality: Analysis of commonality is a method for causal modeling. In a simple case of two independent variables x1 and x2, for example, analysis of commonality posits three sources of causation, described by three latent variables: u1 and u2, which…

Analysis of Covariance (ANCOVA)

Analysis of Covariance (ANCOVA): Analysis of covariance is a more sophisticated method of analysis of variance. It is based on inclusion of supplementary variables (covariates) into the model. This lets you account for inter-group variation associated not with the "treatment" itself, but with covariate(s). Suppose…

ANCOVA

ANCOVA: See Analysis of covariance.

Canonical Correlation Analysis

Canonical Correlation Analysis: The purpose of canonical correlation analysis is to explain or summarize the relationship between two sets of variables by finding linear combinations of each set of variables that yield the highest possible correlation between the composite variable for set A and…

Canonical root

Canonical root: See discriminant function.

Canonical variates analysis

Canonical variates analysis: Several techniques that seek to illuminate the ways in which sets of variables are related to one another. The term refers to regression analysis, MANOVA, discriminant analysis, and, most often, to canonical correlation analysis.

Causal analysis

Causal analysis: See causal modeling.

Causal modeling

Causal modeling: Causal modeling is aimed at advancing reasonable hypotheses about underlying causal relationships between the dependent and independent variables. Consider for example a simple linear model: y = a0 + a1 x1 + a2 x2 + e where y is the dependent variable, x1…

Centroid

Centroid: The centroid of several continuous variables is the vector of means of those variables. The concept of centroid plays the same role, for example, in multiple analysis of variance (MANOVA) as the mean plays in analysis of variance (ANOVA).
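The definition reduces to a per-variable mean, as in this short illustrative Python sketch:

```python
# Sketch: the centroid of several observations is the vector of
# per-variable (per-coordinate) means.
def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Three observations on two variables; the centroid is the mean of each column.
c = centroid([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)])
```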

Cluster Analysis

Cluster Analysis: In multivariate analysis, cluster analysis refers to methods used to divide up objects into similar groups, or, more precisely, groups whose members are all close to one another on various dimensions being measured. In cluster analysis, one does not start with any a priori…

Contingency Tables Analysis

Contingency Tables Analysis: Contingency tables analysis is a central branch of categorical data analysis, and is focused on the analysis of data represented as contingency tables. This sort of analysis includes hypothesis testing as well as estimation of model parameters, e.g. applying loglinear regression…

Correspondence analysis

Correspondence analysis: Correspondence analysis (CA) is an approach to representing categorical data in a Euclidean space, suitable for visual analysis. CA is often used where the data (in the form of a two-way contingency table) have many rows and/or columns and are not easy to…

Correspondence Factor Analysis

Correspondence Factor Analysis: Correspondence factor analysis is a synonym for Correspondence analysis.

Multiple analysis of variance (MANOVA)

Multiple analysis of variance (MANOVA): MANOVA is a technique which determines the effects of independent categorical variables on multiple continuous dependent variables. It is usually used to compare several groups with respect to multiple continuous variables. The main distinction between MANOVA and ANOVA is that…

Multiple Correspondence Analysis (MCA)

Multiple Correspondence Analysis (MCA): Multiple correspondence analysis (MCA) is an extension of correspondence analysis (CA) to the case of more than two variables. The initial data for MCA are three-way or m-way contingency tables. In the case of three variables, a common approach to MCA is…

Partial correlation analysis

Partial correlation analysis: Partial correlation analysis is aimed at finding the correlation between two variables after removing the effects of other variables. This type of analysis helps spot spurious correlations (i.e. correlations explained by the effect of other variables) and reveal hidden correlations…

Path Analysis

Path Analysis: Path analysis is a method for causal modeling . Consider the simple case of two independent variables x1 and x2 and one dependent variable. Path analysis splits the contribution of x1 and x2 to the variance of the dependent variable y into four…

Perceptual Mapping

Perceptual Mapping: Perceptual mapping is a synonym for Correspondence analysis.

Principal Component Analysis

Principal Component Analysis: The purpose of principal component analysis is to derive a small number of linear combinations (principal components) of a set of variables that retain as much of the information in the original variables as possible.

Social Space Analysis

Social Space Analysis: Social space analysis is a synonym for Correspondence analysis.

Machine Learning

Machine Learning: Analytics in which computers "learn" from data to produce models or rules that apply to those data and to other similar data. Predictive modeling techniques such as neural nets, classification and regression trees (decision trees), naive Bayes, k-nearest neighbor, and support vector machines…

Multiplicity Issues

Multiplicity Issues: Multiplicity issues arise in a number of contexts, but they generally boil down to the same thing: repeated looks at a data set in different ways, until something "statistically significant" emerges. See multiple comparisons for how to handle multiple pairwise testing in conjunction…

False Discovery Rate

False Discovery Rate: A "discovery" is a hypothesis test that yields a statistically significant result. The false discovery rate is the proportion of discoveries that are, in reality, not significant (a Type-I error). The true false discovery rate is not known, since the true state…

Bernoulli Distribution

Bernoulli Distribution: A random variable x has a Bernoulli distribution with parameter 0 < p < 1 if P(x = 1) = p, P(x = 0) = 1 - p, and P(x) = 0 for x not in {0, 1}, where P(A) is the probability of outcome A. The parameter p is often called…
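The probability mass function above can be written directly in Python (an illustrative sketch; the function name is not from any library):

```python
# Sketch: the Bernoulli probability mass function with success probability p.
def bernoulli_pmf(x, p):
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0  # all other outcomes have zero probability

# The mean of a Bernoulli variable is E[X] = 0*(1-p) + 1*p = p.
mean = sum(x * bernoulli_pmf(x, 0.3) for x in (0, 1))
```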

Beta Distribution

Beta Distribution: Suppose x1, x2, ... , xn are n independent values of a random variable uniformly distributed within the interval [0,1]. If you sort the values in ascending order, then the k-th value will have a beta distribution with parameters a = k, b…

Binomial Distribution

Binomial Distribution: Used to describe an experiment, event, or process for which the probability of success is the same for each trial and each trial has only two possible outcomes. If a coin is tossed n number of times, the probability of a certain number…
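The coin-tossing case can be computed exactly from the binomial probability mass function, sketched here with the standard library (function name illustrative):

```python
# Sketch: binomial probability of exactly k successes in n independent
# trials, each with success probability p.
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 5 heads in 10 fair coin tosses: C(10,5)/2^10 = 252/1024.
p5 = binomial_pmf(5, 10, 0.5)
```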

Bivariate Normal Distribution

Bivariate Normal Distribution: Bivariate normal distribution describes the joint probability distribution of two variables, say X and Y, that both obey the normal distribution. The bivariate normal is completely specified by 5 parameters: mx, my are the mean values of variables X and Y, respectively;…

Central Limit Theorem

Central Limit Theorem: The central limit theorem states that the sampling distribution of the mean approaches Normality as the sample size increases, regardless of the probability distribution of the population from which the sample is drawn. If the population distribution is fairly Normally-distributed, this approach…
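A quick simulation illustrates the theorem's companion fact that the spread of the sampling distribution of the mean shrinks as sample size grows, even for a non-normal (uniform) population. This is an illustrative sketch; the sample counts are arbitrary choices:

```python
# Sketch: means of samples drawn from a uniform (non-normal) population
# concentrate around the population mean as the sample size increases.
import random
import statistics

rng = random.Random(42)

def sample_means(sample_size, n_samples=2000):
    return [statistics.mean(rng.uniform(0, 1) for _ in range(sample_size))
            for _ in range(n_samples)]

spread_small = statistics.stdev(sample_means(5))    # n = 5
spread_large = statistics.stdev(sample_means(50))   # n = 50
# The spread of the sampling distribution shrinks roughly as 1/sqrt(n).
```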

Chebyshev's Theorem

Chebyshev's Theorem: For any positive constant k, the probability that a random variable will take on a value within k standard deviations of the mean is at least 1 - 1/k².
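The bound can be checked empirically on deliberately skewed data; an illustrative sketch (the exponential population and sample size are arbitrary choices):

```python
# Sketch: empirical check of Chebyshev's bound -- the fraction of values
# within k standard deviations of the mean is at least 1 - 1/k**2.
import random
import statistics

rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(10_000)]  # skewed, non-normal data
mu = statistics.mean(data)
sigma = statistics.pstdev(data)

k = 2.0
within = sum(abs(x - mu) <= k * sigma for x in data) / len(data)
bound = 1 - 1 / k**2   # Chebyshev guarantees at least 0.75 for k = 2
```

For many distributions the actual fraction is far above the bound; Chebyshev's value is a worst-case guarantee, not an approximation.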

Chi-Square Distribution

Chi-Square Distribution: The square of a random variable having a standard normal distribution is distributed as chi-square with 1 degree of freedom. The sum of squares of n independently distributed standard normal variables has a Chi-Square distribution with n degrees of freedom. The distribution is typically…

Conditional Probability

Conditional Probability: When probabilities are quoted without specification of the sample space, ambiguity can result when the sample space is not self-evident. To avoid this, the sample space can be explicitly made known. The probability of an event A given sample space S,…
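The standard formula P(A | B) = P(A and B) / P(B) can be illustrated on a finite sample space; a sketch with one fair die (the events chosen are arbitrary examples):

```python
# Sketch: conditional probability on a finite sample space (one fair die).
# A = "roll is even", B = "roll is greater than 3".
sample_space = {1, 2, 3, 4, 5, 6}
a = {2, 4, 6}
b = {4, 5, 6}

p_b = len(b) / len(sample_space)            # P(B) = 3/6
p_a_and_b = len(a & b) / len(sample_space)  # P(A and B) = |{4, 6}|/6 = 2/6
p_a_given_b = p_a_and_b / p_b               # P(A | B) = 2/3
```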

Continuous Sample Space

Continuous Sample Space: If a sample space contains an infinite number of sample points constituting a continuum, then such a sample space is said to be a continuous sample space.
