Construct Validity

Construct Validity: In psychometrics, the construct validity of a survey instrument or psychometric test measures how well the instrument performs in practice from the standpoint of the specialists who use it. In psychology, a construct is a phenomenon or a variable in a model…

Content Validity

Content Validity: The content validity of survey instruments, like psychological tests, is assessed by review of the items by trained individuals and/or by individuals from the target population. The individuals make their judgments about the relevance of the items and about the unambiguity of…

Contingency Table

Contingency Table: A contingency table is a tabular representation of categorical data. A contingency table usually shows frequencies for particular combinations of values of two discrete random variables X and Y. Each cell in the table represents a mutually exclusive combination of X-Y…

Contingency Tables Analysis

Contingency Tables Analysis: Contingency tables analysis is a central branch of categorical data analysis, and is focused on the analysis of data represented as contingency tables. This sort of analysis includes hypothesis testing as well as estimation of model parameters, e.g. applying loglinear regression…

Continuous Distribution

Continuous Distribution: A continuous distribution describes probabilistic properties of a random variable which takes on a continuous (not countable) set of values - a continuous random variable. In contrast to discrete distributions, continuous distributions do not ascribe values of probability to possible values…

Continuous Random Variable

Continuous Random Variable: A continuous random variable is any random variable which takes on values on a continuous scale.

Continuous Sample Space

Continuous Sample Space: If a sample space contains an infinite number of sample points constituting a continuum, then such a sample space is said to be a continuous sample space.

Continuous vs. Discrete Distributions

Continuous vs. Discrete Distributions: A discrete distribution is one in which the data can only take on certain values, for example integers. A continuous distribution is one in which data can take on any value within a specified range (which may be infinite). For a discrete distribution,…

Control Charts

Control Charts: Control charts are used to track regular measurements of an ongoing process, and to signal when such a process has reached the point of going "out of control" (i.e. may no longer be governed by the same properties, such as mean or standard…

Convergent Validity

Convergent Validity: In psychometrics, the convergent validity of a survey instrument or psychometric test indicates the degree of agreement between measurements of the same trait obtained by different approaches supposed to measure the same trait. The complementary concept is the divergent validity. Both…

Convolution of Distribution Functions

Convolution of Distribution Functions: If F1(·) and F2(·) are distribution functions, then the function F(·) defined by F(x) = ∫ F1(x−y) dF2(y) is called the convolution of distribution functions F1 and F2. This is often denoted as F = F1 * F2. The convolution F1 * F2 provides the…
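For discrete random variables the convolution integral becomes a sum over joint outcomes. A minimal Python sketch (the `convolve_pmf` helper is illustrative, not part of the glossary):

```python
from collections import Counter
from fractions import Fraction

def convolve_pmf(p1, p2):
    """Discrete analogue of F = F1 * F2: the PMF of X + Y
    for independent X ~ p1 and Y ~ p2."""
    out = Counter()
    for x, px in p1.items():
        for y, py in p2.items():
            out[x + y] += px * py
    return dict(out)

die = {k: Fraction(1, 6) for k in range(1, 7)}  # PMF of one fair die
two_dice = convolve_pmf(die, die)               # distribution of the sum
# two_dice[7] == Fraction(1, 6): a total of 7 is the most likely outcome
```

The same double loop mirrors the integral: for each value y of the second variable, weight the shifted distribution of the first by its probability.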

Convolution of Distribution Functions (Graphical)

Convolution of Distribution Functions: If F1(·) and F2(·) are distribution functions, then the function F(·) given by their convolution integral is called the convolution of distribution functions F1 and F2. This is often denoted as F = F1 * F2. The convolution provides the distribution function of the sum of two independent random variables…

Correlation Coefficient

Correlation Coefficient: The correlation coefficient indicates the degree of linear relationship between two variables. It always lies between -1 and +1: -1 indicates a perfect negative linear relationship between two variables, +1 indicates a perfect positive linear relationship, and 0 indicates the lack of any linear…
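The standard Pearson correlation coefficient can be computed directly from its definition; a small stdlib-only sketch:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the
    product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

pearson_r([1, 2, 3], [2, 4, 6])  # exactly linear increasing -> 1.0
pearson_r([1, 2, 3], [6, 4, 2])  # exactly linear decreasing -> -1.0
```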

Correlation Matrix

Correlation Matrix: A correlation matrix describes the correlations among M variables. It is a square symmetric MxM matrix with the (i,j)th element equal to the correlation coefficient r_ij between the i-th and the j-th variable. The diagonal elements (correlations of variables with themselves) are always equal…

Correlation Statistic

Correlation Statistic: The correlation statistic is one of the statistics used in the generalized Cochran-Mantel-Haenszel tests. It is applicable when both the treatment (rows) and response (columns) are measured on an ordinal scale. In case of independence between the two variables in all…

Correspondence analysis

Correspondence analysis: Correspondence analysis (CA) is an approach to representing categorical data in a Euclidean space, suitable for visual analysis. CA is often used where the data (in the form of a two-way contingency table) have many rows and/or columns and are not easy to…

Correspondence mapping

Correspondence mapping: Correspondence mapping is a synonym for Correspondence analysis .

Correspondence Plot

Correspondence Plot: A correspondence plot represents the results of correspondence analysis (CA). For each category (possible value of a variable), its scores derived by CA for the first two dimensions are depicted as a point on the x-y plane. An interesting feature of the correspondence…

Countable Sample Space

Countable Sample Space: If a sample space contains a finite or countably infinite number of sample points, then such a sample space is referred to as a countable sample space.

Covariance

Covariance: The covariance between two random variables X and Y is the expected value of the product of the variables' deviations from their means. If there is a high probability that large values of X go with large values of Y and small values of…
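The definition translates directly into code; a minimal sketch computing the (population) covariance of two samples:

```python
def covariance(x, y):
    """Population covariance: mean of the product of deviations from the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

covariance([1, 2, 3, 4], [2, 4, 6, 8])  # large x with large y -> positive
covariance([1, 2, 3, 4], [8, 6, 4, 2])  # large x with small y -> negative
```

Dividing by n-1 instead of n gives the usual unbiased sample covariance.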

Covariate

Covariate: In design of experiments, a covariate is an independent variable not manipulated by the experimenter but still affecting the response. See Variables (in design of experiments) for an explanatory example.

Cover time

Cover time: Cover time is the expected number of steps in a random walk required to visit all the vertices of a connected graph (a graph in which there is always a path, consisting of one or more edges, between any two vertices). Blom, Holst…
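Cover time is easy to estimate by simulation. A sketch for a toy graph (the triangle graph and the helper function are illustrative assumptions, not from the glossary; the triangle's expected cover time is 3 steps, since the first step always reaches a new vertex and the last unvisited vertex is then hit with probability 1/2 per step):

```python
import random

def cover_time_once(adj, start, rng):
    """Run one random walk on a connected graph (adjacency lists);
    return the number of steps until every vertex has been visited."""
    visited = {start}
    node, steps = start, 0
    while len(visited) < len(adj):
        node = rng.choice(adj[node])
        visited.add(node)
        steps += 1
    return steps

# Triangle graph: each vertex adjacent to the other two.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
rng = random.Random(0)
estimate = sum(cover_time_once(triangle, 0, rng) for _ in range(2000)) / 2000
# estimate is close to the theoretical value of 3
```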

Cramer-Rao Inequality

Cramer-Rao Inequality: Every unbiased estimator has a variance greater than or equal to a lower bound called the Cramer-Rao lower bound. If the variance of an unbiased estimator achieves the Cramer-Rao lower bound, then that estimator is a minimum…

Criterion Validity

Criterion Validity: The criterion validity of survey instruments, like the tests used in psychometrics, is a measure of agreement between the results obtained by the given survey instrument and more "objective" results for the same population. The "objective" results are obtained either by a…

Cross sectional study

Cross sectional study: Cross sectional studies are those that record data from a sample of subjects at a given point in time. See also cross sectional data, longitudinal study.

Cross-sectional Analysis

Cross-sectional Analysis: Cross-sectional analysis is concerned with statistical inference from cross-sectional data.

Cross-sectional Data

Cross-sectional Data: Cross-sectional data refer to observations of many different individuals (subjects, objects) at a given time, each observation belonging to a different individual. A simple example of cross-sectional data is the gross annual income for each of 1000 randomly chosen households in New York…

Cross-tabulation Tables

Cross-tabulation Tables: A cross-tabulation table represents the joint frequency distribution of two discrete variables. Rows and columns correspond to the possible values of the first and the second variables, the cells contain frequencies (numbers) of occurrence of the corresponding pairs of values of the 1st…
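A cross-tabulation can be built from raw paired observations with nothing more than a counter over value pairs; a minimal stdlib sketch with made-up data:

```python
from collections import Counter

# Hypothetical paired observations of two discrete variables.
sex    = ["M", "F", "M", "F", "M"]
smoker = ["yes", "no", "no", "no", "yes"]

# Each cell of the cross-tabulation is the frequency of one (row, column) pair.
table = Counter(zip(sex, smoker))
table[("M", "yes")]  # -> 2
table[("F", "no")]   # -> 2
```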

Cross-Validation

Cross-Validation: Cross-validation is a general computer-intensive approach used in estimating the accuracy of statistical models. The idea of cross-validation is to split the data into N subsets, to put one subset aside, to estimate parameters of the model from the remaining N-1 subsets, and to…
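The split-hold-out-refit loop can be sketched as plain Python. The toy "model" below (training mean scored by mean absolute error) is an illustrative assumption, chosen only to keep the example self-contained:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k consecutive folds (no shuffling, for brevity)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, k, fit, score):
    """Hold each fold out in turn; fit on the remaining k-1 folds,
    score on the held-out fold, and average the k scores."""
    scores = []
    for fold in kfold_indices(len(data), k):
        held_out = [data[i] for i in fold]
        training = [data[i] for i in range(len(data)) if i not in set(fold)]
        model = fit(training)
        scores.append(score(model, held_out))
    return sum(scores) / k

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
err = cross_validate(data, 3,
                     fit=lambda xs: sum(xs) / len(xs),
                     score=lambda m, xs: sum(abs(x - m) for x in xs) / len(xs))
```

In practice the folds are usually shuffled first; the fixed split here keeps the example deterministic.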

Crossover Design

Crossover Design: In randomized trials, a crossover design is one in which each subject receives each treatment, in succession. For example, subject 1 first receives treatment A, then treatment B, then treatment C. Subject 2 might receive treatment B, then treatment A, then treatment C.…

Cumulative Frequency Distribution

Cumulative Frequency Distribution: A cumulative frequency distribution is a summary of a set of data showing the frequency (or number) of items less than or equal to the upper class limit of each class. This definition holds for quantitative data and for categorical (qualitative) data…
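For a small quantitative data set, cumulative frequencies are just running totals of the ordinary frequencies; a stdlib sketch:

```python
from collections import Counter
from itertools import accumulate

data = [2, 3, 3, 5, 5, 5, 7]
freq = Counter(data)                              # frequency of each value
values = sorted(freq)                             # class values in order
cum = list(accumulate(freq[v] for v in values))   # running totals
# values -> [2, 3, 5, 7]; cum -> [1, 3, 6, 7]
# e.g. 6 items are less than or equal to 5
```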

Cumulative Relative Frequency Distribution

Cumulative Relative Frequency Distribution: A cumulative relative frequency distribution is a tabular summary of a set of data showing the relative frequency of items less than or equal to the upper class limit of each class. Relative frequency is the fraction or…

Curb-stoning

Curb-stoning: In survey research, curb-stoning refers to the deliberate fabrication of survey interview data by the interviewer. Often this is done to avoid the work of actually conducting the surveys. Statistical methods have been developed that can help to identify data that is the product…

Curse of Dimensionality

Curse of Dimensionality: The curse of dimensionality is the affliction caused by adding variables to multivariate data models. As variables are added, the data space becomes increasingly sparse, and classification and prediction models fail because the available data are insufficient to provide a useful model…

Data

Data: Data are recorded observations made on people, objects, or other things that can be counted, measured, or quantified in some way. In statistics, data are categorized according to several criteria, for example, according to the type of the values used to quantify the observations…

Data Mining

Data Mining: Data mining is concerned with finding latent patterns in large data bases. The goal is to discover unsuspected relationships that are of practical importance, e.g., in business. A broad range of statistical and machine learning approaches are used in data mining. See, for…

Data Partition

Data Partition: Data partitioning in data mining is the division of the whole data available into two or three non-overlapping sets: the training set, the validation set, and the test set. If the data set is very large, often only a portion…

Data Product

Data Product: A data product is a product or service whose value is derived from using algorithmic methods on data, and which in turn produces data to be used in the same product, or tangential data products. For example, at large web-based retail organizations like…

Decile

Decile: Deciles are percentiles taken in tens. The first decile is the 10th percentile, the second decile is the 20th percentile, etc.
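Python's standard library computes deciles directly; a short sketch (the "inclusive" interpolation method is one of several common conventions):

```python
import statistics

data = list(range(1, 101))  # the values 1..100
# n=10 requests the nine cut points that divide the data into ten groups.
deciles = statistics.quantiles(data, n=10, method="inclusive")
# deciles[0] is the first decile (10th percentile),
# deciles[4] is the fifth decile, i.e. the median
```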

Decile Lift

Decile Lift: In predictive modeling, the goal is to make predictions about outcomes on a case-by-case basis: an insurance claim will be fraudulent or not, a tax return will be correct or in error, a subscriber will terminate a subscription or not, a customer will…

Decision Trees

Decision Trees: In the machine learning community, a decision tree is a branching set of rules used to classify a record, or predict a continuous value for a record. For example, one path in a tree modeling customer churn (abandonment of subscription) might look like…

Deep Learning

Deep Learning: Deep learning refers to complex multi-layer neural nets. They are especially suitable for image and voice recognition, and for unsupervised tasks with complex, unstructured data.

Degrees of Freedom

Degrees of Freedom: For a set of data points in a given situation (e.g. with mean or other parameter specified, or not), degrees of freedom is the minimal number of values which should be specified to determine all the data points. For example, if you…
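A tiny worked example of the idea: if the mean of n = 5 values is fixed in advance, only 4 of them can be chosen freely; the fifth is then determined.

```python
mean = 5                        # the mean of n = 5 values, fixed in advance
free = [3, 7, 1, 4]             # n - 1 = 4 values chosen freely...
forced = 5 * mean - sum(free)   # ...the last value is forced: df = n - 1
# forced == 10, and (3 + 7 + 1 + 4 + 10) / 5 == 5
```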

Dendrogram

Dendrogram: The dendrogram is a graphical representation of the results of hierarchical cluster analysis . This is a tree-like plot where each step of hierarchical clustering is represented as a fusion of two branches of the tree into a single one. The branches represent clusters…

Density (of Probability)

Density (of Probability): A probability density function or curve is a non-negative function f(x) that describes the distribution of a continuous random variable. If f(x) is known, then the probability that a value of the variable is within an interval is described by the following…

Dependent and Independent Variables

Dependent and Independent Variables: Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called independent variables. While analysts typically specify variables in a model to reflect their understanding or theory of "what causes what," setting…

Descriptive Statistics

Descriptive Statistics: Descriptive statistics refers to statistical techniques used to summarize and describe a data set, and also to the statistics (measures) used in such summaries. Measures of central tendency (e.g. mean, median) and variation (e.g. range, standard deviation) are the main descriptive statistics. Displays…

Design of Experiments

Design of Experiments: Design of experiments is concerned with optimization of the plan of experimental studies. The goal is to improve the quality of the decision that is made from the outcome of the study on the basis of statistical methods, and to ensure that…

Detrended Correspondence Analysis

Detrended Correspondence Analysis: Detrended correspondence analysis is an extension of correspondence analysis (CA) aimed at addressing a deficiency of correspondence analysis. The problem is known as the "arch effect" - a non-monotonic relationship between two sets of scores derived by CA. The basic idea…

Dichotomous

Dichotomous: Dichotomous (outcome or variable) means "having only two possible values", e.g. "yes/no", "male/female", "head/tail", "age > 35 / age <= 35" etc. For example, the outcome of an experiment with coin tossing is dichotomous ("head" or "tail"); the variable "biological sex" in a social…

Differencing (of Time Series)

Differencing (of Time Series): Differencing of a time series X(t) in discrete time is the transformation of the series to a new time series Y(t) whose values are the differences between consecutive values of X(t). This procedure may be applied consecutively more than once, giving rise…
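First differencing, and repeated (higher-order) differencing, in a few lines of Python:

```python
def difference(series):
    """First difference: y[t] = x[t] - x[t-1]; the result is one element shorter."""
    return [b - a for a, b in zip(series, series[1:])]

x = [3, 5, 9, 9, 12]
difference(x)              # -> [2, 4, 0, 3]
difference(difference(x))  # applying the procedure twice: second-order differencing
```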

Directed vs. Undirected Network

Directed vs. Undirected Network: In a directed network, connections between nodes are directional. For example, in a Twitter network, Smith might follow Jones but that does not mean that Jones follows Smith. Each directional relationship would have an edge to represent it, typically with an…

Discrete Distribution

Discrete Distribution: A discrete distribution describes the probabilistic properties of a random variable that takes on a set of values that are discrete, i.e. separate and distinct from one another - a discrete random variable . Discrete values are separated only by a finite number…

Discrete Random Variable

Discrete Random Variable: A random variable whose range of possible values is finite or countably infinite is said to be a discrete random variable.

Discriminant Analysis

Discriminant Analysis: Discriminant analysis is a method of distinguishing between classes of objects. The objects are typically represented as rows in a matrix. The values of various attributes (variables) of an object are measured (the matrix columns) and a linear classification function is developed that…

Discriminant Function

Discriminant Function: In discriminant analysis, a discriminant function (DF) maps independent (discriminating) variables into a latent variable D. DF is usually postulated to be a linear function: D = a0 + a1 x1 + a2 x2 + ... + aN xN. The goal of discriminant analysis…

Dispersion (Measures of)

Dispersion (Measures of): Measures of dispersion express quantitatively the degree of variation or dispersion of values in a population or in a sample. Along with measures of central tendency, measures of dispersion are widely used in practice as descriptive statistics. Some measures…

Dissimilarity Matrix

Dissimilarity Matrix: The dissimilarity matrix (also called distance matrix) describes pairwise distinction between M objects. It is a square symmetrical MxM matrix with the (ij)th element equal to the value of a chosen measure of distinction between the (i)th and the (j)th object. The diagonal…

Distance

Distance: Statistical distance is a measure calculated between two records that are typically part of a larger dataset, where rows are records and columns are variables. To calculate Euclidean distance, one possible distance metric, the steps are: 1. [Typically done, but not always] Convert all…
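The standardize-then-sum-of-squares recipe for Euclidean distance can be sketched in a few lines (the two helper functions are illustrative):

```python
import math

def standardize(column):
    """Step 1 (optional): rescale a column to mean 0 and (population) sd 1,
    so that no variable dominates the distance through its units."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]

def euclidean(rec_a, rec_b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rec_a, rec_b)))

euclidean([0, 0], [3, 4])  # -> 5.0, the classic 3-4-5 triangle
```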

Distance Matrix

Distance Matrix: Distance matrix is often used as a synonym for dissimilarity matrix. The "distance" does not necessarily mean distance in space. It is a common situation when the "distance" is a subjective measure of dissimilarity. The only property the concept of "distance" implies…

Divergent Validity

Divergent Validity: In psychometrics , the divergent validity of a survey instrument, like an IQ-test, indicates that the results obtained by this instrument do not correlate too strongly with measurements of a similar but distinct trait. For example, if a test is supposed to measure…

Divisive Methods (of Cluster Analysis)

Divisive Methods (of Cluster Analysis): In divisive methods of hierarchical cluster analysis, the clusters obtained at the previous step are subdivided into smaller clusters. Such methods start from a single cluster comprising all N objects, and, after N-1 steps, they end with N…

Dual Scaling

Dual Scaling: Dual scaling is a synonym for Correspondence analysis .

Dunn Test

Dunn Test: The Dunn test is a method for multiple comparisons, which generalizes the Bonferroni adjustment procedure. This test is used as a post-hoc test in analysis of variance when the number of comparisons is not large compared to the number of all…

Econometrics

Econometrics: Econometrics is a discipline concerned with the application of statistics and mathematics to various problems in economics and economic theory. This term literally means "economic measurement". A central task is quantification (measurement) of various qualitative concepts of economic theory - like demand, supply…

Edge

Edge: An edge is a link between two people or entities in a network. Edges can be directed or undirected. A directed edge has a clear origin and destination: lender > borrower, tweeter > follower. An undirected edge connects two people or entities with a…

Effect

Effect: In design of experiments, the effect of a factor is an additive term of the model, reflecting the contribution of the factor to the response. See Variables (in design of experiments) for an explanatory example.

Effect Size

Effect Size: In a study or experiment with two groups (usually control and treatment), the investigator typically has in mind the magnitude of the difference between the two groups that he or she wants to be able to detect in a hypothesis test. This magnitude,…

Efficiency

Efficiency: For an unbiased estimator, efficiency indicates how much its precision is lower than the theoretical limit of precision provided by the Cramer-Rao inequality. A measure of efficiency is the ratio of the theoretically minimal variance to the actual variance of the estimator. This measure…

Endogenous Variable

Endogenous Variable: Endogenous variables in causal modeling are the variables with causal links (arrows) leading to them from other variables in the model. In other words, endogenous variables have explicit causes within the model. The concept of endogenous variable is fundamental in path analysis and…

Ensemble Methods

Ensemble Methods: In predictive modeling, ensemble methods refer to the practice of taking multiple models and averaging their predictions. In the case of classification models, the average can be that of a probability score attached to the classification. Models can differ with respect to algorithms used (e.g.…

Erlang Distribution

Erlang Distribution: The Erlang distribution with parameters (n, m) characterizes the distribution of time intervals until the emergence of n events in a Poisson process with parameter m. The Erlang distribution is a special case of the gamma distribution.
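The Erlang(n, m) density has the standard closed form m^n x^(n-1) e^(-mx) / (n-1)!; a small sketch, including the special case n = 1, which reduces to the exponential distribution:

```python
import math

def erlang_pdf(x, n, m):
    """Density of the Erlang(n, m) distribution: the waiting time until the
    n-th event in a Poisson process with rate m (x >= 0, n a positive integer)."""
    return (m ** n) * (x ** (n - 1)) * math.exp(-m * x) / math.factorial(n - 1)

# With n = 1 this is the exponential density m * exp(-m * x):
erlang_pdf(0.5, 1, 2.0)  # equals 2.0 * exp(-1.0)
```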

Error

Error: Error is a general concept related to deviation of the estimated quantity from its true value: the greater the deviation, the greater the error. Errors are categorised according to their probabilistic nature into systematic errors and random errors, and, according to their relation…

Estimation

Estimation: Estimation is deriving a guess about the actual value of a population parameter (or parameters) from a sample drawn from this population. See also Estimator.

Estimator

Estimator: A statistic, measure, or model, applied to a sample, intended to estimate some parameter of the population that the sample came from.

Event

Event: In probability theory, an event is an outcome or defined collection of outcomes of a random experiment. Since the collection of all possible outcomes to a random experiment is called the sample space, another definition of event is any subset of a sample space.…

Exact Tests

Exact Tests: Exact tests are hypothesis tests that are guaranteed to produce Type-I error at or below the nominal alpha level of the test when conducted on samples drawn from a null model. For example, a test conducted at the 5% level of significance that…

Exogenous Variable

Exogenous Variable: Exogenous variables in causal modeling are the variables with no causal links (arrows) leading to them from other variables in the model. In other words, exogenous variables have no explicit causes within the model. The concept of exogenous variable is fundamental in path…

Expected Value

Expected Value: The expected value of a random variable is its mean. For a discrete random variable, the expected value is the weighted average of the possible values of the random variable, the weights being the probabilities that those values will occur.…
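The weighted average in the definition is a one-line computation; a sketch for a fair die:

```python
def expected_value(pmf):
    """E[X] = sum of value * probability over the support of a discrete X."""
    return sum(x * p for x, p in pmf.items())

die = {k: 1 / 6 for k in range(1, 7)}  # PMF of a fair six-sided die
expected_value(die)  # -> 3.5
```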

Experiment

Experiment: Any process of observation or measurement is called an experiment in statistics. For example, counting the number people visiting a restaurant in a day is an experiment, and so is checking the number obtained on the roll of a die. Typically, we will be…

Explanatory Variable

Explanatory Variable: Explanatory variable is a synonym for independent variable. See also: dependent and independent variables.

Exponential Distribution

Exponential Distribution: The exponential distribution is a one-sided distribution completely specified by one parameter r > 0; the density of this distribution is f(x) = r e^(-rx) for x >= 0, and f(x) = 0 for x < 0. The mean of the exponential distribution is 1/r. The exponential…
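The density is a one-liner; note that at x = 0 the density equals the rate r:

```python
import math

def exp_pdf(x, r):
    """Exponential density with rate r: r * exp(-r * x) for x >= 0, else 0."""
    return r * math.exp(-r * x) if x >= 0 else 0.0

exp_pdf(0.0, 2.0)   # -> 2.0 (the density at zero equals the rate)
exp_pdf(-1.0, 2.0)  # -> 0.0 (the distribution is one-sided)
# the mean of this distribution is 1 / r
```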

Exponential Distribution (Graphical)

Exponential Distribution: The exponential distribution is a one-sided distribution completely specified by one parameter r > 0; the density of this distribution is f(x) = r e^(-rx) for x >= 0 (and 0 otherwise). The mean of the exponential distribution is 1/r. The exponential distribution is a model for the length of intervals between two consecutive random events…

Exponential Filter

Exponential Filter: The exponential filter is the simplest linear recursive filter. Exponential filters are widely used in time series analysis, especially for forecasting time series (see the short course Time Series Forecasting). The exponential filter is described by the following expression: where…
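The glossary's formula was lost in extraction; a sketch assuming the standard recursive form s[t] = alpha * x[t] + (1 - alpha) * s[t-1], with smoothing constant 0 < alpha <= 1:

```python
def exponential_filter(series, alpha):
    """Recursive exponential smoothing (standard form, assumed here):
    each output blends the new observation with the previous output."""
    s = [series[0]]  # initialize with the first observation
    for x in series[1:]:
        s.append(alpha * x + (1 - alpha) * s[-1])
    return s

exponential_filter([10, 10, 10], 0.3)  # constant input -> constant output
exponential_filter([0, 10], 1.0)       # alpha = 1 just tracks the input
```

Smaller alpha gives heavier smoothing; alpha = 1 reproduces the input series unchanged.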

F Distribution

F Distribution: The F distribution is a family of distributions differentiated by two parameters: m1 (degrees of freedom, numerator) and m2 (degrees of freedom, denominator). If x1 and x2 are independent random variables with a chi-square distribution with m1 and m2 degrees of freedom respectively,…

F Distribution (Graphical)

F Distribution: The F distribution is a family of distributions differentiated by two parameters: m1 (degrees of freedom, numerator) and m2 (degrees of freedom, denominator). If x1 and x2 are independent random variables with a chi-square distribution with m1 and m2 degrees of freedom respectively,…

Face Validity

Face Validity: The face validity of survey instruments and tests used in psychometrics is assessed by cursory review of the items (questions) by untrained individuals. The individuals make their judgments on whether the items are relevant. For example, a researcher developing an IQ-test might…

Factor

Factor: In design of experiments, a factor is an independent variable manipulated by the experimenter. See Variables (in design of experiments) for an explanatory example.

Factor Analysis

Factor Analysis: Exploratory research on a topic may identify many variables of possible interest, so many that their sheer number can become a hindrance to effective and efficient analysis. Factor analysis is a "data reduction" technique that reduces the number of variables studied to a…

Factorial ANOVA

Factorial ANOVA: Factorial ANOVA (factorial analysis of variance ) is aimed at assessing the relative importance of various combinations of independent variables. Factorial ANOVA is used when there are at least two independent variables.

Fair Game

Fair Game: A game of chance is said to be fair if each player's expected payoff is zero. A game in which I roll a die and receive 12 for a 1 or 2 and lose 6 otherwise (3-6) is a fair game. Other…
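The fairness of the die game in the entry can be verified by computing the expected payoff directly:

```python
from fractions import Fraction

# Payoff for each face: receive 12 on a 1 or 2, lose 6 on 3-6.
payoff = {1: 12, 2: 12, 3: -6, 4: -6, 5: -6, 6: -6}
expected_payoff = sum(Fraction(1, 6) * v for v in payoff.values())
# expected_payoff == 0, so the game is fair
```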
