Internal Consistency Reliability

Internal Consistency Reliability: The internal consistency reliability of survey instruments (e.g. psychological tests) is a measure of the reliability of different survey items intended to measure the same characteristic. For example, suppose there are 5 different questions (items) related to anxiety level. Each question implies…

Hold-Out Sample

A hold-out sample is a random sample from a data set that is withheld and not used in the model fitting process. After the model is fit to the main data (the "training" data), it is then applied to the hold-out sample. This gives an…
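
The hold-out idea can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names, the 25% hold-out fraction, the toy data, and the deliberately trivial "model" (predict the training mean) are all assumptions made for the example.

```python
import random

def split_holdout(records, holdout_fraction=0.25, seed=0):
    """Randomly withhold a fraction of the records; return (training, holdout)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def fit_mean_model(training):
    """Toy 'model' (an assumption for illustration): always predict the mean response."""
    return sum(y for _, y in training) / len(training)

def mean_squared_error(prediction, sample):
    """Average squared difference between the constant prediction and the responses."""
    return sum((y - prediction) ** 2 for _, y in sample) / len(sample)

# Hypothetical data: (x, y) pairs with y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in range(20)]
training, holdout = split_holdout(data)

model = fit_mean_model(training)                  # fit on the training data only
print("training error:", mean_squared_error(model, training))
print("hold-out error:", mean_squared_error(model, holdout))
```

The hold-out records play no part in fitting, so the second number reflects how the fitted model behaves on data it has not seen.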

Image Processing

Image Processing: In image processing, the initial data are images - functions of two coordinates. Normally, images are represented in discrete form as two-dimensional arrays of image elements, or "pixels" - i.e. sets of non-negative values, ordered by two indexes - (rows)…

Hierarchical Cluster Analysis

Hierarchical Cluster Analysis: Hierarchical cluster analysis (or hierarchical clustering) is a general approach to cluster analysis, in which the object is to group together objects or records that are "close" to one another. A key component of the analysis is repeated calculation of distance…

Harmonic Mean

Harmonic Mean: Harmonic mean is a measure of central location. The harmonic mean $H$ of positive values $x_1, \ldots, x_N$ is defined by the formula $$H = \frac{N}{\sum_{i=1}^{N} \frac{1}{x_i}}.$$ Let the path between two cities $A$ and $B$ be divided into $N$ parts of equal length. One drives the $i$-th part at velocity $x_i$.…

Geometric Distribution (Graphical)

Geometric Distribution: A random variable x obeys the geometric distribution with parameter p (0<p<1) if $$P(x = k) = p\,(1-p)^{k}, \quad k = 0, 1, 2, \ldots$$ If a random variable obeys the Bernoulli distribution with probability of success p, then x might be the number of trials before the first "success" occurs.
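
A short simulation can make the link to Bernoulli trials concrete. The sketch below assumes the convention in which x counts the failed trials before the first success, so x = 0, 1, 2, ... and P(x = k) = p(1-p)^k; other texts count the successful trial as well, which shifts k by one. The function names and the value p = 0.3 are illustrative choices.

```python
import random

def geometric_pmf(k, p):
    """P(x = k) when x counts failures before the first success (assumed convention)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k < 0:
        return 0.0
    return p * (1.0 - p) ** k

def failures_before_first_success(p, rng):
    """Run Bernoulli(p) trials until the first success; return the number of failures."""
    failures = 0
    while rng.random() >= p:   # rng.random() < p counts as a success
        failures += 1
    return failures

rng = random.Random(42)
p = 0.3
draws = [failures_before_first_success(p, rng) for _ in range(20000)]

# Compare simulated frequencies with the formula for the first few values of k.
for k in range(4):
    print(k, draws.count(k) / len(draws), geometric_pmf(k, p))
```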

Gini coefficient (Graphical)

Gini coefficient: The Gini coefficient is used in economics to measure income inequality. Generally speaking, it is used to measure the extent of departure from a perfectly even distribution of income. A "0" indicates no departure, i.e. everyone has the same income. A "1" indicates…
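
The excerpt above does not give a formula, so the sketch below uses one common definition as an assumption: half the mean absolute difference between all pairs of incomes, divided by the mean income. The function name and the sample incomes are hypothetical.

```python
def gini_coefficient(incomes):
    """Gini coefficient as sum |x_i - x_j| over all ordered pairs, divided by 2 * n * sum(x)."""
    n = len(incomes)
    total = sum(incomes)
    if n == 0 or total <= 0 or any(x < 0 for x in incomes):
        raise ValueError("need non-negative incomes with a positive total")
    abs_diff_sum = sum(abs(a - b) for a in incomes for b in incomes)
    return abs_diff_sum / (2.0 * n * total)

print(gini_coefficient([10, 10, 10, 10]))   # everyone has the same income
print(gini_coefficient([0, 0, 0, 10]))      # one person holds all the income
```

With this definition a finite group of n people reaches at most (n - 1)/n, which approaches 1 as the group grows.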

Gaussian Filter

Gaussian Filter: The Gaussian filter is a linear filter that is usually used as a smoother. The output of the Gaussian filter at the moment $t$ is the weighted mean of the input values, and the weights are defined by the formula $$w(d) \propto \exp\left(-\frac{d^2}{2\sigma^2}\right),$$ where $d$ is the "distance"…
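
A minimal sketch of Gaussian smoothing for a one-dimensional series follows. It assumes the weights are proportional to exp(-d²/(2σ²)), where d is the offset in samples from the current point, truncates the window at a fixed radius, and renormalises the weights near the ends of the series. The names `radius` and `sigma` and the sample series are assumptions of this example.

```python
import math

def gaussian_weights(radius, sigma):
    """Normalised Gaussian weights for offsets -radius..radius."""
    raw = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def gaussian_smooth(series, radius=2, sigma=1.0):
    """Replace each value by the Gaussian-weighted mean of its neighbours."""
    smoothed = []
    for t in range(len(series)):
        weighted_sum = 0.0
        weight_total = 0.0
        for d in range(-radius, radius + 1):
            j = t + d
            if 0 <= j < len(series):          # skip offsets that fall outside the series
                w = math.exp(-(d * d) / (2.0 * sigma * sigma))
                weighted_sum += w * series[j]
                weight_total += w
        smoothed.append(weighted_sum / weight_total)
    return smoothed

noisy = [1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0]
print(gaussian_smooth(noisy))
```

A larger `sigma` spreads the weights over more neighbours and gives a smoother output.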

General Linear Model for a Latin Square (Graphical)

General Linear Model for a Latin Square: In design of experiments, a Latin square is a three-factor experiment in which, for each pair of factors, every combination of factor values occurs exactly once. Consider the following Latin Square, where rows correspond to 4 values…

Gamma Distribution (Graphical)

Gamma Distribution: A random variable x is said to have a gamma distribution with parameters $a > 0$ and $\lambda > 0$ if its probability density $p(x)$ is $$p(x) = \begin{cases} \dfrac{\lambda^{a}}{\Gamma(a)}\, x^{a-1} e^{-\lambda x}, & x > 0; \\ 0, & \text{otherwise.} \end{cases}$$

Functional Data Analysis (FDA)

Functional Data Analysis (FDA): In functional data analysis (FDA), data are considered as continuous functions (or curves). This is in contrast to multivariate statistics, where data are considered as vectors (finite sets of values). Real data are usually collected as discrete samples. In FDA, such…

Farthest Neighbor Clustering

Farthest Neighbor Clustering: Farthest neighbor clustering is a synonym for complete linkage clustering.

Fourier Spectrum

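Fourier Spectrum: Any continuous function $x(t)$ defined on a finite interval of length $T$ can be represented as a weighted sum of cosine functions with periods $T, T/2, T/3, \ldots$: $$x(t) = A_0 + \sum_{i=1}^{\infty} A_i \cos(2\pi f_i t + \varphi_i),$$ where $f_i = i/T$ is the frequency of the i-th Fourier component; $A_i$ is the amplitude of the i-th component; $\varphi_i$ is the phase of…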

Fleming Procedure

Fleming Procedure: The Fleming procedure (or O'Brien-Fleming multiple testing procedure) is a simple multiple testing procedure for comparing two treatments when the response to treatment is dichotomous. This procedure is used in clinical trials. The procedure provides an opportunity to terminate the trial early…

Fixed Effects (Graphical)

Fixed Effects: The term "fixed effects" (as contrasted with "random effects") is related to how particular coefficients in a model are treated - as fixed or random values. Which approach to choose depends on both the nature of the data and the objective of the…

F Distribution (Graphical)

F Distribution: The F distribution is a family of distributions differentiated by two parameters: m1 (degrees of freedom, numerator) and m2 (degrees of freedom, denominator). If x1 and x2 are independent random variables with a chi-square distribution with m1 and m2 degrees of freedom respectively,…

Family-wise Type I Error (Graphical)

Family-wise Type I Error: In multiple comparison procedures, family-wise type I error is the probability that, even if all samples come from the same population, you will wrongly conclude that at least one pair of populations differ. If $\alpha$ is the probability of comparison-wise type I…
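
To see how quickly the family-wise error grows, the sketch below uses the textbook approximation 1 - (1 - α)^m for m comparisons, each made at comparison-wise level α. It assumes the comparisons are independent, which pairwise comparisons among the same samples are not strictly, so the numbers are indicative only. The function name and the values of α and m are illustrative.

```python
def family_wise_error_rate(alpha, n_comparisons):
    """Probability of at least one false positive among independent tests at level alpha."""
    if not 0.0 <= alpha <= 1.0 or n_comparisons < 0:
        raise ValueError("alpha must be in [0, 1] and n_comparisons non-negative")
    return 1.0 - (1.0 - alpha) ** n_comparisons

# With 5 groups there are 5 * 4 / 2 = 10 pairwise comparisons.
for m in (1, 3, 10, 45):
    print(m, round(family_wise_error_rate(0.05, m), 4))
```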

Face Validity

Face Validity: The face validity of survey instruments and tests used in psychometrics is assessed by cursory review of the items (questions) by untrained individuals. The individuals make their judgments on whether the items are relevant. For example, a researcher developing an IQ-test might…

Exponential Distribution (Graphical)

Exponential Distribution: The exponential distribution is a one-sided distribution completely specified by one parameter $\lambda > 0$; the density of this distribution is $$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0.$$ The mean of the exponential distribution is $1/\lambda$. The exponential distribution is a model for the length of intervals between two consecutive random events…

Explanatory Variable

Explanatory Variable: Explanatory variable is a synonym for independent variable. See also: dependent and independent variables.

Exogenous Variable

Exogenous Variable: Exogenous variables in causal modeling are the variables with no causal links (arrows) leading to them from other variables in the model. In other words, exogenous variables have no explicit causes within the model. The concept of exogenous variable is fundamental in path…

Exponential Filter

Exponential Filter: The exponential filter is the simplest linear recursive filter. Exponential filters are widely used in time series analysis, especially for forecasting time series (see the short course Time Series Forecasting). The exponential filter is described by the following expression: $$y_t = \alpha x_t + (1 - \alpha)\, y_{t-1},$$ where…
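
The recursion behind an exponential filter is short enough to write out. The sketch below assumes the common parameterisation in which each output is `alpha` times the new input plus `1 - alpha` times the previous output, with the first output set to the first input; the parameter name `alpha`, the start-up rule, and the sample series are assumptions of this example.

```python
def exponential_filter(series, alpha):
    """Recursive smoothing: output = alpha * input + (1 - alpha) * previous output."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]                     # start-up rule: first output = first input
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed

observations = [10.0, 12.0, 11.0, 15.0, 14.0]
print(exponential_filter(observations, alpha=0.5))
print(exponential_filter(observations, alpha=0.1))   # smaller alpha: heavier smoothing
```

Only the previous output needs to be stored, which is what makes the filter recursive; in the simplest forecasting use, the last smoothed value serves as the forecast for the next period.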

Error

Error: Error is a general concept related to deviation of the estimated quantity from its true value: the greater the deviation, the greater the error. Errors are categorised according to their probabilistic nature into systematic errors and random errors, and, according to their relation…

Endogenous Variable

Endogenous Variable: Endogenous variables in causal modeling are the variables with causal links (arrows) leading to them from other variables in the model. In other words, endogenous variables have explicit causes within the model. The concept of endogenous variable is fundamental in path analysis and…

Data Partition

Data Partition: Data partitioning in data mining is the division of the whole data available into two or three non-overlapping sets: the training set, the validation set, and the test set. If the data set is very large, often only a portion…

Econometrics

Econometrics: Econometrics is a discipline concerned with the application of statistics and mathematics to various problems in economics and economic theory. This term literally means "economic measurement". A central task is quantification (measurement) of various qualitative concepts of economic theory - like demand, supply…

Divisive Methods (of Cluster Analysis)

Divisive Methods (of Cluster Analysis): In divisive methods of hierarchical cluster analysis, the clusters obtained at the previous step are subdivided into smaller clusters. Such methods start from a single cluster comprising all N objects, and, after N-1 steps, they end with N…

Divergent Validity

Divergent Validity: In psychometrics, the divergent validity of a survey instrument, like an IQ-test, indicates that the results obtained by this instrument do not correlate too strongly with measurements of a similar but distinct trait. For example, if a test is supposed to measure…

Dispersion (Measures of)

Dispersion (Measures of): Measures of dispersion express quantitatively the degree of variation or dispersion of values in a population or in a sample. Along with measures of central tendency, measures of dispersion are widely used in practice as descriptive statistics. Some measures…

Discrete Distribution

Discrete Distribution: A discrete distribution describes the probabilistic properties of a random variable that takes on a set of values that are discrete, i.e. separate and distinct from one another - a discrete random variable. Discrete values are separated only by a finite number…

Design of Experiments

Design of Experiments: Design of experiments is concerned with optimization of the plan of experimental studies. The goal is to improve the quality of the decision that is made from the outcome of the study on the basis of statistical methods, and to ensure that…

Dichotomous

Dichotomous: Dichotomous (outcome or variable) means "having only two possible values", e.g. "yes/no", "male/female", "head/tail", "age > 35 / age <= 35" etc. For example, the outcome of an experiment with coin tossing is dichotomous ("head" or "tail"); the variable "biological sex" in a social…

Dependent and Independent Variables

Dependent and Independent Variables: Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called independent variables. While analysts typically specify variables in a model to reflect their understanding or theory of "what causes what," setting…

Data Mining

Data Mining: Data mining is concerned with finding latent patterns in large databases. The goal is to discover unsuspected relationships that are of practical importance, e.g., in business. A broad range of statistical and machine learning approaches are used in data mining. See, for…

Dendrogram

Dendrogram: The dendrogram is a graphical representation of the results of hierarchical cluster analysis. This is a tree-like plot where each step of hierarchical clustering is represented as a fusion of two branches of the tree into a single one. The branches represent clusters…

Cover time

Cover time: Cover time is the expected number of steps in a random walk required to visit all the vertices of a connected graph (a graph in which there is always a path, consisting of one or more edges, between any two vertices). Blom, Holst…

Density (of Probability)

Density Functions: A probability density function or curve is a non-negative function $f(x)$ that describes the distribution of a continuous random variable. If $f(x)$ is known, then the probability that a value of the variable is within an interval $[a, b]$ is described by the following…

Cross-Validation

Cross-Validation: Cross-validation is a general computer-intensive approach used in estimating the accuracy of statistical models. The idea of cross-validation is to split the data into N subsets, to put one subset aside, to estimate parameters of the model from the remaining N-1 subsets, and to…
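
The splitting scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the "model" is simply the mean of the training values, accuracy is measured by mean squared error, and the function names, fold count, and data are all hypothetical.

```python
import random

def k_fold_indices(n_items, n_folds, seed=0):
    """Shuffle the indices 0..n_items-1 and deal them into n_folds disjoint subsets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def cross_validated_mse(values, n_folds=5, seed=0):
    """Average held-out squared error of a mean-only model across all folds."""
    folds = k_fold_indices(len(values), n_folds, seed)
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        # Estimate the model (here: a mean) from the remaining N-1 subsets.
        train = [v for i, v in enumerate(values) if i not in held]
        estimate = sum(train) / len(train)
        # Evaluate on the subset that was put aside.
        errors = [(values[i] - estimate) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)

measurements = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2, 2.3, 1.7, 2.0, 2.1]
print(cross_validated_mse(measurements, n_folds=5))
```

Each observation is held out exactly once, so every data point contributes to the accuracy estimate without having been used to fit the model that predicts it.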

Clustered Sampling

Clustered Sampling: Clustered sampling is a sampling technique based on dividing the whole population into groups ("clusters"), then using random sampling to select elements from the groups. For example, if the target population is the whole population of a city, a researcher might select 100…

Data

Data: Data are recorded observations made on people, objects, or other things that can be counted, measured, or quantified in some way. In statistics, data are categorized according to several criteria, for example, according to the type of the values used to quantify the observations…

Criterion Validity

Criterion Validity: The criterion validity of survey instruments, like the tests used in psychometrics, is a measure of agreement between the results obtained by the given survey instrument and more "objective" results for the same population. The "objective" results are obtained either by a…

Convergent Validity

Convergent Validity: In psychometrics, the convergent validity of a survey instrument or psychometric test indicates the degree of agreement between measurements of the same trait obtained by different approaches supposed to measure the same trait. The complementary concept is the divergent validity. Both…

Loss Function

A loss function specifies a penalty for an incorrect estimate from a statistical model. Typical loss functions might specify the penalty as a function of the difference between the estimate and the true value, or simply as a binary value depending on whether the estimate…
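
The two kinds of penalty mentioned above can be illustrated with a short sketch. Squared error is used here as one example of a loss based on the difference between estimate and true value, and a zero-one loss on exact agreement as one example of a binary penalty; both choices, along with the function names and sample values, are assumptions made for illustration.

```python
def squared_error_loss(estimate, truth):
    """Penalty that grows with the squared difference between estimate and true value."""
    return (estimate - truth) ** 2

def zero_one_loss(estimate, truth):
    """Binary penalty: 0 if the estimate matches the true value, 1 otherwise."""
    return 0 if estimate == truth else 1

def average_loss(loss, estimates, truths):
    """Mean of a loss function over paired estimates and true values."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(loss(e, t) for e, t in zip(estimates, truths)) / len(estimates)

# Numeric estimates scored with squared error.
print(average_loss(squared_error_loss, [2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))

# Class labels scored with the zero-one loss (this is the misclassification rate).
print(average_loss(zero_one_loss, ["a", "b", "a", "a"], ["a", "a", "a", "b"]))
```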
