#### Spectral Analysis

Spectral Analysis: Spectral analysis is concerned with estimation of the spectrum of a stationary random process or a stationary time series from the observed realization(s) of the process (or series). Methods and concepts of spectral analysis play an important role in time series analysis and…

#### Spectrum

Spectrum: See Fourier spectrum and power spectrum.

#### Spatial Field

Spatial Field: A spatial field is a function of two spatial variables, or of three in 3D cases. A spatial field is named a "scalar field" if the function takes on scalar values. For example, the concentration of a toxic substance in the soil at points with…

#### Smoothing

Smoothing: Smoothing is a class of time series processing which is intended to reduce noise and to preserve the signal itself. The origin of this term is related to the visual appearance of the time series - it looks smoother after this sort of processing…

#### Sampling Frame

Sampling Frame: Sampling frame (synonyms: "sample frame", "survey frame") is the actual set of units from which a sample has been drawn: in the case of a simple random sample, all units from the sampling frame have an equal chance to be drawn and to…

#### Smoother (Smoothing Filter)

Smoother (Smoothing Filter): Smoothers, or smoothing filters, are algorithms for time-series processing that reduce abrupt changes in the time-series and make it look smoother. Smoothers constitute a broad subclass of filters. Like all filters, smoothers may be subdivided into linear and nonlinear. Linear filters reduce…

#### Smoother (Example)

Smoother (Example): A simple example of a smoother is the moving average procedure. It is based on averaging elements closest in time to the current time. Mathematically this can be expressed by the following simple formula: where is the input of the smoother, the original…
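
The moving average procedure can be sketched in a few lines of Python. This is only an illustration under assumptions not stated in the entry: the window is centered on the current time, its half-width is a free parameter (here called `half_width`), and the window simply shrinks near the ends of the series.

```python
def moving_average(x, half_width=1):
    """Centered moving average; the window shrinks near the ends of the series."""
    smoothed = []
    for t in range(len(x)):
        lo = max(0, t - half_width)               # first index inside the window
        hi = min(len(x), t + half_width + 1)      # one past the last index
        window = x[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


noisy = [1.0, 5.0, 3.0, 7.0, 5.0, 9.0]
print(moving_average(noisy, half_width=1))  # [3.0, 3.0, 5.0, 5.0, 7.0, 7.0]
```

The zig-zag in the input disappears in the output, while the upward drift of the series is kept.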

#### Social Network Analytics

Social Network Analytics: Social network analytics is network analytics applied to connections among humans. More recently it has also come to encompass the analysis of web sites and internet services such as Facebook.

#### Single Linkage Clustering

Single Linkage Clustering: The single linkage clustering method (or the nearest neighbor method) is a method of calculating distance between clusters in hierarchical cluster analysis. The linkage function specifying the distance between two clusters is computed as the minimal object-to-object distance, where objects…
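
As a rough sketch of the single linkage function, the distance between two clusters can be computed as the minimum over all cross-cluster pairs. The Euclidean object-to-object distance and the function name `single_linkage_distance` are assumptions made for this example; the entry does not fix a particular metric.

```python
import math


def single_linkage_distance(cluster_a, cluster_b):
    """Smallest Euclidean distance between a point of cluster_a and a point of cluster_b."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (9.0, 0.0)]
print(single_linkage_distance(a, b))  # 3.0, the gap between (1, 0) and (4, 0)
```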

#### Simple Linear Regression (Graphical)

Simple Linear Regression: The simple linear regression is aimed at finding the "best-fit" values of two parameters - A and B in the following regression equation: where Yi, Xi, and Ei are the values of the dependent variable, of the independent variable, and of the…

#### Signal

Signal: The signal is the component of the observed data (e.g. of a time series) that carries useful information. The complementary (opposite) concept is noise. In a narrower sense (e.g. in signal processing) signals are functions of time, as opposed to fields…

#### Signal Processing

Signal Processing: Signal processing is a branch of applied statistics concerned with analysis of functions of time that take on scalar or vector values. The functions are normally mixtures of signal and noise. A broad range of topics is considered in signal…

#### Shift Invariance (of Measures)

Shift Invariance (of Measures): Shift invariance is a property of descriptive statistics . If a statistic is shift-invariant, it possesses the following property for any data set : or, in equivalent form In other words, if a statistic is shift-invariant, then addition of an arbitrary…

#### Seemingly Unrelated Regressions (SUR)

Seemingly Unrelated Regressions (SUR): Seemingly unrelated regressions (SUR) is a class of multivariate regression (multiple regression) models, normally belonging to the sub-class of linear regression models. A distinctive feature of SUR models is that they consist of several unrelated systems of…

#### Seasonal Decomposition

Seasonal Decomposition: The seasonal decomposition is a method used in time series analysis to represent a time series as a sum (or, sometimes, a product) of three components - the linear trend, the periodic (seasonal) component, and random residuals. The seasonal decomposition is useful in…

#### Seasonal Adjustment

Seasonal Adjustment: The seasonal adjustment is used in time series analysis to remove a periodic component with the known period from the observed time series. This adjustment is normally performed through the seasonal decomposition of the time series followed by subtraction of the seasonal component…

#### Scale Invariance (of Measures)

Scale Invariance (of Measures): Scale invariance is a property of descriptive statistics . If a statistic is scale-invariant, it has the following property for any sample and any non-negative value : (1) or, in mathematically equivalent form In other words, if a statistic…

#### Sample Survey

Sample Survey: In a sample survey, a sample of units drawn from the population of interest is analyzed. A related concept is the census survey. The main advantage of the sample survey (as compared to the census survey) is that its implementation…

#### Statistical Significance

Statistical Significance: Outcomes of an experiment or repeated events are statistically significant if they differ from what chance variation might produce. For example, suppose n people are given a medication. If their response to the medication lies outside the range of how samples of…

#### Sampling

Sampling: Sampling is a process of drawing a sample from a population. Sampling may be performed from both real and hypothetical populations. Examples of sampling from a real population are opinion polls (when a finite number of individuals is chosen from a much bigger…

#### Robust Filter

Robust Filter: A robust filter is a filter that is not sensitive to input noise values with extremely large magnitude (e.g. those arising due to anomalous measurement errors). The median filter is an example of a robust filter. Linear filters are not robust…

#### Root Mean Square (Graphical)

Root Mean Square: The root mean square (RMS) of a set of values $x_i$, $i = 1, \dots, N$, is the square root of the mean of the squares of the values: $\mathrm{RMS} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2}$. RMS is a statistical measure of departure from the null value.
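
The definition translates directly into code. The following is a small sketch; the function name `rms` is chosen for this example only.

```python
import math


def rms(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


print(rms([3.0, 4.0, 0.0, 0.0]))    # 2.5, since (9 + 16) / 4 = 6.25
print(rms([1.0, -1.0, 1.0, -1.0]))  # 1.0, even though the arithmetic mean is 0
```

The second call shows why RMS is useful for values that fluctuate around zero: the signs cancel in the arithmetic mean but not in the RMS.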

#### Random Numbers

Random Numbers: Random numbers are the numbers produced by a truly random mechanism (in contrast to pseudo-random numbers). For example, random numbers with a good degree of randomness may be produced by tossing a coin, recording "0" or "1" (instead of "head" or "tail"),…

#### Reproducibility

Reproducibility: Reproducibility is the variation of outcomes of an experiment carried out in conditions varying within a typical range, e.g. when measurement is carried out by the same device by different operators, in different laboratories, etc. For example, reproducibility of measurements of mechanical scales is…

#### Replication

Replication: In statistics, replication is repetition of an experiment or observation in the same or similar conditions. Replication is important because it adds information about the reliability of the conclusions or estimates to be drawn from the data. The statistical methods that assess that reliability…

#### Repeatability

Repeatability: Repeatability is the variation of outcomes of an experiment carried out in the same conditions, e.g. by the same operator, in the same laboratory. For example, repeatability of measurements of precise mechanical scales is the variation of weight values reported for a given constant…

#### Replicate

Replicate: A replicate is the outcome of an experiment or observation obtained in the course of its replication. In applied statistics, a set of replicates obtained in a series of replications of the experiment or observations is considered as a sample from a much bigger…

#### Reliability (in Survey Analysis)

Reliability (in Survey Analysis): In survey analysis, e.g. in psychometrics, reliability is a measure of reproducibility of the survey instrument or test. In other words, reliability is a measure of precision - i.e. it describes the random error of the survey instrument. There are…

#### Reliability

Reliability: Reliability characterises the capability of a device, unit, or procedure to perform without fault. Reliability is quantified in terms of probability. This probability is related either to an elementary act or to an interval of time or another continuous variable. Because the probability of failure…

#### Regression Trees

Regression Trees: Regression trees is one of the CART techniques. The main distinction from classification trees (another CART technique) is that the dependent variable is continuous.

#### Rectangular Filter

Rectangular Filter: The rectangular filter is the simplest linear filter; it is usually used as a smoother. The output of the rectangular filter at a given moment of time is the arithmetic mean of the input values corresponding to the moments of time close to…

#### Recursive Filter

Recursive Filter: In recursive filters, the output at a given moment is a function of the output values at the previous moments and, possibly, of the input values. A major advantage of recursive filters over nonrecursive filters is that they are computationally simpler. For example,…

#### Random Error

Random Error: The random error is the fluctuating part of the overall error that varies from measurement to measurement. Normally, the random error is defined as the deviation of the total error from its mean value. An example of random error is putting the same…

#### Queuing Process

Queuing Process: Queuing processes are a class of random processes describing phenomena of queue formation. The term "queue" here is an abstract entity, which reflects the most common features of various types of real-life queues: traffic jams, queues to football matches, queue of e-mail…

#### Proportional Hazard Model (Graphical)

Proportional Hazard Model: Proportional hazard model is a generic term for models (particularly survival models in medicine) that have the form where L is the hazard function or hazard rate, {xi} are covariates, {bi} are coefficients of the model - effects of the corresponding covariates,…

#### Power Spectrum

Power Spectrum: The power spectrum of a stationary random process or a stationary time series is the average of the square of the amplitude of the Fourier spectrum : where is the amplitude spectrum of the realization of the random process ; is…

#### Psychometrics

Psychometrics: Psychometrics or psychological testing is concerned with quantification (measurement) of human characteristics, behavior, performance, health, etc., as well as with design and analysis of studies based on such measurements. An example of the problems being solved in psychometrics is the measurement of intelligence via…

#### Psychological Testing

Psychological Testing: See psychometrics.

#### Quadratic Mean

Quadratic Mean: The quadratic mean is a special case of the power mean statistics, corresponding to the value $p = 2$ of the parameter. The quadratic mean is a synonym of root mean square. The quadratic mean is used as a measure of "effective…

#### Pseudo-Random Numbers

Pseudo-Random Numbers: Pseudo-random numbers are produced by recursive algorithms - i.e. the current number is calculated from one or a greater number of previous numbers. Thus, strictly speaking, the pseudo-random numbers are deterministic, not random. On the other hand, in many respects pseudo-random…
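
A linear congruential generator is one of the simplest recursive algorithms of this kind: each number is computed from the previous one only. The sketch below is purely illustrative; the function name `lcg` and the default constants are choices made for this example (the constants are a commonly quoted textbook set), and such a generator is not suitable for serious statistical work.

```python
from itertools import islice


def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield pseudo-random numbers in [0, 1) from the recursion state = (a * state + c) mod m."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield state / m


first_run = list(islice(lcg(seed=42), 5))
second_run = list(islice(lcg(seed=42), 5))
print(first_run == second_run)  # True: the same seed gives the same "random" sequence
```

The repeatability under a fixed seed is exactly the deterministic nature described above.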

#### Predictor Variable

Predictor Variable: Predictor variable is a synonym for independent variable. See also: dependent and independent variables.

#### Predictive Validity

Predictive Validity: The predictive validity of survey instruments and psychometric tests is a measure of agreement between results obtained by the evaluated instrument and results obtained from more direct and objective measurements. The predictive validity is often quantified by the correlation coefficient between the two…

#### Monte Carlo Simulation

Monte Carlo Simulation: Monte Carlo simulation is simulation of a random phenomenon using pseudo-random numbers. This type of simulation is widely used in practical statistics, e.g. in resampling and in queuing theory. The goal of Monte Carlo simulation is not necessarily simulation of…
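
A toy Monte Carlo simulation, offered only as an illustration: the random phenomenon here (rolling two dice and checking whether the sum is 7) and the function name `estimate_prob_sum_seven` are invented for this example. The exact probability is 1/6, so the simulated estimate can be compared against it.

```python
import random


def estimate_prob_sum_seven(n_trials, seed=0):
    """Estimate P(sum of two dice == 7) from n_trials simulated rolls."""
    rng = random.Random(seed)  # seeded pseudo-random generator, so runs are repeatable
    hits = 0
    for _ in range(n_trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / n_trials


estimate = estimate_prob_sum_seven(100_000)
print(estimate, "vs exact", 1 / 6)
```

With 100,000 trials the standard error of the estimate is roughly 0.001, so the printed value should agree with 1/6 to about two decimal places.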

#### Measurement Error

Measurement Error: The measurement error is the deviation of the outcome of a measurement from the true value. For example, if electronic scales are loaded with a 1 kilogram standard weight and the reading is 1002 grams, the measurement error is +2 gram…

#### Median Filter

Median Filter: The median filter is a robust filter. Median filters are widely used as smoothers for image processing, as well as in signal processing and time series processing. A major advantage of the median filter over linear filters is that…
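
A minimal sketch of a one-dimensional median filter follows. The centered window, the `half_width` parameter, and the shrinking of the window at the ends of the series are assumptions made for this example.

```python
import statistics


def median_filter(x, half_width=1):
    """Replace each value by the median of the values in a centered window."""
    out = []
    for t in range(len(x)):
        lo = max(0, t - half_width)
        hi = min(len(x), t + half_width + 1)
        out.append(statistics.median(x[lo:hi]))
    return out


series = [1, 2, 100, 3, 4]    # 100 is an anomalous spike
print(median_filter(series))  # [1.5, 2, 3, 4, 3.5]
```

The spike of 100 does not appear in the output at all, which illustrates the robustness of the filter to isolated extreme values.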

#### Mean Values (Comparison)

Mean Values (Comparison): The numerical example below illustrates basic properties of various descriptive statistics with "mean" in their name, like the arithmetic mean, the trimmed mean, the geometric mean, the harmonic mean, and several power means. Because the…

#### Markov Property (Graphical)

Markov Property: Markov property means "absence of memory" of a random process - that is, independence of conditional probabilities on values U(t2 < t). In simpler words, this property means that future behavior depends only on the current state, but not on the…

#### Markov Chain (Graphical)

Markov Chain: A Markov chain is a series of random values x1, x2, ... in which the probabilities associated with a particular value xi depend only on the prior value. For this reason, a Markov chain is a special case of "memoryless"…
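
The dependence on the prior value only can be made concrete with a small simulation. Everything specific in this sketch is hypothetical: the two weather states, the transition probabilities, and the function name `simulate_markov_chain` are invented for illustration.

```python
import random


def simulate_markov_chain(transition, start, n_steps, seed=0):
    """Generate a path in which each new state is drawn using only the current state."""
    rng = random.Random(seed)
    state = start
    path = [state]
    for _ in range(n_steps):
        next_states = list(transition[state])
        weights = [transition[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights)[0]
        path.append(state)
    return path


# Hypothetical transition probabilities: transition[current][next]
transition = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}
path = simulate_markov_chain(transition, "sunny", n_steps=10)
print(path)
```

Note that the loop never looks at earlier entries of `path`; the current `state` is all it needs.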

#### Linear Model (Graphical)

Linear Model: A linear model specifies a linear relationship between a dependent variable and n independent variables: where y is the dependent variable, {xi} are independent variables, {ai} are parameters of the model. For example, consider that for a sample of 25 cities, the following…

#### Logistic Regression (Graphical)

Logistic Regression: Logistic regression is used with binary data when you want to model the probability that a specified outcome will occur. Specifically, it is aimed at estimating parameters a and b in the following model: where pi is the probability of a success for…

#### Moving Average (MA) Models

Moving Average (MA) Models: Moving average (MA) models are used in time series analysis to describe stationary time series. The MA-models represent time series that are generated by passing white noise through a non-recursive linear filter. A moving average model of a…

#### Linkage Function

Linkage Function: A linkage function is an essential prerequisite for hierarchical cluster analysis. Its value is a measure of the "distance" between two groups of objects (i.e. between two clusters). Algorithms for hierarchical clustering normally differ by the linkage function used. The most common…

#### Linear Filter

Linear Filter: A linear filter is a filter whose output is a linear function of the input. Any output value of a linear filter is a weighted mean of input values. In other words, to form one element of the output at time…

#### Likelihood Ratio Test (Graphical)

Likelihood Ratio Test: The likelihood ratio test is aimed at testing a simple null hypothesis against a simple alternative hypothesis. (See Hypothesis for an explanation of "simple hypothesis"). The likelihood ratio test is based on the likelihood ratio r as the test statistic: where X…

#### k-Nearest Neighbors Prediction

k-Nearest Neighbors Prediction: The k-nearest neighbors (k-NN) prediction is a method to predict a value of a target variable in a given record, using as a reference point a training set of similar objects. The basic idea is to choose k objects from the training…

#### k-Nearest Neighbors Classification

k-Nearest Neighbors Classification: The k-nearest neighbors (k-NN) classification is a method of classification that uses a training set chosen from the data as a point of reference in classifying observations. The idea of the method is to find the k elements of the training set…
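
A compact sketch of k-NN classification is given below. It assumes, beyond what the excerpt states, that "nearest" means smallest Euclidean distance and that the class is decided by majority vote among the k neighbors; the function name `knn_classify` and the toy training set are invented for the example.

```python
import math
from collections import Counter


def knn_classify(training_set, query, k=3):
    """Return the most common label among the k training points closest to query."""
    neighbors = sorted(training_set, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Each training item is (feature vector, class label)
training = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_classify(training, (1.1, 1.0)))  # a
print(knn_classify(training, (5.1, 5.0)))  # b
```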

#### k-Means Clustering

k-Means Clustering: The k-means clustering method is used in non-hierarchical cluster analysis. The goal is to divide the whole set of objects into a predefined number (k) of clusters. The criterion for such subdivision is normally the minimal dispersion inside clusters - e.g. the…

#### Kalman Filter (Equations)

Kalman Filter (Equations): The basic mathematics behind the idea of the Kalman filter may be described as follows. Consider, for example, a Markov chain - i.e. a random series with Markov property - described by the following equation: (1) where - is the…

#### Kalman Filter

Kalman Filter: The Kalman filter is a class of linear filters for predicting and/or smoothing time series. The value of the time series is usually a vector in a state space. The Kalman filter is optimal for filtering many types of Markov chains.…

#### Interobserver Reliability

Interobserver Reliability: The interobserver reliability of a survey instrument, like a psychological test, measures agreement between two or more subjects rating the same object, phenomenon, or concept. For example, 5 critics are asked to evaluate the quality of 10 different works of art…

#### Intraobserver Reliability

Intraobserver Reliability: Intraobserver reliability indicates the stability of responses obtained from the same respondent at different time points. The greater the difference between the responses, the smaller the intraobserver reliability of the survey instrument. The correlation coefficient between the responses obtained at different…

#### Independent Variable

Independent Variable: See dependent and independent variables.

#### Internal Consistency Reliability

Internal Consistency Reliability: The internal consistency reliability of survey instruments (e.g. psychological tests) is a measure of reliability of different survey items intended to measure the same characteristic. For example, there are 5 different questions (items) related to anxiety level. Each question implies…

#### Hold-Out Sample

Hold-Out Sample: A hold-out sample is a random sample from a data set that is withheld and not used in the model fitting process. After the model is fit to the main data (the "training" data), it is then applied to the hold-out sample. This gives an…

#### Image Processing

Image Processing: In image processing, the initial data are images - functions of two coordinates. Normally, images are represented in discrete form as two-dimensional arrays of image elements, or "pixels" - i.e. sets of non-negative values, ordered by two indexes - (rows)…

#### Hierarchical Cluster Analysis

Hierarchical Cluster Analysis: Hierarchical cluster analysis (or hierarchical clustering) is a general approach to cluster analysis, in which the objective is to group together objects or records that are "close" to one another. A key component of the analysis is repeated calculation of distance…

#### Harmonic Mean

Harmonic Mean: Harmonic mean is a measure of central location. The harmonic mean of $N$ positive values $x_1, \dots, x_N$ is defined by the formula $H = N \big/ \sum_{i=1}^{N} \frac{1}{x_i}$. Let the path between two cities $A$ and $B$ be divided into $N$ parts of equal length. One drives the $i$th part at velocity $x_i$.…
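
The driving scenario can be checked numerically. In this sketch the speeds, the segment length, and the function name `harmonic_mean` are illustrative assumptions; the point is that the average speed over equal-length parts, computed as total distance divided by total time, coincides with the harmonic mean of the speeds.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals (values must be positive)."""
    return len(values) / sum(1.0 / v for v in values)


speeds = [40.0, 60.0]   # km/h on two parts of equal length
part_length = 120.0     # km, an arbitrary illustrative value

total_time = sum(part_length / v for v in speeds)
average_speed = part_length * len(speeds) / total_time

print(harmonic_mean(speeds))  # approximately 48
print(average_speed)          # approximately 48, not the arithmetic mean of 50
```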

#### Geometric Distribution (Graphical)

Geometric Distribution: A random variable x obeys the geometric distribution with parameter p (0<p<1) if If a random variable obeys the Bernoulli distribution with probability of success p, then x might be the number of trials before the first "success" occurs.

#### Gini coefficient (Graphical)

Gini coefficient: The Gini coefficient is used in economics to measure income inequality. Generally speaking, it is used to measure the extent of departure from a perfectly even distribution of income. A "0" indicates no departure, i.e. everyone has the same income. A "1" indicates…
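
One common way to compute the Gini coefficient, shown here only as a sketch, is the mean absolute difference between all pairs of incomes divided by twice the mean income. The entry does not say which formula it has in mind, so this choice and the function name `gini` are assumptions.

```python
def gini(incomes):
    """Gini coefficient via the relative mean absolute difference."""
    n = len(incomes)
    mean = sum(incomes) / n
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mean)


print(gini([10, 10, 10, 10]))  # 0.0: everyone has the same income
print(gini([0, 0, 0, 100]))    # 0.75: one person holds all the income
```

With this formula a finite population of n people cannot reach exactly 1; the maximum is (n - 1) / n, which approaches 1 as n grows.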

#### Gaussian Filter

Gaussian Filter: The Gaussian filter is a linear filter that is usually used as a smoother. The output of the Gaussian filter at a given moment is the weighted mean of the input values, and the weights are defined by formula where is the "distance"…

#### General Linear Model for a Latin Square (Graphical)

General Linear Model for a Latin Square: In design of experiments, a Latin square is a three-factor experiment in which, for each pair of factors, any combination of factor values occurs only once. Consider the following Latin Square, where rows correspond to 4 values…

#### Gamma Distribution (Graphical)

Gamma Distribution: A random variable x is said to have a gamma distribution with parameters $a > 0$ and $\lambda > 0$ if its probability density p(x) is $p(x) = \frac{\lambda^{a}}{\Gamma(a)} x^{a-1} e^{-\lambda x}$ for $x > 0$, and $p(x) = 0$ otherwise.

#### Functional Data Analysis (FDA)

Functional Data Analysis (FDA): In functional data analysis (FDA), data are considered as continuous functions (or curves). This is in contrast to multivariate statistics, where data are considered as vectors (finite sets of values). Real data are usually collected as discrete samples. In FDA, such…

#### Farthest Neighbor Clustering

Farthest Neighbor Clustering: The farthest neighbor clustering is a synonym for complete linkage clustering.

#### Fourier Spectrum

Fourier Spectrum: Any continuous function defined on a finite interval of length can be represented as a weighted sum of cosine functions with periods : where is the frequency of the i-th Fourier component; is the amplitude of the i-th component; is the phase of…

#### Fleming Procedure

Fleming Procedure: Fleming procedure (or O'Brien-Fleming multiple testing procedure) is a simple multiple testing procedure for comparing two treatments when the response to treatment is dichotomous. This procedure is used in clinical trials. The procedure provides an opportunity to terminate the trial early…

#### Fixed Effects (Graphical)

Fixed Effects: The term "fixed effects" (as contrasted with "random effects") is related to how particular coefficients in a model are treated - as fixed or random values. Which approach to choose depends on both the nature of the data and the objective of the…

#### F Distribution (Graphical)

F Distribution: The F distribution is a family of distributions differentiated by two parameters: m1 (degrees of freedom, numerator) and m2 (degrees of freedom, denominator). If x1 and x2 are independent random variables with a chi-square distribution with m1 and m2 degrees of freedom respectively,…

#### Family-wise Type I Error (Graphical)

Family-wise Type I Error: In multiple comparison procedures, family-wise type I error is the probability that, even if all samples come from the same population, you will wrongly conclude that at least one pair of populations differ. If is the probability of comparison-wise type I…

#### Face Validity

Face Validity: The face validity of survey instruments and tests used in psychometrics is assessed by cursory review of the items (questions) by untrained individuals. The individuals make their judgments on whether the items are relevant. For example, a researcher developing an IQ-test might…

#### Exponential Distribution (Graphical)

Exponential Distribution: The exponential distribution is a one-sided distribution completely specified by one parameter ; the density of this distribution is The mean of the exponential distribution is . The exponential distribution is a model for the length of intervals between two consecutive random events…

#### Explanatory Variable

Explanatory Variable: Explanatory variable is a synonym for independent variable. See also: dependent and independent variables.

#### Exogenous Variable

Exogenous Variable: Exogenous variables in causal modeling are the variables with no causal links (arrows) leading to them from other variables in the model. In other words, exogenous variables have no explicit causes within the model. The concept of exogenous variable is fundamental in path…

#### Exponential Filter

Exponential Filter: The exponential filter is the simplest linear recursive filter. Exponential filters are widely used in time series analysis, especially for forecasting time series (see the short course Time Series Forecasting). The exponential filter is described by the following expression: where…
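
Because the expression itself is not reproduced in this excerpt, the sketch below uses the usual textbook form of the exponential filter, which is an assumption here: each output is `alpha` times the current input plus `(1 - alpha)` times the previous output, with the first output set equal to the first input. The names `exponential_filter` and `alpha` are chosen for this example.

```python
def exponential_filter(x, alpha=0.5):
    """Recursive smoothing: y[t] = alpha * x[t] + (1 - alpha) * y[t - 1], with y[0] = x[0]."""
    if not x:
        return []
    out = [x[0]]
    for value in x[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


step = [0.0, 8.0, 8.0, 8.0]
print(exponential_filter(step, alpha=0.5))  # [0.0, 4.0, 6.0, 7.0]
```

The output approaches the new level of 8 gradually. Only the previous output has to be stored, which is what makes the filter recursive.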

#### Error

Error: Error is a general concept related to deviation of the estimated quantity from its true value: the greater the deviation, the greater the error. Errors are categorised according to their probabilistic nature into systematic errors and random errors, and, according to their relation…

#### Endogenous Variable

Endogenous Variable: Endogenous variables in causal modeling are the variables with causal links (arrows) leading to them from other variables in the model. In other words, endogenous variables have explicit causes within the model. The concept of endogenous variable is fundamental in path analysis and…

#### Econometrics

Econometrics: Econometrics is a discipline concerned with the application of statistics and mathematics to various problems in economics and economic theory. This term literally means "economic measurement". A central task is quantification (measurement) of various qualitative concepts of economic theory - like demand, supply…

#### Divisive Methods (of Cluster Analysis)

Divisive Methods (of Cluster Analysis): In divisive methods of hierarchical cluster analysis, the clusters obtained at the previous step are subdivided into smaller clusters. Such methods start from a single cluster comprising all N objects, and, after N-1 steps, they end with N…

#### Divergent Validity

Divergent Validity: In psychometrics, the divergent validity of a survey instrument, like an IQ-test, indicates that the results obtained by this instrument do not correlate too strongly with measurements of a similar but distinct trait. For example, if a test is supposed to measure…

#### Dispersion (Measures of)

Dispersion (Measures of): Measures of dispersion express quantitatively the degree of variation or dispersion of values in a population or in a sample. Along with measures of central tendency, measures of dispersion are widely used in practice as descriptive statistics. Some measures…

#### Discrete Distribution

Discrete Distribution: A discrete distribution describes the probabilistic properties of a random variable that takes on a set of values that are discrete, i.e. separate and distinct from one another - a discrete random variable. Discrete values are separated only by a finite number…

#### Design of Experiments

Design of Experiments: Design of experiments is concerned with optimization of the plan of experimental studies. The goal is to improve the quality of the decision that is made from the outcome of the study on the basis of statistical methods, and to ensure that…

#### Dichotomous

Dichotomous: Dichotomous (outcome or variable) means "having only two possible values", e.g. "yes/no", "male/female", "head/tail", "age > 35 / age <= 35" etc. For example, the outcome of an experiment with coin tossing is dichotomous ("head" or "tail"); the variable "biological sex" in a social…

#### Dependent and Independent Variables

Dependent and Independent Variables: Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called independent variables. While analysts typically specify variables in a model to reflect their understanding or theory of "what causes what," setting…

#### Data Mining

Data Mining: Data mining is concerned with finding latent patterns in large databases. The goal is to discover unsuspected relationships that are of practical importance, e.g., in business. A broad range of statistical and machine learning approaches are used in data mining. See, for…

#### Dendrogram

Dendrogram: The dendrogram is a graphical representation of the results of hierarchical cluster analysis. This is a tree-like plot where each step of hierarchical clustering is represented as a fusion of two branches of the tree into a single one. The branches represent clusters…

#### Cover time

Cover time: Cover time is the expected number of steps in a random walk required to visit all the vertices of a connected graph (a graph in which there is always a path, consisting of one or more edges, between any two vertices). Blom, Holst…

#### Density (of Probability)

Density Functions: A probability density function or curve is a non-negative function that describes the distribution of a continuous random variable. If the density function is known, then the probability that a value of the variable is within an interval is described by the following…

#### Cross-Validation

Cross-Validation: Cross-validation is a general computer-intensive approach used in estimating the accuracy of statistical models. The idea of cross-validation is to split the data into N subsets, to put one subset aside, to estimate parameters of the model from the remaining N-1 subsets, and to…
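
The splitting scheme can be sketched as follows. Several details are assumptions, since the excerpt is cut short: the model fitted on the N-1 subsets is evaluated on the subset that was put aside, the procedure is repeated so that each subset is held out once, and the errors are averaged. The helper names (`cross_validate`, `fit_mean`, `squared_error`) and the trivial "predict the mean" model are invented for the example.

```python
def cross_validate(data, n_subsets, fit, error):
    """Average held-out error when each of n_subsets subsets is set aside in turn."""
    folds = [data[i::n_subsets] for i in range(n_subsets)]  # simple interleaved split
    errors = []
    for i, held_out in enumerate(folds):
        training = [v for j, fold in enumerate(folds) if j != i for v in fold]
        model = fit(training)                                # estimate from N-1 subsets
        errors.extend(error(model, v) for v in held_out)     # test on the subset set aside
    return sum(errors) / len(errors)


def fit_mean(values):
    """Toy model: predict the mean of the training values."""
    return sum(values) / len(values)


def squared_error(model, value):
    return (model - value) ** 2


data = [2.0, 4.0, 6.0, 8.0]
score = cross_validate(data, 2, fit_mean, squared_error)
print(score)  # 8.0
```

In practice the data would usually be shuffled before splitting; the interleaved split is used here only to keep the example deterministic.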

#### Clustered Sampling

Clustered Sampling: Clustered sampling is a sampling technique based on dividing the whole population into groups ("clusters"), then using random sampling to select elements from the groups. For example, if the target population is the whole population of a city, a researcher might select 100…