Time Series

Time Series: Time series data are measurements of a variable taken at regular intervals over time. Time series are represented as sequences of values like x(1), x(2), ... A wide class of practically important data are represented as time series: economic and social data,…

Time Series Analysis

Time Series Analysis: Time series analysis is a branch of statistics dealing with data represented as time series. Time series analysis includes almost all classes of statistical approaches and problems: data description, hypothesis testing, parameter estimation, regression, etc. The practical importance…

Time-series data

Time-series data: See longitudinal data.

Tokenization

Tokenization: In processing unstructured text, tokenization is the step by which the character string in a text segment is turned into units - tokens - for further analysis. Ideally, those tokens would be words, but numbers and other characters can also count as tokens. A…
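
As a rough illustration of the tokenization step described above, the following minimal sketch splits a text segment into word, number, and punctuation tokens. The function name `tokenize` and the regular expression are assumptions chosen for this example; practical tokenizers handle many more cases.

```python
import re

def tokenize(text):
    """Split a text segment into tokens (hypothetical helper for illustration)."""
    # \w+ matches a run of letters, digits, or underscores (words and numbers);
    # [^\w\s] matches any single character that is neither of those nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Prices rose 3% in 2020."))
# ['Prices', 'rose', '3', '%', 'in', '2020', '.']
```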

Training Set

Training Set: A training set is a portion of a data set used to fit (train) a model for prediction or classification of values that are known in the training set, but unknown in other (future) data. The training set is used in conjunction with…

Transformation

Transformation: Transformation is the conversion of a data set into a transformed data set by the application of a function. The statistical purpose of transformation is to produce a transformed data set that better conforms to the requirements of a statistical procedure. A typical use…

Triangular Filter

Triangular Filter: The triangular filter is a linear filter that is usually used as a smoother. The output of the triangular filter at a given moment is the weighted mean of the input values at the adjacent moments of discrete time. In…

Trimmed Mean

Trimmed Mean: The trimmed mean is a family of measures of central tendency. A trimmed mean of a set of values is computed by sorting all the values, discarding a given percentage of the smallest values and the same percentage of the largest values, and computing the mean of the…
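
The computation can be sketched as follows. This is a minimal example, not a reference implementation: the function name `trimmed_mean` is hypothetical, and rounding the number of discarded values down is an assumption, since conventions differ.

```python
def trimmed_mean(values, percent):
    """Mean of the values left after trimming `percent` % from each end."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= percent < 50:
        raise ValueError("percent must be in [0, 50)")
    ordered = sorted(values)
    # Number of values discarded from each end (assumption: rounded down).
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 100]
print(trimmed_mean(data, 20))  # 3.0: drops 1 and 100, averages 2, 3, 4
print(trimmed_mean(data, 0))   # 22.0: the ordinary mean
```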

Truncation

Truncation: Truncation, generally speaking, means to shorten. In statistics it can mean the process of limiting consideration or analysis to data that meet certain criteria (for example, the patients still alive at a certain point). Or it can refer to a data distribution where values…

Tukey's HSD (Honestly Significant Differences) Test

Tukey's HSD (Honestly Significant Differences) Test: This test is used for testing the significance of unplanned pairwise comparisons. When you do multiple significance tests, the chance of finding a "significant" difference just by chance increases. Tukey's HSD test is one of several methods of ensuring…

Two-Tailed Test

Two-Tailed Test: A two-tailed test is a hypothesis test in which the null hypothesis is rejected if the observed sample statistic is more extreme than the critical value in either direction (higher than the positive critical value or lower than the negative critical value). A…
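
The rejection rule can be illustrated with a short sketch. It assumes, for the sake of the example, that the test statistic follows a standard normal distribution under the null hypothesis; the function name `two_tailed_reject` and the default significance level of 0.05 are likewise assumptions.

```python
from statistics import NormalDist

def two_tailed_reject(z, alpha=0.05):
    """Return True if the statistic z is more extreme than the critical value
    in either direction (assumes z is standard normal under the null)."""
    # Split alpha equally between the two tails.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

print(two_tailed_reject(2.3))   # True: above the positive critical value
print(two_tailed_reject(-2.3))  # True: below the negative critical value
print(two_tailed_reject(1.0))   # False
```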

Type I Error

Type I Error: In a test of significance, Type I error is the error of rejecting the null hypothesis when it is true -- of saying an effect or event is statistically significant when it is not. The projected probability of committing type I error…

Type II Error

Type II Error: In a test of significance, Type II error is the error of not rejecting the null hypothesis when it is false -- of failing to declare a real difference as statistically significant. Obviously, the bigger your samples, the more likely your test is…

Uncertainty and Statistics

Uncertainty and Statistics: A main goal of statistics is to quantify or measure uncertainty; this branch of statistics is called "inferential statistics." Classical statistics measures uncertainty using fundamental concepts and theories of probability and randomness. Modern statistics often applies Monte Carlo simulation as…

Uniform Distribution

Uniform Distribution: The uniform distribution describes probabilistic properties of a continuous random variable that is equally likely to take any value within an interval, and never takes on values outside this interval. The uniform distribution is characterised by two parameters - the lower and…

Univariate

Univariate: Univariate analysis involves a single variable of interest.

Uplift or Persuasion Modeling

Uplift or Persuasion Modeling: A combination of treatment comparisons (e.g. send a sales solicitation, or send nothing) and predictive modeling to determine which cases or subjects respond (e.g. purchase or not) to which treatments. Here are the steps, in conceptual terms, for a typical uplift…

Validation Sample

Validation Sample: The validation sample is the subset of the data available to a data mining routine used as the validation set.

Validation Set

Validation Set: A validation set is a portion of a data set used in data mining to assess the performance of prediction or classification models that have been fit on a separate portion of the same data set (the training set). Typically both the…
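
One simple way to obtain a training set and a validation set from the same data is a random split, sketched below. The function name `split_train_validation`, the 25% validation fraction, and the fixed seed are all assumptions made for this illustration.

```python
import random

def split_train_validation(rows, validation_fraction=0.25, seed=0):
    """Randomly partition rows into (training set, validation set)."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * validation_fraction)
    # The first n_valid shuffled rows are held out; the rest are used for fitting.
    return shuffled[n_valid:], shuffled[:n_valid]

train, valid = split_train_validation(range(20))
print(len(train), len(valid))  # 15 5
```

Every record ends up in exactly one of the two sets, so the model is assessed on data it was not fit on.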

Validity

Validity: Validity characterises the extent to which a measurement procedure is capable of measuring what it is supposed to measure. Normally, the term "validity" is used in situations where measurement is indirect, imprecise and cannot be precise in principle, e.g. in psychological IQ tests purporting…

Variable-Selection Procedures

Variable-Selection Procedures: In regression analysis, variable-selection procedures are aimed at selecting a reduced set of the independent variables - the ones providing the best fit to the model. The criterion for selecting is usually the following F-statistic: F(x1,...,xp; xp+1) = [SSE(x1,...,xp) - SSE(x1,...,xp, xp+1)] / SSE(x1,...,xp)…

Variable-Selection Procedures (Graphical)

Variable-Selection Procedures: In regression analysis, variable-selection procedures are aimed at selecting a reduced set of the independent variables - the ones providing the best fit to the model. The criterion for selecting is usually the following F-statistic: where n is the total number of data…

Variables (in design of experiments)

Variables (in design of experiments): Many statistical methods rest on a statistical model which states a relationship Y = f(X1,...,XN) between a dependent variable (Y) and independent variable(s) X1,...,XN. In designed experiments, the dependent variable is often named "response", independent variables manipulated by the experimenter…

Variance

Variance: Variance is a measure of dispersion. It is the average squared distance between the mean and each item in the population or in the sample. An advantage of variance (as compared to the related measure of dispersion - the standard deviation) is that the…
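
The definition above can be made concrete with a small sketch. The function name `variance` and the `sample` flag are assumptions for this example; dividing by n - 1 for a sample is the usual unbiased convention and is not spelled out in the entry itself.

```python
def variance(values, sample=False):
    """Average squared distance from the mean.

    With sample=True the sum of squares is divided by n - 1 instead of n
    (the common convention for sample data; an assumption here).
    """
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))               # 4.0 (mean is 5, sum of squares is 32)
print(variance(data, sample=True))  # 32 / 7, about 4.57
```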

Variance/Mean Ratio

Variance/Mean Ratio: Variance/mean ratio (VMR) is used to characterize the distribution of events or objects in time or space. If the distribution is random - i.e. can be modeled by the Poisson process or its multidimensional analogues - then, the VMR is about…
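
A minimal sketch of the ratio for a list of event counts (for example, counts per time interval or per spatial cell) follows. The function name `variance_mean_ratio` and the use of the sample variance are assumptions for this illustration.

```python
from statistics import mean, variance

def variance_mean_ratio(counts):
    """Sample variance of the counts divided by their mean."""
    return variance(counts) / mean(counts)

counts = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up counts per interval
print(variance_mean_ratio(counts))  # (32 / 7) / 5, about 0.91
```

For counts generated by a Poisson process the variance equals the mean, so a ratio close to 1 is what such a process would be expected to produce.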

Variance/Mean Ratio Test

Variance/Mean Ratio Test: The variance/mean ratio (VMR) test is a statistical test used to test the null hypothesis that the variance/mean ratio is 1.0. The VMR test is usually dealt with as a one-sided test because each direction of departure from the null…

Variate

Variate: The term "variate" is often used as a synonym for "variable". Some definitions require that variate values be numeric. Sometimes "variate" is used as a synonym for "a value of the given variable for a particular element of the sample" - e.g. sex is a…

Vector Autoregressive Models

Vector Autoregressive Models: Vector autoregressive models describe statistical properties of vector time series. Vector autoregressive models generalize the models used in ordinary autoregression. Consider a vector time series: V(1), V(2), ... In general, vector autoregressive models assume some functional relation between…

Vector time series

Vector time series: Vector time series are a natural generalization of ordinary (scalar) time series. Vector time series are measurements of a vector variable taken at regular intervals over time. They are represented as sequences of vector values like V(1), V(2), ... The simplest…

Ward's Linkage

Ward's Linkage: Ward's linkage is a method for hierarchical cluster analysis. The idea has much in common with analysis of variance (ANOVA). The linkage function specifying the distance between two clusters is computed as the increase in the "error sum of squares" (ESS) after…

Web Analytics

Web Analytics: Statistical or machine learning methods applied to web data such as page views, hits, clicks, and conversions (sales), generally with a view to learning what web presentations are most effective in achieving the organizational goal (usually sales). This goal might be to sell…

Weighted Kappa

Weighted Kappa: Weighted kappa is a measure of agreement for categorical data. It is a generalization of the kappa statistic to situations in which the categories are not equal in some respect - that is, weighted by an objective or subjective function.…

Weighted Mean

Weighted Mean: The weighted mean is a measure of central tendency. The weighted mean of a set of values $x_1, \dots, x_N$ is computed according to the following formula: $\bar{x}_w = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$, where $w_i$ are non-negative coefficients, called "weights", that are ascribed to the corresponding values $x_i$. Only the…

Weighted Mean (Calculation)

Weighted Mean (Calculation): To simplify calculation of the weighted mean, weights are often standardized to make their sum equal to one, i.e. by dividing every weight by the total sum of all weights: $w'_i = w_i / \sum_{j=1}^{N} w_j$. Then, the weighted mean is computed using…
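
The standardization step can be sketched as follows. The function name `weighted_mean` and the example numbers are assumptions; the sketch first rescales the weights so that they sum to one and then forms the weighted sum.

```python
def weighted_mean(values, weights):
    """Weighted mean computed with weights standardized to sum to one."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    # Divide every weight by the total so the standardized weights sum to one.
    standardized = [w / total for w in weights]
    return sum(w * x for w, x in zip(standardized, values))

print(weighted_mean([10, 20, 30], [1, 1, 2]))  # 22.5
```

With equal weights the result reduces to the ordinary arithmetic mean.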

White Hat Bias

White Hat Bias is bias leading to distortion in, or selective presentation of, data that is considered by investigators or reviewers to be acceptable because it is in the service of righteous goals. The term was coined by Cope and Allison in 2009, and is…

White Noise

White Noise: White noise is a stationary time series or a stationary random process with zero autocorrelation. In other words, in white noise any pair of values taken at different moments of time are not correlated - i.e. the correlation coefficient is…

Wilcoxon – Mann – Whitney U Test

Wilcoxon - Mann - Whitney U Test: The Wilcoxon-Mann-Whitney test uses the ranks of data to test the hypothesis that two samples of sizes m and n might come from the same population. The procedure is as follows: Combine the data from both samples. Rank…

Wilcoxon Rank Sums

Wilcoxon Rank Sums: Wilcoxon rank sums are two statistics T+ and T- computed from paired replicates data. Suppose we have two sets of pairs of measurements (xi,yi), i=1,...,N for each of N experimental units. We compute differences di = yi - xi; i=1,...,N…
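
A minimal sketch of the two rank sums is given below. It follows the commonly used procedure (rank the absolute differences, then sum the ranks of the positive and of the negative differences separately). Dropping zero differences and giving tied absolute differences their average rank are conventions assumed here, not details taken from the entry; the function name `wilcoxon_rank_sums` is hypothetical.

```python
def wilcoxon_rank_sums(x, y):
    """Return (T+, T-) for paired measurements x[i], y[i]."""
    # Differences d_i = y_i - x_i; zero differences are dropped (assumption).
    diffs = [b - a for a, b in zip(x, y) if b != a]
    ordered = sorted(diffs, key=abs)
    t_plus = t_minus = 0.0
    i = 0
    while i < len(ordered):
        # Find the group of differences tied in absolute value.
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Ranks i+1 .. j are shared; each tied value gets their average.
        avg_rank = (i + 1 + j) / 2
        for d in ordered[i:j]:
            if d > 0:
                t_plus += avg_rank
            else:
                t_minus += avg_rank
        i = j
    return t_plus, t_minus

x = [10, 12, 9, 14, 11]
y = [12, 11, 13, 14, 16]
print(wilcoxon_rank_sums(x, y))  # (9.0, 1.0)
```

In the example the differences are 2, -1, 4, 0, 5; after dropping the zero, the absolute values 1, 2, 4, 5 receive ranks 1 to 4, so T+ = 2 + 3 + 4 and T- = 1.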

Wilcoxon Signed Ranks Test

Wilcoxon Signed Ranks Test: The Wilcoxon signed ranks test is aimed at testing a null hypothesis from paired replicates data - that both treatments are equivalent. This test is based on one of the two Wilcoxon rank sums as the test statistic. The Wilcoxon signed…

Wilks's Lambda

Wilks's Lambda: Wilks's lambda is a general test statistic used in multivariate tests of mean differences among more than two groups. Several other statistics are special cases of Wilks's lambda.

y-hat

The estimated or predicted values in a regression or other predictive model are termed the y-hat values. "Y" because y is the outcome or dependent variable in the model equation, and a "hat" symbol (circumflex) placed over the variable name is the statistical designation of…

Z score

Z score: An observation's z-score tells you the number of standard deviations it lies away from the population mean (and in which direction). The calculation is as follows: z = (x - m) / s, where x is the observation itself, m is the mean…
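
The calculation can be illustrated with a short sketch. The function name `z_score` and the example values (a mean of 100 and a standard deviation of 15) are assumptions chosen for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean (signed)."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

print(z_score(130, 100, 15))  # 2.0: two standard deviations above the mean
print(z_score(85, 100, 15))   # -1.0: one standard deviation below the mean
```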

Z score (Graphical)

Z score: An observation's z-score tells you the number of standard deviations it lies away from the population mean (and in which direction). The calculation is as follows: $z = \frac{x - m}{s}$, where $x$ is the observation itself, $m$ is the mean of the distribution, $s$ is the standard deviation of…