#### Convolution of Distribution Functions

Convolution of Distribution Functions: If $F_1(\cdot)$ and $F_2(\cdot)$ are distribution functions, then the function $F(\cdot)$ defined by $F(x) = \int_{-\infty}^{\infty} F_1(x-y)\,dF_2(y)$ is called the convolution of the distribution functions $F_1$ and $F_2$. This is often denoted as $F = F_1 * F_2$. The convolution $F_1 * F_2$ provides the…
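The convolution integral can be evaluated directly when $F_2$ is discrete. In that case $dF_2$ puts mass $P(Y=y)$ at each support point, so the integral becomes a sum. The sketch below is our own illustration, and the helper names are hypothetical. It convolves the distribution function of a fair die with the probability mass function of a second die, which gives the distribution function of the total of two independent dice.

```python
from fractions import Fraction
import math

def die_cdf(x):
    """Distribution function F1 of a fair six-sided die."""
    if x < 1:
        return Fraction(0)
    if x >= 6:
        return Fraction(1)
    return Fraction(math.floor(x), 6)

# Hypothetical second die: dF2 puts mass 1/6 at each y = 1..6
die_pmf = {y: Fraction(1, 6) for y in range(1, 7)}

def convolve(F1, pmf2):
    """Return F(x) = sum over y of F1(x - y) * P(Y = y), i.e. (F1 * F2)(x)."""
    return lambda x: sum(F1(x - y) * p for y, p in pmf2.items())

F = convolve(die_cdf, die_pmf)
print(F(7))   # P(total <= 7) = 7/12
print(F(12))  # 1
```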

#### Erlang Distribution

Erlang Distribution: The Erlang distribution with parameters (n, m) describes the waiting time until the n-th event occurs in a Poisson process with parameter m. The Erlang distribution is a special case of the gamma distribution.
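The n-th event of a Poisson process with rate m occurs by time t exactly when at least n events fall in [0, t]. The Erlang distribution function can therefore be written in terms of Poisson probabilities. Below is a minimal sketch; the rate m is called `rate` here, which is our own naming.

```python
import math

def erlang_cdf(t, n, rate):
    """P(T_n <= t) for the waiting time T_n until the n-th event of a Poisson process."""
    if t <= 0:
        return 0.0
    mu = rate * t  # expected number of events in [0, t]
    # T_n <= t exactly when at least n events fall in [0, t]
    return 1.0 - sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n))

print(round(erlang_cdf(1.0, 2, 1.0), 4))  # 1 - 2/e, about 0.2642
```

With n = 1 the formula reduces to the exponential distribution function $1 - e^{-mt}$.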

#### Exponential Distribution

Exponential Distribution: The exponential distribution is a one-sided distribution completely specified by one parameter $r > 0$; the density of this distribution is

$$
f(x) = \begin{cases} r e^{-rx}, & x \ge 0, \\ 0, & x < 0. \end{cases}
$$

The mean of the exponential distribution is $1/r$. The exponential…
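One common way to sample from this distribution is inverse-transform sampling. If U is uniform on (0, 1], then $-\ln(U)/r$ has the exponential density above. The sketch below is an illustration and is not part of the original entry. It checks the mean $1/r$ empirically. Python's built-in `random.expovariate(r)` does the same job.

```python
import math
import random

def sample_exponential(r, rng):
    """Inverse-transform sample: x = -ln(u) / r solves F(x) = 1 - u, and 1 - u is uniform too."""
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is finite
    return -math.log(u) / r

rng = random.Random(42)
r = 2.0
draws = [sample_exponential(r, rng) for _ in range(200_000)]
mean = sum(draws) / len(draws)
print(f"sample mean {mean:.4f}, theoretical 1/r = {1 / r}")
```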

#### F Distribution

F Distribution: The F distribution is a family of distributions differentiated by two parameters: m1 (degrees of freedom, numerator) and m2 (degrees of freedom, denominator). If x1 and x2 are independent random variables with a chi-square distribution with m1 and m2 degrees of freedom respectively,…
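The standard construction of an F variable is the ratio $(x_1/m_1)/(x_2/m_2)$, where each independent chi-square variable is divided by its degrees of freedom. This construction can be simulated directly. In the sketch below, which is our own illustration, each chi-square variable is built as a sum of squared standard normals. The sample mean is then compared with the known mean $m_2/(m_2-2)$, which holds for $m_2 > 2$.

```python
import random

def chi_square(k, rng):
    """Chi-square variable with k degrees of freedom: sum of k squared N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def f_variate(m1, m2, rng):
    """F(m1, m2) variable: (x1 / m1) / (x2 / m2) with independent chi-squares."""
    return (chi_square(m1, rng) / m1) / (chi_square(m2, rng) / m2)

rng = random.Random(7)
m1, m2 = 5, 10
samples = [f_variate(m1, m2, rng) for _ in range(40_000)]
print(sum(samples) / len(samples), m2 / (m2 - 2))  # both near 1.25
```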

#### Gamma Distribution

Gamma Distribution: A random variable x is said to have a gamma distribution with parameters $\alpha > 0$ and $\lambda > 0$ if its probability density $p(x)$ is

$$
p(x) = \begin{cases} \dfrac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}, & x > 0, \\ 0, & x \le 0. \end{cases}
$$

#### Gaussian Distribution

Gaussian Distribution: See Normal Distribution.

#### Geometric Distribution

Geometric Distribution: A random variable x obeys the geometric distribution with parameter p (0 < p < 1) if $P\{x = k\} = p(1-p)^k$, $k = 0, 1, 2, \ldots$. If repeated independent trials each obey the Bernoulli distribution with probability of success p, then x might be the number of trials before the first…
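The probability $p(1-p)^k$ is the chance of k failures followed by the first success in independent Bernoulli trials. The simulation below is an illustrative sketch. It compares the empirical mean with $(1-p)/p$, which is the mean of this version of the geometric distribution. It also compares the empirical frequency of $x = 0$ with $p$.

```python
import random

def failures_before_success(p, rng):
    """Count Bernoulli(p) failures before the first success."""
    k = 0
    while rng.random() >= p:  # failure occurs with probability 1 - p
        k += 1
    return k

p = 0.25
rng = random.Random(1)
draws = [failures_before_success(p, rng) for _ in range(100_000)]
print(sum(draws) / len(draws), (1 - p) / p)  # empirical vs. theoretical mean 3.0
print(draws.count(0) / len(draws), p)        # empirical vs. exact P{x = 0}
```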

#### Law Of Large Numbers

Law of Large Numbers: According to the Law of Large Numbers, the probability that the proportion of successes in a sample will differ from the population proportion by less than c (any positive constant) approaches 1 as the sample size tends to infinity.
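This statement can be checked by simulation. For a fixed margin c, the fraction of simulated samples whose success proportion lies within c of the true proportion grows toward 1 as the sample size increases. The population proportion 0.3, the margin c = 0.05, and the repetition count below are arbitrary choices for illustration.

```python
import random

def fraction_within(p, n, c, reps, rng):
    """Share of `reps` samples of size n whose success proportion is within c of p."""
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        if abs(successes / n - p) < c:
            hits += 1
    return hits / reps

rng = random.Random(3)
results = {n: fraction_within(0.3, n, 0.05, 100, rng) for n in (10, 100, 1000, 10_000)}
for n, share in results.items():
    print(n, share)
```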

#### Log-Normal Distribution

Log-Normal Distribution: A random variable X has a log-normal distribution if ln(X) is normally distributed.

#### Markov Chain

Markov Chain: A Markov chain is a series of random values $x_1, x_2, \ldots$ in which the probabilities associated with a particular value $x_i$ depend only on the prior value $x_{i-1}$. For this reason, a Markov chain is a special case of "memoryless"…
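A Markov chain is easiest to see in a simulation where each next state is drawn using only the current state. The two-state transition probabilities below are hypothetical. For such a chain, let $a = P(0 \to 1)$ and $b = P(1 \to 0)$. The long-run share of time spent in state 0 is then $b/(a+b)$.

```python
import random

# Hypothetical transition probabilities: P[i][j] = P(next = j | current = i)
P = [[0.9, 0.1],
     [0.5, 0.5]]

def simulate(P, steps, rng, start=0):
    """Simulate a two-state chain; each step depends only on the previous state."""
    state, path = start, []
    for _ in range(steps):
        state = 0 if rng.random() < P[state][0] else 1
        path.append(state)
    return path

rng = random.Random(11)
path = simulate(P, 200_000, rng)
share0 = path.count(0) / len(path)
a, b = P[0][1], P[1][0]
print(share0, b / (a + b))  # both near 0.833
```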

#### Markov Property

Markov Property: The Markov property means "absence of memory" of a random process, that is, the conditional probabilities $P(U(t_1) \mid U(t))$ for $t_1 > t$ do not depend on the values $U(t_2)$ for $t_2 < t$. In simpler words, this property means that future behavior depends only on…

#### Markov Random Field

Markov Random Field: See Markov Chain, Random Field.

#### Moment Generating Function

Moment Generating Function: The moment generating function is associated with a probability distribution and can be used to generate moments. However, its main use is not in generating moments but in characterizing a distribution. The…

#### Collaborative filtering

Collaborative filtering: Collaborative filtering algorithms are used to predict whether a given individual might like, or purchase, an item. One popular approach is to find a set of individuals (e.g. customers) whose item preferences (ratings) are similar to those of the given individual over a…

#### Normal Distribution

Normal Distribution: The normal distribution is a probability distribution whose density is bell-shaped, symmetric, and single-peaked. The mean, median, and mode coincide and lie at the center of the distribution. The two tails extend indefinitely and never touch the x-axis (they are asymptotic to the x-axis). A…

#### Poisson Distribution

Poisson Distribution: The Poisson distribution is a discrete distribution, completely characterized by one parameter $\lambda$: $P(x = k) = \dfrac{\lambda^k}{k!} e^{-\lambda}$, $k = 0, 1, 2, \ldots$ (where $k! = 1 \times 2 \times \cdots \times k$). Both the mean and the variance of the Poisson distribution are equal to $\lambda$. The Poisson distribution…
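The claim that the mean and the variance both equal $\lambda$ can be checked numerically by summing the probability mass function far enough into the tail. The sketch below uses $\lambda = 3.5$, an arbitrary value. It builds each term from the previous one via $P(k) = P(k-1)\,\lambda/k$ so that no large factorials are needed.

```python
import math

def poisson_pmf(lam, kmax):
    """Return [P(x=0), ..., P(x=kmax)] using P(k) = P(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * lam / k)
    return probs

lam = 3.5
probs = poisson_pmf(lam, 100)  # the tail beyond k = 100 is negligible for lam = 3.5
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(round(mean, 6), round(var, 6))  # 3.5 3.5
```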

#### Poisson Process

Poisson Process: A Poisson process is a random function U(t) which describes the number of random events in an interval [0,t] of time or space. The random events have the properties that (i) the probability of an event during a very small interval…

#### Posterior Probability

Posterior Probability: Posterior probability is a revised probability that takes into account new available information. For example, let there be two urns, urn A having 5 black balls and 10 red balls and urn B having 10 black balls and 5 red balls. Now if…
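The two urns above can illustrate the idea under a hypothetical continuation. Assume that an urn is chosen with prior probability 1/2 each and that a single ball drawn from it is black. Bayes' rule then revises the probability of each urn in light of that draw.

```python
from fractions import Fraction

# Urn contents from the example: (black, red)
urns = {"A": (5, 10), "B": (10, 5)}
prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}  # assumed equal priors

# Likelihood of drawing a black ball from each urn
likelihood = {u: Fraction(black, black + red) for u, (black, red) in urns.items()}

# Bayes' rule: posterior is proportional to prior * likelihood
evidence = sum(prior[u] * likelihood[u] for u in urns)
posterior = {u: prior[u] * likelihood[u] / evidence for u in urns}
print(posterior)  # {'A': Fraction(1, 3), 'B': Fraction(2, 3)}
```

Under these assumptions, the black ball lowers the probability of urn A from 1/2 to 1/3.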

#### Prior and posterior probability (difference)

Prior and posterior probability (difference): Consider a population where the proportion of HIV-infected individuals is 0.01. Then, the prior probability that a randomly chosen subject is HIV-infected is $P_{prior} = 0.01$. Suppose now a subject has tested positive for HIV. It is known that…

#### Random Process

Random Process: A random process describes an experiment whose outcomes are functions of a single continuous variable (e.g. time). See also Random Series, Random Field.

#### Standard Normal Distribution

Standard Normal Distribution: The standard normal distribution is the normal distribution where the mean is zero and the standard deviation is one.

#### Support Vector Machines

Support Vector Machines: Support vector machines are used in data mining (specifically, predictive modeling) to classify records by learning from training data. Support vector machines use decision surfaces that separate records. They rely on optimization techniques to maximize the margins between classes,…

#### t-distribution

t-distribution: A continuous distribution with a single-peaked, bell-shaped probability density that is symmetric around zero. The t-distribution is specified completely by one parameter: the number of degrees of freedom. If X and Y are independent random variables, X has the standard normal…

#### 2-Tailed vs. 1-Tailed Tests

2-Tailed vs. 1-Tailed Tests: The purpose of a hypothesis test is to avoid being fooled by chance occurrences into thinking that the effect you are investigating (for example, a difference between treatment and control) is real. If you are investigating, say, the difference between an…

#### Alpha Level

Alpha Level: See Type I Error.

#### Alternative Hypothesis

Alternative Hypothesis: In hypothesis testing, there are two competing hypotheses - the null hypothesis and the alternative hypothesis. The null hypothesis usually reflects the status quo (for example, the proposed new treatment is ineffective and the observed results are just due to chance variation). The…

#### Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA): A statistical technique for inferring whether three or more samples might come from populations having the same mean; specifically, whether the differences among the sample means might be caused by chance variation.

#### ANOVA

ANOVA: See Analysis of Variance.

#### Bonferroni Adjustment

Bonferroni Adjustment: Bonferroni adjustment is used in multiple comparison procedures to calculate an adjusted probability $\alpha$ of comparison-wise type I error from the desired probability $\alpha_{FW0}$ of family-wise type I error. The calculation guarantees that the use of the adjusted $\alpha$ in pairwise comparisons keeps…
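In its simplest form, the Bonferroni adjustment divides the desired family-wise level by the number of comparisons k. The bound holds whether or not the comparisons are independent, because the probability of at least one false rejection is at most the sum of the k individual probabilities. For k independent comparisons, each run at level $\alpha$, the family-wise type I error is $1 - (1-\alpha)^k$, which shows why an adjustment is needed. The sketch below computes both quantities; the function names are our own.

```python
def bonferroni_alpha(alpha_fw, k):
    """Comparison-wise level that keeps the family-wise type I error at or below alpha_fw."""
    return alpha_fw / k

def familywise_error(alpha, k):
    """Family-wise type I error for k independent comparisons, each at level alpha."""
    return 1 - (1 - alpha) ** k

k = 10
print(round(familywise_error(0.05, k), 4))  # 0.4013 without adjustment
adj = bonferroni_alpha(0.05, k)
print(round(adj, 6), round(familywise_error(adj, k), 4))  # 0.005 0.0489
```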

#### Box´s M

Box´s M: Box´s M is a statistic which tests the homoscedasticity assumption in MANOVA, that is, the assumption that the covariance matrices are the same for every category (group). The results should be interpreted with caution because Box´s M is not robust; it is very…

#### Cochran´s Q Statistic

Cochran´s Q Statistic: Cochran´s Q statistic is computed from replicated measurements data with binary responses. This statistic tests for a difference in effects among 2 or more treatments applied to the same set of experimental units. Consider the results of a study of M treatments applied…

#### Cochran-Mantel-Haenszel (CMH) test

Cochran-Mantel-Haenszel (CMH) test: The Cochran-Mantel-Haenszel (CMH) test compares two groups on a binary response, adjusting for control variables. The initial data are represented as a series of K 2x2 contingency tables, where K is the number of strata. Traditionally, in each table the rows…

#### Cohen´s Kappa

Cohen´s Kappa: Cohen´s kappa is a measure of agreement for categorical data. It is a special case of the Kappa statistic corresponding to the case of only 2 raters. Historically, this statistic was invented first. Later it was generalized to the case of an…
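The usual formula compares two quantities. The first is the observed agreement $p_o$, the share of items that both raters classify identically. The second is the agreement $p_e$ expected by chance from the raters' marginal totals. Kappa is then $\kappa = (p_o - p_e)/(1 - p_e)$. Below is a minimal sketch with a hypothetical 2x2 table of counts.

```python
def cohens_kappa(table):
    """Cohen's kappa from a square table: table[i][j] = items rater 1 put in i and rater 2 in j."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n  # observed agreement
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

table = [[20, 5],
         [10, 15]]  # hypothetical ratings of 50 items
print(round(cohens_kappa(table), 3))  # 0.4
```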

#### Comparison-wise Type I Error

Comparison-wise Type I Error: In multiple comparison procedures, the comparison-wise type I error is the probability that, even if the samples come from the same population, you will wrongly conclude that they differ. See also Family-wise type I error.

#### Composite Hypothesis

Composite Hypothesis: A statistical hypothesis which does not completely specify the distribution of a random variable is referred to as a composite hypothesis.

#### Correlation Statistic

Correlation Statistic: The correlation statistic is one of the statistics used in the generalized Cochran-Mantel-Haenszel tests. It is applicable when both the treatment (rows) and response (columns) are measured on an ordinal scale. In case of independence between the two variables in all…

#### Critical Region

Critical Region: See Acceptance region.

#### Dunn Test

Dunn Test: The Dunn test is a method for multiple comparisons, which generalizes the Bonferroni adjustment procedure. This test is used as a post-hoc test in analysis of variance when the number of comparisons is not large compared to the number of all…

#### Exact Tests

Exact Tests: Exact tests are hypothesis tests that are guaranteed to produce Type-I error at or below the nominal alpha level of the test when conducted on samples drawn from a null model. For example, a test conducted at the 5% level of significance that…

#### Family-wise Type I Error

Family-wise Type I Error: In multiple comparison procedures, family-wise type I error is the probability that, even if all samples come from the same population, you will wrongly conclude that at least one pair of populations differ. If $\alpha$ is the probability of comparison-wise type…

#### Fisher´s Exact Test

Fisher´s Exact Test: Fisher´s exact test is historically the first permutation test. It is used with two samples of binary data and tests the null hypothesis that the two samples are drawn from populations with equal but unknown proportions of "successes" (e.g. proportion of patients…
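For a 2x2 table with fixed margins, the count in the top-left cell follows a hypergeometric distribution under the null hypothesis. The p-value sums the probabilities of tables at least as extreme as the observed one. The sketch below computes only the one-sided p-value, in which large top-left counts count as extreme. The table is a hypothetical example in the style of the classic tea-tasting experiment.

```python
from fractions import Fraction
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value P(A >= a) for the table [[a, b], [c, d]] with margins fixed."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return Fraction(comb(r1, x) * comb(r2, c1 - x), comb(n, c1))

    return sum(prob(x) for x in range(a, min(r1, c1) + 1))

print(fisher_one_sided(3, 1, 1, 3))  # 17/70
```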

#### General Association Statistic

General Association Statistic: The general association statistic is one of the statistics used in the generalized Cochran-Mantel-Haenszel tests. It is applicable when both the "treatment" and the "response" variables are measured on a nominal scale. If the treatment and response variables are independent…

#### Generalized Cochran-Mantel-Haenszel tests

Generalized Cochran-Mantel-Haenszel tests: The generalized Cochran-Mantel-Haenszel tests are a family of tests aimed at detecting association between two categorical variables observed in K strata. The initial data are represented as a series of K RxC contingency tables, where K is the number of…

#### Goodness-of-Fit Test

Goodness-of-Fit Test: A statistical test to determine whether there is a significant difference between the observed frequency distribution and a theoretical probability distribution that is hypothesized to describe the observed distribution.

#### Hotelling´s T-Square

Hotelling´s T-Square: Hotelling´s T-square is a statistic for a multivariate test of differences between the mean values of two groups. The null hypothesis is that the centroids do not differ between the two groups. Hotelling´s T-square is used in multivariate analysis of variance (MANOVA), and in…

#### Hypothesis

Hypothesis: A (statistical) hypothesis is an assertion or conjecture about the distribution of one or more random variables. For example, an experimenter may pose the hypothesis that the outcomes from treatment A and treatment B belong to the same population or distribution. If the hypothesis…

#### Hypothesis Testing

Hypothesis Testing: Hypothesis testing (also called "significance testing") is a statistical procedure for discriminating between two statistical hypotheses: the null hypothesis ($H_0$) and the alternative hypothesis ($H_a$, often denoted as $H_1$). Hypothesis testing, in a formal logic sense, rests on the presumption of…

#### Kappa Statistic

Kappa Statistic: Kappa statistic is a generic term for several similar measures of agreement used with categorical data. Typically it is used in assessing the degree to which two or more raters, examining the same data, agree when it comes to assigning the data…

#### Kolmogorov-Smirnov One-sample Test

Kolmogorov-Smirnov One-sample Test: The Kolmogorov-Smirnov one-sample test is a goodness-of-fit test, and tests whether an observed dataset is consistent with a hypothesized theoretical distribution. The test involves specifying the cumulative frequency distribution which would occur given the theoretical distribution and comparing that with the observed…

#### Kolmogorov-Smirnov Test

Kolmogorov-Smirnov Test: See Kolmogorov-Smirnov one-sample test and Kolmogorov-Smirnov two-sample test.

#### Kolmogorov-Smirnov Two-sample Test

Kolmogorov-Smirnov Two-sample Test: The Kolmogorov-Smirnov two-sample test is a test of the null hypothesis that two independent samples have been drawn from the same population (or from populations with the same distribution). The test uses the maximal difference between cumulative frequency distributions of two samples…
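The two-sample statistic D is the largest absolute gap between the two empirical cumulative distribution functions. Both functions are step functions that jump only at observed values, so it is enough to check the gap at each pooled data point. The sketch below computes D only. Turning D into a p-value requires its null distribution, which library routines such as `scipy.stats.ks_2samp` provide.

```python
from bisect import bisect_right

def ks_two_sample_d(x, y):
    """Maximal absolute difference between the empirical CDFs of samples x and y."""
    xs, ys = sorted(x), sorted(y)

    def ecdf(sorted_data, t):
        return bisect_right(sorted_data, t) / len(sorted_data)  # share of values <= t

    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in xs + ys)

print(ks_two_sample_d([1, 2, 3], [4, 5, 6]))        # 1.0: samples do not overlap
print(ks_two_sample_d([1, 2, 3, 4], [1, 2, 3, 4]))  # 0.0: identical samples
```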

#### Kruskal-Wallis Test

Kruskal-Wallis Test: The Kruskal-Wallis test is a nonparametric test of whether three or more independent samples come from populations having the same distribution. It is a nonparametric version of ANOVA.

#### Lawley-Hotelling Trace

Lawley-Hotelling Trace: See Hotelling Trace coefficient.

#### Level Of Significance

Level of Significance: In hypothesis testing, you seek to decide whether observed results are consistent with chance variation under the "null hypothesis," or, alternatively, whether they are so different that chance variability can be ruled out as an explanation for the observed sample. The range…

#### Likelihood Ratio Test

Likelihood Ratio Test: The likelihood ratio test is aimed at testing a simple null hypothesis against a simple alternative hypothesis. (See Hypothesis for an explanation of "simple hypothesis"). The likelihood ratio test is based on the likelihood ratio r as the test statistic: r =…

#### Lilliefors Statistic

Lilliefors Statistic: The Lilliefors statistic is used in a goodness-of-fit test of whether an observed sample distribution is consistent with normality. The statistic measures the maximum distance between the observed distribution and a normal distribution with the same mean and standard deviation as…

#### Lilliefors test for normality

Lilliefors test for normality: The Lilliefors test is a special case of the Kolmogorov-Smirnov goodness-of-fit test. In the Lilliefors test, the Kolmogorov-Smirnov test is implemented using the sample mean and standard deviation as the mean and standard deviation of the theoretical (benchmark) population…

#### Mann-Whitney U Test

Mann-Whitney U Test: See Wilcoxon-Mann-Whitney U Test.

#### Mantel-Cox Test

Mantel-Cox Test: The Mantel-Cox test is aimed at testing the null hypothesis that survival functions do not differ across groups.

#### Mantel-Haenszel test

Mantel-Haenszel test: See Cochran-Mantel-Haenszel test.

#### Mean Score Statistic

Mean Score Statistic: The mean score statistic is one of the statistics used in the generalized Cochran-Mantel-Haenszel tests. It is applicable when the response levels (columns) are measured on an ordinal scale. If the two variables are independent of each other…

#### Multiple Comparison

Multiple Comparison: Multiple comparisons are used in the same context as analysis of variance (ANOVA) - to check whether there are differences in population means among more than two populations. In contrast to ANOVA, which simply tests the null hypothesis that all means are equal,…

#### Multiple Testing

Multiple Testing: See Multiple comparison.

#### Nonparametric ANOVA Statistic

Nonparametric ANOVA Statistic: See Mean Score Statistic.

#### Nonparametric Tests

Nonparametric Tests: In statistical inference procedures (hypothesis tests and confidence intervals), nonparametric procedures are those that are relatively free of assumptions about population parameters. For an example of a nonparametric test, see sign test. See also parametric tests.

#### Normality Tests

Normality Tests: Normality tests are tests of whether a set of data is distributed in a way that is consistent with a normal distribution. Typically, they are tests of a null hypothesis that the data are drawn from a normal population, specifically a goodness-of-fit test.…

#### Null Hypothesis

Null Hypothesis: In hypothesis testing, the null hypothesis is the one you are hoping can be disproven by the observed data. Typically, it asserts that chance variation is responsible for an effect seen in observed data (for example, a difference between treatment and placebo, an…

#### Omega-square

Omega-square: Omega-square is a synonym for the coefficient of determination.

#### One-sided Test

One-sided Test: One-sided test is a synonym for one-tailed test. See 2-Tailed vs. 1-Tailed Tests.

#### p-value

p-value: The p-value is the probability that the null model could, by random chance variation, produce a sample at least as extreme as the observed sample (as measured by some sample statistic of interest).

#### Parametric Tests

Parametric Tests: In statistical inference procedures (hypothesis tests and confidence intervals), parametric procedures are those that incorporate assumptions about population parameters. See also nonparametric tests.

#### Permutation Tests

Permutation Tests: A permutation test involves the shuffling of observed data to determine how unusual an observed outcome is. A typical problem involves testing the hypothesis that two or more samples might belong to the same population. The permutation test proceeds as follows: 1. Combine…
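A common version of this procedure for two samples is sketched here with hypothetical data. It combines the samples, shuffles them, and re-splits them into groups of the original sizes. It then reports the share of reshuffles whose difference in means is at least as large as the observed difference, which is a one-sided p-value. The exact steps of the glossary's own procedure are not reproduced here.

```python
import random

def permutation_test(a, b, n_perm, rng):
    """One-sided permutation p-value for mean(a) - mean(b) being this large by chance."""
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        ga, gb = pooled[:len(a)], pooled[len(a):]
        if sum(ga) / len(ga) - sum(gb) / len(gb) >= observed:
            count += 1
    # Adding 1 to numerator and denominator counts the observed split itself
    return (count + 1) / (n_perm + 1)

treatment = [12, 14, 15, 16, 18]  # hypothetical measurements
control = [5, 6, 7, 8, 9]
p = permutation_test(treatment, control, 10_000, random.Random(5))
print(p)  # small: only 1 of the 252 possible splits is as extreme
```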

#### Pivotal Statistic

Pivotal Statistic: A statistic is said to be pivotal if its sampling distribution does not depend on unknown parameters. Pivotal statistics are well suited for statistical tests because this property allows you to control type I error, irrespective of any…

#### Post-hoc tests

Post-hoc tests: Post-hoc tests (or post-hoc comparison tests) are used at the second stage of the analysis of variance (ANOVA) or multivariate analysis of variance (MANOVA) if the null hypothesis is rejected. The question of interest at this stage is which groups significantly differ from…

#### Power of a Hypothesis Test

Power of a Hypothesis Test: The power of a hypothesis test is a measure of how effective the test is at identifying (say) a difference in populations if such a difference exists. It is the probability of rejecting the null hypothesis when it is false.

#### Randomization Test

Randomization Test: See permutation tests.

#### Rejection Region

Rejection Region: See Acceptance region.

#### Relative Efficiency (of tests)

Relative Efficiency (of tests): The relative efficiency of two tests is a measure of the relative power of two tests. Suppose tests 1 and 2 are tests for the same null-hypothesis and at the same significance level "alpha" (probability of type I error).…

#### Sensitivity

Sensitivity: Sensitivity (of a medical diagnostic test for a disease) is the probability that the test is positive for a person with the disease. Sensitivity itself is not sufficient to characterize a test. For example, a test reporting all subjects who take the test as…

#### Sign Test

Sign Test: The sign test is a nonparametric test used with paired replicates to test for the difference between the 1st and the 2nd measurement in a group of "subjects". For each pair, you assign a "1" if the 1st measurement has the larger value,…
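Under the null hypothesis of no difference, each non-tied pair is equally likely to score 1 or 0. The number of 1s therefore follows a binomial distribution with probability 1/2. The sketch below drops ties, which is a common convention and is assumed here. It computes a two-sided p-value by doubling the smaller tail, capped at 1. The data are hypothetical.

```python
from fractions import Fraction
from math import comb

def sign_test(first, second):
    """Two-sided sign test p-value for paired measurements (ties dropped)."""
    ones = sum(1 for u, v in zip(first, second) if u > v)
    zeros = sum(1 for u, v in zip(first, second) if u < v)
    n = ones + zeros

    def tail(k):  # P(X >= k) for X ~ Binomial(n, 1/2)
        return Fraction(sum(comb(n, j) for j in range(k, n + 1)), 2**n)

    return min(Fraction(1), 2 * tail(max(ones, zeros)))

before = [10, 12, 9, 11, 14, 13, 15, 12, 10, 11]  # hypothetical paired data
after = [8, 10, 7, 10, 12, 12, 13, 11, 9, 12]
print(sign_test(before, after))  # 11/512
```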

#### Significance Testing

Significance Testing: See Hypothesis Testing.

#### Specificity

Specificity: Specificity (of a medical diagnostic test for a disease) is the probability that the test will come out negative for a person without the disease. Specificity itself is not sufficient to characterize a test. For example, a test reporting all subjects who take the…
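Sensitivity and specificity both come from the same 2x2 table of test results against true disease status. The counts below are hypothetical. They also show why either measure alone is insufficient: a test that reports everyone as positive has sensitivity 1 but specificity 0.

```python
def sensitivity(tp, fn):
    """P(test positive | disease) = true positives / all diseased."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """P(test negative | no disease) = true negatives / all non-diseased."""
    return tn / (tn + fp)

# Hypothetical counts: 100 diseased and 100 healthy subjects
print(sensitivity(tp=90, fn=10), specificity(tn=80, fp=20))  # 0.9 0.8

# A test that labels everyone positive: no false negatives, no true negatives
print(sensitivity(tp=100, fn=0), specificity(tn=0, fp=100))  # 1.0 0.0
```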

#### Statistical Test

Statistical Test: A statistical test is a procedure for statistical hypothesis testing. The outcome of a statistical test is a decision to reject or accept the null hypothesis for a given probability of type I error. The outcome is frequently reported as a p-value -…

#### t-test

t-test: A t-test is a statistical hypothesis test based on a test statistic whose sampling distribution is a t-distribution. Various t-tests, strictly speaking, are aimed at testing hypotheses about populations with normal probability distribution. However, statistical research has shown that t-tests often provide quite adequate…

#### Tukey´s HSD (Honestly Significant Differences) Test

Tukey´s HSD (Honestly Significant Differences) Test: This test is used for testing the significance of unplanned pairwise comparisons. When you do multiple significance tests, the chance of finding a "significant" difference just by chance increases. Tukey´s HSD test is one of several methods of ensuring…

#### Two-Tailed Test

Two-Tailed Test: A two-tailed test is a hypothesis test in which the null hypothesis is rejected if the observed sample statistic is more extreme than the critical value in either direction (higher than the positive critical value or lower than the negative critical value). A…

#### Type I Error

Type I Error: In a test of significance, Type I error is the error of rejecting the null hypothesis when it is true -- of saying an effect or event is statistically significant when it is not. The projected probability of committing type I error…

#### Type II Error

Type II Error: In a test of significance, Type II error is the error of accepting the null hypothesis when it is false -- of failing to declare a real difference as statistically significant. Obviously, the bigger your samples, the more likely your test is…

#### Variance/Mean Ratio Test

Variance/Mean Ratio Test: The variance/mean ratio (VMR) test is a statistical test used to test the null hypothesis that the variance/mean ratio is 1.0. The VMR test is usually treated as a one-sided test because each direction of departure from the null…

#### Weighted Kappa

Weighted Kappa: Weighted kappa is a measure of agreement for categorical data. It is a generalization of the Kappa statistic to situations in which the categories are not equal in some respect, that is, weighted by an objective or subjective function.…

#### Wilcoxon-Mann-Whitney U Test

Wilcoxon-Mann-Whitney U Test: The Wilcoxon-Mann-Whitney test uses the ranks of data to test the hypothesis that two samples of sizes m and n might come from the same population. The procedure is as follows: Combine the data from both samples Rank…

#### Wilcoxon Rank Sums

Wilcoxon Rank Sums: Wilcoxon rank sums are two statistics $T^+$ and $T^-$ computed from paired replicates data. Suppose we have pairs of measurements $(x_i, y_i)$, $i = 1, \ldots, N$, one pair for each of N experimental units. We compute differences $d_i = y_i - x_i$; $i = 1, \ldots, N$…
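A common way to complete the computation is sketched below; the steps are an assumption, since the entry does not give them. Rank the absolute differences $|d_i|$, dropping zero differences and averaging tied ranks. Then sum the ranks of the positive differences into $T^+$ and the ranks of the negative differences into $T^-$. The two sums always add up to $n(n+1)/2$, where n is the number of non-zero differences. The data are hypothetical.

```python
def signed_rank_sums(x, y):
    """Return (T+, T-) for paired data, using d_i = y_i - x_i."""
    d = [yi - xi for xi, yi in zip(x, y) if yi != xi]  # zero differences dropped
    abs_d = sorted(abs(v) for v in d)

    def avg_rank(a):
        # Average of the 1-based positions where |d| equals a (handles ties)
        first = abs_d.index(a) + 1
        last = len(abs_d) - abs_d[::-1].index(a)
        return (first + last) / 2

    t_plus = sum(avg_rank(abs(v)) for v in d if v > 0)
    t_minus = sum(avg_rank(abs(v)) for v in d if v < 0)
    return t_plus, t_minus

x = [1, 2, 3]  # hypothetical paired measurements
y = [2, 4, 2]
print(signed_rank_sums(x, y))  # (4.5, 1.5)
```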

#### Wilcoxon Signed Ranks Test

Wilcoxon Signed Ranks Test: The Wilcoxon signed ranks test is aimed at testing a null hypothesis from paired replicates data - that both treatments are equivalent. This test is based on one of the two Wilcoxon rank sums as the test statistic. The Wilcoxon signed…

#### Wilks´s Lambda

Wilks´s Lambda: Wilks´s lambda is a general test statistic used in multivariate tests of mean differences among more than two groups. Several other statistics are special cases of Wilks´s lambda.

#### Autocorrelation

Autocorrelation: See Serial correlation.

#### Autoregression

Autoregression: Autoregression refers to a special branch of regression analysis aimed at analysis of time series. It rests on autoregressive models - that is, models where the dependent variable is the current value and the independent variables are N previous values of the time series.…
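The simplest case is N = 1 with a zero-mean series, the AR(1) model $x_t = \phi x_{t-1} + \varepsilon_t$. It can be fitted by ordinary least squares of $x_t$ on $x_{t-1}$ without an intercept, which gives $\hat\phi = \sum x_t x_{t-1} / \sum x_{t-1}^2$. The sketch below simulates a series with a hypothetical $\phi = 0.6$ and recovers that value.

```python
import random

def simulate_ar1(phi, n, rng):
    """Zero-mean AR(1) series x_t = phi * x_{t-1} + e_t with N(0, 1) noise."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def fit_ar1(x):
    """Least-squares slope of x_t on x_{t-1}, with no intercept."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

series = simulate_ar1(0.6, 5_000, random.Random(2))
print(round(fit_ar1(series), 3))  # close to 0.6
```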

#### Cointegration

Cointegration: Cointegration is a statistical tool for describing the co-movement of data measured over time. The concept of cointegration is widely used in applied time series analysis, especially in econometrics. Two (or more) nonstationary time series are said to be cointegrated if…

#### Nonstationary time series

Nonstationary time series: A time series $x_t$ is said to be nonstationary if its statistical properties depend on time. The opposite concept is a stationary time series. Most real-world time series are nonstationary. An example of a nonstationary time series is a record of…

#### Edge

Edge: An edge is a link between two people or entities in a network. Edges can be directed or undirected. A directed edge has a clear origin and destination: lender → borrower, tweeter → follower. An undirected edge connects two people or entities with a…

#### Continuous vs. Discrete Distributions

Continuous vs. Discrete Distributions: A discrete distribution is one in which the data can only take on certain values, for example integers. A continuous distribution is one in which data can take on any value within a specified range (which may be infinite). For a discrete distribution,…