Nonparametric Tests

Nonparametric Tests: In statistical inference procedures (hypothesis tests and confidence intervals), nonparametric procedures are those that are relatively free of assumptions about population parameters. For an example of a nonparametric test, see sign test. See also parametric tests.

Nonrecursive Filter

Nonrecursive Filter: In nonrecursive filters, the output at the moment t is a function only of the input values at the time moments t, t-1, t-2, …. A complementary concept is the recursive filter.

Nonstationary time series

Nonstationary time series: A time series x_t is said to be nonstationary if its statistical properties depend on time. The opposite concept is a stationary time series. Most real-world time series are nonstationary. An example of a nonstationary time series is a record of…

Normal Distribution

Normal Distribution: The normal distribution is a probability density which is bell-shaped, symmetrical, and single peaked. The mean, median and mode coincide and lie at the center of the distribution. The two tails extend indefinitely and never touch the x-axis (asymptotic to the x-axis). A…

Normality

Normality: Normality is a property of a random variable that is distributed according to the normal distribution. Normality plays a central role in both theoretical and practical statistics: a great number of theoretical statistical methods rest on the assumption that the data, or test…

Normality Tests

Normality Tests: Normality tests are tests of whether a set of data is distributed in a way that is consistent with a normal distribution. Typically, they are tests of a null hypothesis that the data are drawn from a normal population, specifically a goodness-of-fit test.…
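
As an illustration (not part of the original glossary), the sketch below runs one common normality test, the Shapiro-Wilk test from SciPy, on a simulated sample; the sample and the 0.05 cutoff mentioned in the comment are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=100)   # simulated data, assumed for illustration

# Shapiro-Wilk test: the null hypothesis is that the sample comes from a normal population
stat, p_value = stats.shapiro(sample)
print(f"W = {stat:.3f}, p-value = {p_value:.3f}")
# A small p-value (e.g. < 0.05) would lead us to reject the normality hypothesis.
```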

NoSQL

A NoSQL database is distinguished mainly by what it is not - it is not a structured relational database format that links multiple separate tables. NoSQL stands for "not only SQL," meaning that SQL, or structured query language, is not needed to extract and organize…

Null Hypothesis

Null Hypothesis: In hypothesis testing, the null hypothesis is the one you are hoping can be disproven by the observed data. Typically, it asserts that chance variation is responsible for an effect seen in observed data (for example, a difference between treatment and placebo, an…

Odds Ratio

Odds Ratio: The odds ratio compares two probabilities (or proportions) P1 and P2 in the following way: q = [P1/(1-P1)] / [P2/(1-P2)]. If P1 and P2 are equal, the odds ratio is equal to 1.
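
A minimal Python sketch of this calculation (illustrative only; the probabilities are made up):

```python
def odds_ratio(p1, p2):
    """Odds ratio comparing two probabilities: [p1/(1-p1)] / [p2/(1-p2)]."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

print(odds_ratio(0.4, 0.4))  # equal probabilities -> 1.0
print(odds_ratio(0.4, 0.2))  # (0.4/0.6) / (0.2/0.8) = about 2.67
```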

Odds Ratio (Graphical)

Odds Ratio: The odds ratio compares two probabilities (or proportions) P1 and P2 in the following way: q = [P1/(1-P1)] / [P2/(1-P2)]. If P1 and P2 are equal, the odds ratio is equal to 1.

Omega-square

Omega-square: Omega-square is a synonym for the coefficient of determination.

One-sided Test

One-sided Test: One-sided test is a synonym for one-tailed test. See 2-Tailed vs. 1-Tailed Tests.

Order Statistics

Order Statistics: The order statistics of a random sample X1, X2, ..., Xn are the sample values placed in ascending order. They are denoted by X(1), X(2), ..., X(n). Here, X(1) ≤ X(2) ≤ ... ≤ X(n). For example, for the…
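
A small Python illustration (the sample values are arbitrary): sorting the sample produces the order statistics.

```python
sample = [3.7, 1.2, 5.9, 2.4, 4.1]   # X1, ..., X5 (made-up values)
order_stats = sorted(sample)          # X(1) <= X(2) <= ... <= X(5)
print(order_stats)                    # [1.2, 2.4, 3.7, 4.1, 5.9]
# X(1) is the sample minimum and X(5) the sample maximum.
```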

Ordered categorical data

Ordered categorical data: Categorical variables are non-numeric "category" variables, e.g. color. Ordered categorical variables are category variables that have a quantitative dimension that can be ordered but is not on a regular scale. Doctors rate pain on a scale of 1 to 10 -…

Ordinal Scale

Ordinal Scale: An ordinal scale is a measurement scale that assigns values to objects based on their ranking with respect to one another. For example, a doctor might use a scale of 0-10 to indicate degree of improvement in some condition, from 0 (no improvement)…

Ordinary Least Squares Regression

Ordinary Least Squares Regression: Ordinary least squares regression is a special (and the most common) kind of ordinary linear regression. It is based on the least squares method of finding regression parameters. Technically, the aim of ordinary least squares regression is to find out…
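
A brief sketch of ordinary least squares for a single predictor, using NumPy's least-squares solver (the data are invented for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])      # roughly y = 2x

# Design matrix with an intercept column; lstsq minimizes the sum of squared residuals
A = np.column_stack([np.ones_like(x), x])
(intercept, slope), *_ = np.linalg.lstsq(A, y, rcond=None)
print(intercept, slope)                       # close to 0 and 2
```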

Orthogonal Least Squares

Orthogonal Least Squares: In ordinary least squares, we try to minimize the sum of the vertical squared distances between the observed points and the fitted line. In orthogonal least squares, we try to fit a line which minimizes the sum of the squared distances between…
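
One common way to fit such a line (a sketch, not necessarily the only method) is via the singular value decomposition of the centered data: the leading right singular vector gives the direction that minimizes the sum of squared perpendicular distances. The data below are made up.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

X = np.column_stack([x, y])
center = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - center, full_matrices=False)

direction = Vt[0]                          # unit vector along the orthogonal fit
slope = direction[1] / direction[0]
intercept = center[1] - slope * center[0]
print(slope, intercept)
```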

Outcome variable

See dependent and independent variables.

Outlier

Outlier: Sometimes a set of data will have one or more items with unusually large or unusually small values. Such extreme values are called outliers. Outliers often arise from some mistakes in data-gathering or data-recording procedures. It is good practice to inspect a data set…

p-value

p-value: The p-value is the probability that the null model could, by random chance variation, produce a sample as extreme as the observed sample (as measured by some sample statistic of interest).

Paired Replicates Data

Paired Replicates Data: Paired replicates is the simplest form of repeated measures data, when only two measurements are made for each experimental unit. Consider, for example, a study of 2 drugs - A and B - to determine whether they reduce arterial…

Panel Data

Panel Data: A panel data set contains observations on a number of units (e.g. subjects, objects) belonging to different clusters (panels) over time. A simple example of panel data is the values of the gross annual income for each of 1000 households in New York…

Panel study

Panel study: A panel study is a longitudinal study that selects a group of subjects and then records data for each member of the group at various points in time. See also panel data.

Parallel Design

Parallel Design: In randomized trials, a parallel design is one in which subjects are randomly assigned to treatments, and the treatment groups then proceed in parallel. Conducted properly, such trials provide assurance that any difference between treatments is in fact due to treatment effects (or random…

Parameter

Parameter: A Parameter is a numerical value that describes one of the characteristics of a probability distribution or population. For example, a binomial distribution is completely specified if the number of trials and probability of success are known. Here, the number of trials and the…

Parametric Tests

Parametric Tests: In statistical inference procedures (hypothesis tests and confidence intervals), parametric procedures are those that incorporate assumptions about population parameters. See also nonparametric tests.

Partial correlation analysis

Partial correlation analysis: Partial correlation analysis is aimed at finding the correlation between two variables after removing the effects of other variables. This type of analysis helps spot spurious correlations (i.e. correlations explained by the effect of other variables) as well as reveal hidden correlations…

Path Analysis

Path Analysis: Path analysis is a method for causal modeling. Consider the simple case of two independent variables x1 and x2 and one dependent variable. Path analysis splits the contribution of x1 and x2 to the variance of the dependent variable y into four…

Path coefficients

Path coefficients: In path analysis and structural equation modeling, a path coefficient is the partial correlation coefficient between the dependent variable and an independent variable, adjusted for other independent variables.

Percentile

Percentile: In a population or a sample, the Pth percentile is a value such that at least P percent of the values take on this value or less and at least (100-P) percent of the values take on this value or more. See also: quartile,…
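
For example, NumPy computes percentiles directly (the data here are illustrative):

```python
import numpy as np

data = np.array([15, 20, 35, 40, 50, 55, 60, 70, 80, 95])
print(np.percentile(data, 90))             # 90th percentile
print(np.percentile(data, [25, 50, 75]))   # quartiles (25th, 50th, 75th percentiles)
```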

Perceptual Mapping

Perceptual Mapping: Perceptual mapping is a synonym for Correspondence analysis.

Permutation Tests

Permutation Tests: A permutation test involves the shuffling of observed data to determine how unusual an observed outcome is. A typical problem involves testing the hypothesis that two or more samples might belong to the same population. The permutation test proceeds as follows: 1. Combine…
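
A minimal sketch of a two-sample permutation test in Python (the difference-in-means statistic, the data, and the number of permutations are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_test(a, b, n_perm=10_000):
    """Two-sided permutation test for a difference in means between samples a and b."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])          # 1. combine the samples
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                  # 2. shuffle the combined data
        diff = pooled[:len(a)].mean() - pooled[len(a):].mean()
        if abs(diff) >= abs(observed):       # 3. compare to the observed difference
            count += 1
    return count / n_perm                    # proportion of shuffles as extreme as observed

print(permutation_test([6.1, 5.8, 7.2, 6.9], [5.2, 4.9, 5.5, 5.0]))
```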

Pie Icon Plots

Pie Icon Plots: Pie icon plots are a sub-class of icon plots. Each unit or observation is represented by a circle with colored "pie slices" corresponding to variables - the angular size of a slice of pie is proportional to the value…

Pivotal Statistic

Pivotal Statistic: A statistic is said to be pivotal if its sampling distribution does not depend on unknown parameters. Pivotal statistics are well suited for statistical tests, because this property allows you to control type I error, irrespective of any…

Poisson Distribution

Poisson Distribution: The Poisson distribution is a discrete distribution, completely characterized by one parameter λ: p(X=k) = λ^k e^(-λ) / k!, k = 0, 1, 2, … (where k! = 1 x 2 x ... x k). Both the mean and the variance of the Poisson distribution are equal to λ.…
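
A short Python check of this formula (λ = 3 is an arbitrary example):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = lam**k * exp(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0
probs = [poisson_pmf(k, lam) for k in range(20)]
print(sum(probs))                               # close to 1
print(sum(k * p for k, p in enumerate(probs)))  # mean, close to lam
```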

Poisson Distribution (Graphical)

Poisson Distribution: The Poisson distribution is a discrete distribution, completely characterized by one parameter λ: p(X=k) = λ^k e^(-λ) / k! (where k! = 1 x 2 x ... x k). Both the mean and the variance of the Poisson distribution are equal to λ. The Poisson distribution describes the number of random events…

Poisson Process

Poisson Process: A Poisson process is a random function U(t) which describes the number of random events in an interval [0,t] of time or space. The random events have the properties that (i) the probability of an event during a very small interval…

Poisson Process (Graphical)

Poisson Process: A Poisson process is a random function U(t) which describes the number of random events in an interval [0,t] of time or space. The random events have the properties that (i) the probability of an event during a very small interval…

Polygon Icon Plots

Polygon Icon Plots: Polygon icon plots are a subclass of circular icon plots in which the rays tend to form a polygon.

Polynomial

Polynomial: A polynomial of order n is a function described by the following expression: f(x) = a0 + a1 x + a2 x^2 + ... + an x^n, where a0, a1, ..., an are coefficients of the polynomial. A polynomial of order 1 is the simplest type of polynomial. Its chart, against x, is a line. Polynomials are used in regression analysis, multiple…

Population

Population: A population is a large set of objects of a similar nature - e.g. human beings, households, readings from a measurement device - which is of interest as a whole. A related concept is a sample, a subset of objects is drawn from…

Post-hoc tests

Post-hoc tests: Post-hoc tests (or post-hoc comparison tests) are used at the second stage of the analysis of variance (ANOVA) or multivariate analysis of variance (MANOVA) if the null hypothesis is rejected. The question of interest at this stage is which groups significantly differ from…

Posterior Probability

Posterior Probability: Posterior probability is a revised probability that takes into account new available information. For example, let there be two urns, urn A having 5 black balls and 10 red balls and urn B having 10 black balls and 5 red balls. Now if…
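
The entry is truncated, but a standard continuation of this urn example is: one urn is chosen at random (each with probability 1/2) and a black ball is drawn; what is the posterior probability that it came from urn B? A hedged Python sketch of that Bayes calculation, under those assumed conditions:

```python
# Prior: each urn equally likely (an assumption for this illustration)
prior_A, prior_B = 0.5, 0.5
# Likelihood of drawing a black ball from each urn
p_black_A = 5 / 15    # urn A: 5 black out of 15 balls
p_black_B = 10 / 15   # urn B: 10 black out of 15 balls

evidence = prior_A * p_black_A + prior_B * p_black_B
posterior_B = prior_B * p_black_B / evidence
print(posterior_B)    # 2/3: the new information shifts probability toward urn B
```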

Power Mean

Power Mean: A power mean of order p of a set of values x1, ..., xn is defined by the following expression: Mp = ((x1^p + x2^p + ... + xn^p)/n)^(1/p). The family of power mean statistics is often called the generalized mean - because, for different values of the parameter p, it is equivalent to…
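
A small Python sketch of the power mean (the limiting case p = 0, the geometric mean, is included for completeness; the function and variable names are illustrative):

```python
from math import prod

def power_mean(values, p):
    """Power mean of order p: ((x1**p + ... + xn**p) / n) ** (1/p)."""
    n = len(values)
    if p == 0:                                # limit p -> 0 is the geometric mean
        return prod(values) ** (1 / n)
    return (sum(v**p for v in values) / n) ** (1 / p)

data = [1.0, 2.0, 4.0]
print(power_mean(data, 1))    # arithmetic mean
print(power_mean(data, -1))   # harmonic mean
print(power_mean(data, 2))    # quadratic mean (root mean square)
```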

Power of a Hypothesis Test

Power of a Hypothesis Test: The power of a hypothesis test is a measure of how effective the test is at identifying (say) a difference in populations if such a difference exists. It is the probability of rejecting the null hypothesis when it is false.

Power Spectrum

Power Spectrum: The power spectrum of a stationary random process or a stationary time series is the average of the square of the amplitude of the Fourier spectrum of its realizations, where the amplitude spectrum is computed from a single realization of the random process;…

Precision

Precision: Precision is the degree of accuracy with which a parameter is estimated by an estimator. Precision is usually measured by the standard deviation of the estimator, which is known as the standard error. For example, the sample mean is used to estimate the population…

Predicting Filter

Predicting Filter: Predicting filters are filters that estimate the next value in a time series from the known previous values. In contrast to smoothing filters, in predicting filters the output at moment t depends only on the values at the preceding moments t-1, t-2, …. Predicting filters…

Prediction vs. Explanation

Prediction vs. Explanation: With the advent of Big Data and data mining, statistical methods like regression and CART have been repurposed to use as tools in predictive modeling. When statistical models are used as a tool of research, the goal is to explain relationships in…

Predictive Modeling

Predictive modeling is the process of using a statistical or machine learning model to predict the value of a target variable (e.g. default or no-default) on the basis of a series of predictor variables (e.g. income, house value, outstanding debt, etc.). Many of the techniques…

Predictive Validity

Predictive Validity: The predictive validity of survey instruments and psychometric tests is a measure of agreement between results obtained by the evaluated instrument and results obtained from more direct and objective measurements. The predictive validity is often quantified by the correlation coefficient between the two…

predictor

See dependent and independent variables.

Predictor Variable

Predictor Variable: Predictor variable is a synonym for independent variable. See also: dependent and independent variables.

Principal Component Analysis

Principal Component Analysis: The purpose of principal component analysis is to derive a small number of linear combinations (principal components) of a set of variables that retain as much of the information in the original variables as possible.

Principal components analysis

Principal components analysis: The purpose of principal component analysis is to derive a small number of linear combinations (principal components) of a set of variables that retain as much of the information in the original variables as possible. This technique is often used when there…
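
A compact sketch of PCA via the singular value decomposition of centered data (the simulated data and variable names are illustrative; libraries such as scikit-learn return the same quantities):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # 100 observations, 3 variables
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)   # make two variables strongly related

Xc = X - X.mean(axis=0)                          # center each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                                  # rows: directions of the principal components
scores = Xc @ Vt.T                               # data expressed in the new coordinates
explained_variance = s**2 / (len(X) - 1)
print(explained_variance / explained_variance.sum())  # share of variance each component retains
```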

Prior and posterior

Prior and posterior: Bayesian statistics typically incorporates new information (e.g. from a diagnostic test, or a recently drawn sample) to answer a question of the form "What is the probability that..." The answer to this question is referred to as the "posterior" probability, arrived at…

Prior and posterior probability (difference)

Prior and posterior probability (difference): Consider a population where the proportion of HIV-infected individuals is 0.01. Then, the prior probability that a randomly chosen subject is HIV-infected is Pprior = 0.01. Suppose now a subject has tested positive for HIV. It is known that…

Prior Probability

Prior Probability: See A Priori Probability.

Probit

Probit: Probit is a nonlinear function of probability p: probit(p) = Φ^(-1)(p), where Φ^(-1)() is the function inverse to the cumulative distribution function Φ() of the standard normal distribution. In contrast to the probability p itself (which takes on values from 0 to…
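
In Python, the probit and its inverse are available through SciPy's standard normal distribution (a quick illustration; the probability 0.975 is an arbitrary example):

```python
from scipy.stats import norm

p = 0.975
print(norm.ppf(p))     # probit(0.975), about 1.96 (inverse normal CDF)
print(norm.cdf(1.96))  # back to about 0.975
```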

Probit Models

Probit Models: Probit models postulate some relation between the probit of the observed probability and unknown parameters of the model. The most common example is the model probit(p) = a + b x, which is equivalent to p = Φ(a + b…

Proportional Hazard Model

Proportional Hazard Model: Proportional hazard model is a generic term for models (particularly survival models in medicine) that have the form L(t | x1, x2, …, xn) = h(t) exp(b1 x1 + … + bn xn), where L is the hazard function or hazard rate,…

Proportional Hazard Model (Graphical)

Proportional Hazard Model: Proportional hazard model is a generic term for models (particularly survival models in medicine) that have the form L(t | x1, x2, …, xn) = h(t) exp(b1 x1 + … + bn xn), where L is the hazard function or hazard rate, {xi} are covariates, {bi} are coefficients of the model - effects of the corresponding covariates,…

Prospective Versus Retrospective

Prospective vs. Retrospective: A prospective study is one that identifies a scientific (usually medical) problem to be studied, specifies a study design protocol (e.g. what you're measuring, who you're measuring, how many subjects, etc.), and then gathers data in the future in accordance with the…

Pruning the tree

Pruning the tree: Classification and regression trees, applied to data with known values for an outcome variable, derive models with rules like "If taxable income <$80,000, if no Schedule C income, if standard deduction taken, then no-audit." Pruning is the process of truncating the…

Pseudo-Random Numbers

Pseudo-Random Numbers: Pseudo-random numbers are produced by recursive algorithms - i.e. the current number is calculated from one or more previous numbers. Thus, strictly speaking, pseudo-random numbers are deterministic, not random. On the other hand, in many respects pseudo-random…
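
A classic example of such a recursive algorithm is the linear congruential generator; the sketch below uses one commonly cited set of constants (the specific values are an illustrative choice, not a recommendation for serious work):

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: x_{n+1} = (a * x_n + c) mod m, scaled to [0, 1)."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield state / m

gen = lcg(seed=12345)
print([round(next(gen), 4) for _ in range(5)])
# The same seed always reproduces the same sequence - deterministic, yet "random-looking".
```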

Psychological Testing

Psychological Testing: See psychometrics.

Psychometrics

Psychometrics: Psychometrics or psychological testing is concerned with quantification (measurement) of human characteristics, behavior, performance, health, etc., as well as with design and analysis of studies based on such measurements. An example of the problems being solved in psychometrics is the measurement of intelligence via…

Quadratic Mean

Quadratic Mean: The quadratic mean is a special case of the power mean statistics, corresponding to the value 2 of the parameter p. The quadratic mean is a synonym of root mean square. The quadratic mean is used as a measure of "effective…

Quartile

Quartile: The 1st, 2nd, and 3rd quartiles are the 25th, 50th, and 75th percentiles respectively.

Quasi-experiment

Quasi-experiment: In social science research, particularly in the qualitative literature on program evaluation, the term "quasi-experiment" refers to studies that do not involve the application of treatments via random assignment of subjects. They are also called observational studies. A quasi-experiment (or observational study) does involve…

Queuing Process

Queuing Process: Queuing process is a class of random processes describing phenomena of queue formation. The term "queue" here is an abstract entity, which reflects the most common features of various types of real-life queues: traffic jams, queues at football matches, queue of e-mail…

R-squared

R-squared: See Coefficient of determination.

Random Effects

Random Effects: The term "random effects" (as contrasted with "fixed effects") is related to how particular coefficients in a model are treated - as fixed or random values. Which approach to choose depends on both the nature of the data and the objective of the…

Random Error

Random Error: The random error is the fluctuating part of the overall error that varies from measurement to measurement. Normally, the random error is defined as the deviation of the total error from its mean value. An example of random error is putting the same…

Random Field

Random Field: A random field describes an experiment with outcomes being functions of more than one continuous variable, for example U(x,y,z), where x, y, and z are coordinates in space. A random field is an extension of the concept of a random process to the case…

Random Numbers

Random Numbers: Random numbers are the numbers produced by a truly random mechanism (in contrast to pseudo-random numbers). For example, random numbers with a good degree of randomness may be produced by tossing a coin, recording "0" or "1" (instead of "head" or "tail"),…

Random Process

Random Process: A random process describes an experiment with outcomes being functions of a single continuous variable (e.g. time). See also Random Series, Random Field.

Random Sampling

Random Sampling: Random sampling is a method of selecting a sample from a population in which all the items in the population have an equal chance of being chosen in the sample.

Random Series

Random Series: A random series describes an experiment with outcomes being functions of an integer argument: U1, U2, ... (or, simply, sequences of random values - 1st value, 2nd value, etc). Random series are often referred to as "time series" and, sometimes, as random processes…

Random Variable

Random Variable: A random variable is a variable that takes different real values as a result of the outcomes of a random event or experiment. To put it differently, it is a real valued function defined over the elements of a sample space. There can…

Random Walk

Random Walk: A random walk is a process of random steps, motions, or transitions. It might be in one dimension (movement along a line), in two dimensions (movements in a plane), or in three dimensions or more. There are many different types of…
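
A one-dimensional example in Python (the step distribution and walk length are arbitrary): at each step the walker moves +1 or -1 with equal probability.

```python
import numpy as np

rng = np.random.default_rng(42)
steps = rng.choice([-1, 1], size=1000)   # 1000 random +/-1 steps
walk = np.cumsum(steps)                  # position after each step
print(walk[:10], walk[-1])
```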

Randomization Test

Randomization Test: See permutation tests.

Range

Range: Range is a measure of dispersion. It is defined as the difference between the highest and the lowest values.

Rank Correlation Coefficient

Rank Correlation Coefficient: Rank correlation is a method of finding the degree of association between two variables. The calculation of the rank correlation coefficient is the same as that for the Pearson correlation coefficient, but it is carried out using the ranks of the observations and not their…
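
A short illustration: replacing the observations by their ranks and applying the Pearson formula reproduces Spearman's rank correlation (the data are made up; SciPy's spearmanr gives the same value).

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

x = np.array([10, 20, 30, 40, 50])
y = np.array([1.2, 2.5, 2.3, 4.8, 5.0])

rx, ry = rankdata(x), rankdata(y)      # ranks of the observations
manual = np.corrcoef(rx, ry)[0, 1]     # Pearson correlation of the ranks
rho, _ = spearmanr(x, y)
print(manual, rho)                     # identical values
```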

Ratio Scale

Ratio Scale: A ratio scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, and where "0" on the scale represents the absence of the thing being measured. Thus a…

Reciprocal Averaging

Reciprocal Averaging: Reciprocal averaging is a widely used algorithm for correspondence analysis. The correspondence analysis itself is sometimes also called reciprocal averaging. The initial data set is a two-way contingency table, representing the frequency of particular combinations of values of two categorical variables…

Rectangular Filter

Rectangular Filter: The rectangular filter is the simplest linear filter; it is usually used as a smoother. The output of the rectangular filter at the time moment t is the arithmetic mean of the input values corresponding to the moments of time close to…
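
A minimal moving-average sketch in NumPy (the window width and signal are illustrative):

```python
import numpy as np

def rectangular_filter(x, width):
    """Smooth x by averaging over a sliding window of the given width."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="valid")

signal = np.array([1.0, 2.0, 9.0, 2.0, 1.0, 2.0, 8.0, 2.0])
print(rectangular_filter(signal, width=3))
```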

Recursive Filter

Recursive Filter: In recursive filters, the output at moment t is a function of the output values at the previous moments and, possibly, of the input values. A major advantage of recursive filters over nonrecursive filters is that they are computationally simpler. For example,…

Regression

Regression: See regression analysis.

Regression Analysis

Regression Analysis: Regression analysis provides a "best-fit" mathematical equation for the relationship between the dependent variable (response) and independent variable(s) (covariates). There are two major classes of regression - parametric and non-parametric. Parametric regression requires choice of the regression equation with one or a greater…

Regression Trees

Regression Trees: Regression trees are one of the CART techniques. The main distinction from classification trees (another CART technique) is that the dependent variable is continuous.

Regularization

Regularization: Regularization refers to a wide variety of techniques used to bring structure to statistical models in the face of data size, complexity and sparseness. Advances in digital processing, storage and retrieval have led to huge and growing data sets ("Big Data"). Regularization is…

Rejection Region

Rejection Region: See Acceptance region.

Relative Efficiency (of tests)

Relative Efficiency (of tests): The relative efficiency of two tests is a measure of the relative power of the two tests. Suppose tests 1 and 2 are tests for the same null hypothesis and at the same significance level "alpha" (probability of type I error).…

Relative Frequency Distribution

Relative Frequency Distribution: A relative frequency distribution is a tabular summary of a set of data showing the relative frequency of items in each of several non-overlapping classes. The relative frequency is the fraction or proportion of the total number of items belonging…
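
A tiny Python illustration (the data are invented): counting each class and dividing by the total gives the relative frequencies.

```python
from collections import Counter

data = ["A", "B", "A", "C", "B", "A", "A", "C", "B", "A"]
counts = Counter(data)
n = len(data)

for value, count in sorted(counts.items()):
    print(f"{value}: count = {count}, relative frequency = {count / n:.2f}")
# The relative frequencies sum to 1 across the classes.
```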