Nonparametric ANOVA Statistic
Nonparametric ANOVA Statistic: See Mean Score Statistic.
Nonparametric Tests: In statistical inference procedures (hypothesis tests and confidence intervals), nonparametric procedures are those that are relatively free of assumptions about population parameters. For an example of a nonparametric test, see sign test. See also parametric tests.
Nonrecursive Filter: In nonrecursive filters, the output at time moment t is a function only of the input values corresponding to time moments t, t-1, t-2, ...; it does not depend on previously computed output values. A complementary concept is recursive filter.
Nonstationary time series: A time series x_t is said to be nonstationary if its statistical properties depend on time. The opposite concept is stationary time series. Most real-world time series are nonstationary. An example of a nonstationary time series is a record of…
Normal Distribution: The normal distribution is a probability density which is bell-shaped, symmetrical, and single peaked. The mean, median and mode coincide and lie at the center of the distribution. The two tails extend indefinitely and never touch the x-axis (asymptotic to the x-axis). A…
Normality: Normality is a property of a random variable that is distributed according to the normal distribution . Normality plays a central role in both theoretical and practical statistics: a great number of theoretical statistical methods rest on the assumption that the data, or test…
Normality Tests: Normality tests are tests of whether a set of data is distributed in a way that is consistent with a normal distribution. Typically, they are tests of a null hypothesis that the data are drawn from a normal population, specifically a goodness-of-fit test.…
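As an illustration (not part of the original entry), a minimal Python sketch of a normality test using SciPy's Shapiro-Wilk implementation; the sample data are simulated:

```python
# Minimal sketch: Shapiro-Wilk test of the null hypothesis that the
# data were drawn from a normal population (simulated sample data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)   # simulated "observed" data

stat, p_value = stats.shapiro(x)
print(f"W = {stat:.3f}, p-value = {p_value:.3f}")
# A small p-value (e.g. < 0.05) would be evidence against normality.
```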
NoSQL Database: A NoSQL database is distinguished mainly by what it is not - it is not a structured relational database format that links multiple separate tables. NoSQL stands for "not only SQL," meaning that SQL, or structured query language, is not needed to extract and organize…
Null Hypothesis: In hypothesis testing, the null hypothesis is the one you are hoping can be disproven by the observed data. Typically, it asserts that chance variation is responsible for an effect seen in observed data (for example, a difference between treatment and placebo, an…
Odds Ratio: The odds ratio compares two probabilities (or proportions) P1 and P2 in the following way: q = [P1/(1-P1)] / [P2/(1-P2)]. If P1 and P2 are equal, the odds ratio is equal to 1.
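A short worked example of the formula above (the probabilities are made up for illustration):

```python
# Worked example of the odds-ratio formula q = [P1/(1-P1)] / [P2/(1-P2)].
p1, p2 = 0.4, 0.25

odds1 = p1 / (1 - p1)          # odds corresponding to P1
odds2 = p2 / (1 - p2)          # odds corresponding to P2
odds_ratio = odds1 / odds2

print(round(odds1, 3), round(odds2, 3), round(odds_ratio, 3))
# 0.667 0.333 2.0 -- equal probabilities would give an odds ratio of 1.
```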
Omega-square: Omega-square is a synonym for the coefficient of determination.
One-sided Test: One-sided test is a synonym for one-tailed test. See 2-Tailed vs. 1-Tailed Tests.
Order Statistics: The order statistics of a random sample X1, X2, . . ., Xn are the sample values placed in ascending order. They are denoted by X(1), X(2), . . ., X(n). Here, X(1) ≤ X(2) ≤ . . . ≤ X(n). For example, for the…
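An illustrative sketch (with a made-up sample) of how order statistics are obtained:

```python
# Order statistics of a small sample: sort the sample values in
# ascending order; X(1) is the minimum and X(n) is the maximum.
sample = [7.1, 2.4, 9.8, 4.4, 2.4]        # made-up sample X1..X5

order_stats = sorted(sample)              # X(1) <= X(2) <= ... <= X(n)
print(order_stats)                        # [2.4, 2.4, 4.4, 7.1, 9.8]
print("min =", order_stats[0], "max =", order_stats[-1])
```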
Ordered Categorical Variables: Categorical variables are non-numeric "category" variables, e.g. color. Ordered categorical variables are category variables that have a quantitative dimension that can be ordered but is not on a regular scale. Doctors rate pain on a scale of 1 to 10 -…
Ordinal Scale: An ordinal scale is a measurement scale that assigns values to objects based on their ranking with respect to one another. For example, a doctor might use a scale of 0-10 to indicate degree of improvement in some condition, from 0 (no improvement)…
Ordinary Least Squares Regression: Ordinary least squares regression is a special (and the most common) kind of ordinary linear regression . It is based on the least squares method of finding regression parameters. Technically, the aim of ordinary least squares regression is to find out…
Ordinary Linear Regression: See: simple linear regression
Orthogonal Least Squares: In ordinary least squares, we try to minimize the sum of the vertical squared distances between the observed points and the fitted line. In orthogonal least squares, we try to fit a line which minimizes the sum of the squared distances between…
Outlier: Sometimes a set of data will have one or more items with unusually large or unusually small values. Such extreme values are called outliers. Outliers often arise from some mistakes in data-gathering or data-recording procedures. It is good practice to inspect a data set…
p-value: The p-value is the probability that the null model could, by random chance variation, produce a sample as extreme as the observed sample (as measured by some sample statistic of interest).
Paired Replicates Data: Paired replicates is the simplest form of repeated measures data, when only two measurements are made for each experimental unit. Consider, for example, a study of 2 drugs - A and B - to determine whether they reduce arterial…
Panel Data: A panel data set contains observations on a number of units (e.g. subjects, objects) belonging to different clusters (panels) over time. A simple example of panel data is the values of the gross annual income for each of 1000 households in New York…
Panel study: A panel study is a longitudinal study that selects a group of subjects then records data for each member of the group at various points in time. See also panel data.
Parallel Design: In randomized trials, a parallel design is one in which subjects are randomly assigned to treatments, which then proceed in parallel with each group. Conducted properly, they provide assurance that any difference between treatments is in fact due to treatment effects (or random…
Parameter: A Parameter is a numerical value that describes one of the characteristics of a probability distribution or population. For example, a binomial distribution is completely specified if the number of trials and probability of success are known. Here, the number of trials and the…
Parametric Tests: In statistical inference procedures (hypothesis tests and confidence intervals), parametric procedures are those that incorporate assumptions about population parameters. See also nonparametric tests.
Partial correlation analysis: Partial correlation analysis is aimed at finding correlation between two variables after removing the effects of other variables. This type of analysis helps spot spurious correlations (i.e. correlations explained by the effect of other variables) as well as reveal hidden correlations…
Path Analysis: Path analysis is a method for causal modeling . Consider the simple case of two independent variables x1 and x2 and one dependent variable. Path analysis splits the contribution of x1 and x2 to the variance of the dependent variable y into four…
Path coefficients: In path analysis and structural equation modeling a path coefficient is the partial correlation coefficient between the dependent variable and an independent variable, adjusted for other independent variables.
Pearson correlation coefficient: See correlation coefficient.
Percentile: In a population or a sample, the Pth percentile is a value such that at least P percent of the values take on this value or less and at least (100-P) percent of the values take on this value or more. See also: quartile,…
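An illustrative sketch (data made up; note that several interpolation conventions exist for percentiles of small samples):

```python
# Percentiles (and quartiles) of a made-up sample using NumPy.
import numpy as np

x = np.array([12, 5, 22, 9, 7, 31, 14, 18, 3, 25])

p25, p50, p75 = np.percentile(x, [25, 50, 75])   # 1st, 2nd, 3rd quartiles
print(p25, p50, p75)
# NumPy's default "linear" interpolation is only one of several
# conventions for percentiles of small samples.
```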
Perceptual Mapping: Perceptual mapping is a synonym for Correspondence analysis.
Permutation Tests: A permutation test involves the shuffling of observed data to determine how unusual an observed outcome is. A typical problem involves testing the hypothesis that two or more samples might belong to the same population. The permutation test proceeds as follows: 1. Combine…
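A minimal sketch of a two-sample permutation test for a difference in means, along the lines described above (the two samples are made up):

```python
# Sketch of a two-sample permutation test: shuffle the combined data,
# redraw two groups of the original sizes, and see how often the
# reshuffled difference in means is as extreme as the observed one.
import numpy as np

rng = np.random.default_rng(1)
a = np.array([88, 92, 94, 85, 91], dtype=float)   # made-up sample A
b = np.array([84, 86, 83, 89, 80], dtype=float)   # made-up sample B

observed = a.mean() - b.mean()
combined = np.concatenate([a, b])

n_perm, count = 10_000, 0
for _ in range(n_perm):
    rng.shuffle(combined)
    diff = combined[:len(a)].mean() - combined[len(a):].mean()
    if abs(diff) >= abs(observed):
        count += 1

print("observed difference:", observed)
print("two-sided permutation p-value:", count / n_perm)
```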
Pie Icon Plots: Pie icon plots are a sub-class of icon plots. Each unit or observation is represented by a circle with colored "pie slices" corresponding to variables - the angular size of a slice of pie is proportional to the value…
Pivotal Statistic: A statistic is said to be pivotal if its sampling distribution does not depend on unknown parameters. Pivotal statistics are well suited for statistical tests - because this property allows you to control type I error, irrespective of any…
Poisson Distribution: The Poisson distribution is a discrete distribution, completely characterized by one parameter λ: P(X = k) = λ^k e^(-λ) / k!, k = 0, 1, 2, ... (where k! = 1 x 2 x ... x k). Both the mean and the variance of the Poisson distribution are equal to λ. The Poisson distribution describes the number of random events…
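A short numerical check of the formula above (λ = 3 is an arbitrary choice):

```python
# Poisson pmf P(X = k) = lam**k * exp(-lam) / k!, plus a numerical check
# that the mean and variance both come out equal to lam (lam = 3 here).
import math

lam = 3.0
def pmf(k):
    return lam ** k * math.exp(-lam) / math.factorial(k)

print([round(pmf(k), 4) for k in range(6)])    # P(X = 0), ..., P(X = 5)

ks = range(60)                                  # tail beyond 60 is negligible for lam = 3
mean = sum(k * pmf(k) for k in ks)
var = sum((k - mean) ** 2 * pmf(k) for k in ks)
print(round(mean, 6), round(var, 6))            # both approximately 3.0
```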
Poisson Process: A Poisson process is a random function U(t) which describes the number of random events in an interval [0,t] of time or space. The random events have the properties that (i) the probability of an event during a very small interval…
Polygon Icon Plots: Polygon icon plots are a subclass of circular icon plots in which the rays tend to form a polygon.
Polynomial: A polynomial of order n is a function described by the following expression: P(x) = a0 + a1 x + a2 x^2 + ... + an x^n, where a0, a1, ..., an are coefficients of the polynomial. A polynomial of order 1 is the simplest type of polynomial. Its chart, P(x) against x, is a line. Polynomials are used in regression analysis, multiple…
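An illustrative sketch of evaluating and fitting a polynomial (coefficients and data points are made up):

```python
# Evaluating a polynomial P(x) = a0 + a1*x + ... + an*x**n, and fitting a
# polynomial of degree 2 to data by least squares.
import numpy as np

coeffs = [2.0, -1.0, 0.5]                 # a0, a1, a2 -> P(x) = 2 - x + 0.5 x^2
x = np.linspace(-2, 2, 9)
y = sum(a * x**i for i, a in enumerate(coeffs))

# np.polyfit returns coefficients in *descending* powers of x.
fitted = np.polyfit(x, y, deg=2)
print(fitted)                              # approximately [0.5, -1.0, 2.0]
```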
Population: A population is a large set of objects of a similar nature - e.g. human beings, households, readings from a measurement device - which is of interest as a whole. A related concept is a sample, a subset of objects drawn from…
Post-hoc tests: Post-hoc tests (or post-hoc comparison tests) are used at the second stage of the analysis of variance (ANOVA) or multiple analysis of variance (MANOVA) if the null hypothesis is rejected. The question of interest at this stage is which groups significantly differ from…
Posterior Probability: Posterior probability is a revised probability that takes into account new available information. For example, let there be two urns, urn A having 5 black balls and 10 red balls and urn B having 10 black balls and 5 red balls. Now if…
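Assuming the (truncated) urn example continues in the usual way - an urn is chosen at random and the drawn ball turns out to be black - a sketch of the Bayes update:

```python
# Bayes' rule sketch for the urn example above, under the assumed
# continuation: an urn is picked at random (prior 0.5 each) and a black
# ball is drawn; what is the posterior probability it was urn A?
prior_A, prior_B = 0.5, 0.5
p_black_given_A = 5 / 15          # urn A: 5 black out of 15 balls
p_black_given_B = 10 / 15         # urn B: 10 black out of 15 balls

p_black = prior_A * p_black_given_A + prior_B * p_black_given_B
posterior_A = prior_A * p_black_given_A / p_black

print(round(posterior_A, 3))       # 0.333 -- revised downward from 0.5
```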
Power Mean: A power mean of order p of a set of values x1, x2, ..., xn is defined by the following expression: M_p = ( (x1^p + x2^p + ... + xn^p) / n )^(1/p). The family of power mean statistics is often called the generalized mean - because, for different values of the parameter p, it is equivalent to…
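A short sketch of the formula above for a few values of p (the data are made up; p = 1 gives the arithmetic mean, p = 2 the quadratic mean, p = -1 the harmonic mean):

```python
# Power mean of order p: M_p = ((x1**p + ... + xn**p) / n) ** (1/p).
def power_mean(values, p):
    n = len(values)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)

data = [1.0, 2.0, 4.0, 8.0]                      # made-up values
for p in (-1, 1, 2, 3):                          # p = 0 needs a limit (geometric mean)
    print(p, round(power_mean(data, p), 4))
```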
Power of a Hypothesis Test: The power of a hypothesis test is a measure of how effective the test is at identifying (say) a difference in populations if such a difference exists. It is the probability of rejecting the null hypothesis when it is false.
Power Spectrum: The power spectrum S(f) of a stationary random process or a stationary time series is the average of the square of the amplitude of the Fourier spectrum: S(f) = <|A(f)|^2>, where A(f) is the amplitude spectrum of the realization of the random process; …
Precision: Precision is the degree of accuracy with which a parameter is estimated by an estimator. Precision is usually measured by the standard deviation of the estimator and is known as the standard error. For example, the sample mean is used to estimate the population…
Predicting Filter: Predicting filters are filters that estimate the next value in a time series from the known previous values. In contrast to smoothing filters, in predicting filters the output at time moment t depends only on the values at preceding moments t-1, t-2, ... Predicting filters…
Prediction vs. Explanation: With the advent of Big Data and data mining, statistical methods like regression and CART have been repurposed to use as tools in predictive modeling. When statistical models are used as a tool of research, the goal is to explain relationships in…
Predictive modeling is the process of using a statistical or machine learning model to predict the value of a target variable (e.g. default or no-default) on the basis of a series of predictor variables (e.g. income, house value, outstanding debt, etc.). Many of the techniques…
Predictive Validity: The predictive validity of survey instruments and psychometric tests is a measure of agreement between results obtained by the evaluated instrument and results obtained from more direct and objective measurements. The predictive validity is often quantified by the correlation coefficient between the two…
Predictor Variable: Predictor variable is a synonym for independent variable. See also: dependent and independent variables.
Principal Components Analysis: The purpose of principal components analysis is to derive a small number of linear combinations (principal components) of a set of variables that retain as much of the information in the original variables as possible. This technique is often used when there…
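An illustrative sketch with scikit-learn (the data are simulated, and the choice of two components is arbitrary):

```python
# Minimal PCA sketch: reduce a made-up 4-variable data set to 2 principal
# components and see how much variance they retain.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] = X[:, 0] * 2 + rng.normal(scale=0.1, size=100)   # correlated columns

pca = PCA(n_components=2)
scores = pca.fit_transform(X)                # the two principal components
print(pca.explained_variance_ratio_)         # share of variance retained
```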
Principal Components Analysis of Qualitative Data: Principal components analysis of qualitative data is a synonym for Correspondence analysis.
Prior and posterior: Bayesian statistics typically incorporates new information (e.g. from a diagnostic test, or a recently drawn sample) to answer a question of the form "What is the probability that..." The answer to this question is referred to as the "posterior" probability, arrived at…
Prior and posterior probability (difference): Consider a population where the proportion of HIV-infected individuals is 0.01. Then, the prior probability that a randomly chosen subject is HIV-infected is Pprior = 0.01. Suppose now a subject has tested positive for HIV. It is known that…
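A sketch of the prior-to-posterior update for the HIV example above. The test's sensitivity and specificity are not given in the (truncated) entry; the values below are assumptions purely for illustration.

```python
# Bayes update from prior to posterior probability of infection.
prior = 0.01                    # P(infected), from the entry
sensitivity = 0.99              # assumed P(test positive | infected)
specificity = 0.95              # assumed P(test negative | not infected)

p_positive = prior * sensitivity + (1 - prior) * (1 - specificity)
posterior = prior * sensitivity / p_positive

print(round(posterior, 3))      # ~0.167 with these assumed test properties
```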
Probit: Probit is a nonlinear function of probability p: probit(p) = F^(-1)(p), where F^(-1)() is the function inverse to the cumulative distribution function F() of the standard normal distribution. In contrast to the probability p itself (which takes on values from 0 to…
Probit Models: Probit models postulate some relation between the probit of the observed probability, and unknown parameters of the model. The most common example is the model probit(p) = a + b x, which is equivalent to: p = F(a + b…
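An illustrative sketch of the probit function and a probit model using SciPy's standard normal CDF and its inverse (the coefficients a and b are made up):

```python
# probit(p) = F^(-1)(p), the inverse of the standard normal CDF; a probit
# model probit(p) = a + b*x maps back to p via p = F(a + b*x).
from scipy.stats import norm

print(norm.ppf(0.5), norm.ppf(0.975))             # 0.0 and about 1.96

a, b = -1.0, 0.8                                   # made-up model coefficients
for x in (0.0, 1.0, 2.0):
    p = norm.cdf(a + b * x)                        # probability implied by the model
    print(x, round(p, 3), round(norm.ppf(p), 3))   # probit(p) recovers a + b*x
```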
Proportional Hazard Model: Proportional hazard model is a generic term for models (particularly survival models in medicine) that have the form L(t | x1, x2, ..., xn) = h(t) exp(b1 x1 + ... + bn xn), where L is the hazard function or hazard rate, {xi} are covariates, {bi} are coefficients of the model - effects of the corresponding covariates - and h(t) is the baseline hazard.
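A sketch that simply evaluates the model form above; the baseline hazard h(t) and the coefficients are made up for illustration:

```python
# Direct evaluation of L(t | x) = h(t) * exp(b1*x1 + ... + bn*xn).
import numpy as np

def baseline_hazard(t):
    return 0.02 + 0.001 * t           # assumed baseline hazard h(t)

b = np.array([0.5, -0.3])             # assumed coefficients b1, b2
x_subject = np.array([1.0, 2.0])      # covariates of one subject

t = 10.0
hazard = baseline_hazard(t) * np.exp(b @ x_subject)
print(round(hazard, 5))
# The ratio of hazards for two subjects does not depend on t, hence
# "proportional" hazards.
```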
Prospective vs. Retrospective: A prospective study is one that identifies a scientific (usually medical) problem to be studied, specifies a study design protocol (e.g. what you're measuring, who you're measuring, how many subjects, etc.), and then gathers data in the future in accordance with the…
Pruning the tree: Classification and regression trees, applied to data with known values for an outcome variable, derive models with rules like "If taxable income <$80,000, if no Schedule C income, if standard deduction taken, then no-audit." Pruning is the process of truncating the…
Pseudo-Random Numbers: Pseudo-random numbers are produced by recursive algorithms - i.e. the current number is calculated from one or more previous numbers. Thus, strictly speaking, pseudo-random numbers are deterministic, not random. On the other hand, in many respects pseudo-random…
Psychological Testing: See psychometrics.
Psychometrics: Psychometrics or psychological testing is concerned with quantification (measurement) of human characteristics, behavior, performance, health, etc., as well as with design and analysis of studies based on such measurements. An example of the problems being solved in psychometrics is the measurement of intelligence via…
Quadratic Mean: The quadratic mean is a special case of the power mean statistics, corresponding to the value 2 of the parameter p. The quadratic mean is a synonym of root mean square. The quadratic mean is used as a measure of "effective…
Quartile: The 1st, 2nd, and 3rd quartiles are the 25th, 50th, and 75th percentiles respectively.
Quasi-experiment: In social science research, particularly in the qualitative literature on program evaluation, the term "quasi-experiment" refers to studies that do not involve the application of treatments via random assignment of subjects. They are also called observational studies. A quasi-experiment (or observational study) does involve…
Queuing Process: Queuing process is a class of random processes describing phenomena of queue formation. The term "queue" here is an abstract entity, which reflects the most common features of various types of real-life queues: traffic jams, queues to football matches, queue of e-mail…
Random Effects: The term "random effects" (as contrasted with "fixed effects") is related to how particular coefficients in a model are treated - as fixed or random values. Which approach to choose depends on both the nature of the data and the objective of the…
Random Error: The random error is the fluctuating part of the overall error that varies from measurement to measurement. Normally, the random error is defined as the deviation of the total error from its mean value. An example of random error is putting the same…
Random Field: A random field describes an experiment with outcomes being functions of more than one continuous variable, for example U(x,y,z), where x, y, and z are coordinates in space. A random field is an extension of the concept of random process into the case…
Random Numbers: Random numbers are the numbers produced by a truly random mechanism (in contrast to pseudo-random numbers ). For example, random numbers with a good degree of randomness may be produced by tossing a coin, recording "0" or "1" (instead of "head" or "tail"),…
Random Process: A random process describes an experiment with outcomes being functions of a single continuous variable (e.g. time). See also Random Series, Random Field.
Random Sampling: Random sampling is a method of selecting a sample from a population in which all the items in the population have an equal chance of being chosen in the sample.
Random Series: A random series describes an experiment with outcomes being functions of an integer argument: U1, U2, ... (or, simply, sequences of random values - 1st value, 2nd value, etc). Random series are often referred to as "time series" and, sometimes, as random processes…
Random Variable: A random variable is a variable that takes different real values as a result of the outcomes of a random event or experiment. To put it differently, it is a real valued function defined over the elements of a sample space. There can…
Random Walk: A random walk is a process of random steps, motions, or transitions. It might be in one dimension (movement along a line), in two dimensions (movements in a plane), or in three dimensions or more. There are many different types of…
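An illustrative simulation of the simplest one-dimensional case (equal-probability steps of +1 or -1):

```python
# Simulating a one-dimensional random walk starting from 0.
import numpy as np

rng = np.random.default_rng(42)
steps = rng.choice([-1, 1], size=1000)    # 1000 random +/-1 steps
walk = np.cumsum(steps)                   # position after each step

print(walk[:10])                          # first few positions
print("final position:", walk[-1])
```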
Range: Range is a measure of dispersion. It is defined as the difference between the highest and the lowest values.
Rank Correlation Coefficient: Rank correlation is a method of finding the degree of association between two variables. The calculation for the rank correlation coefficient is the same as that for the Pearson correlation coefficient, but it is calculated using the ranks of the observations and not their…
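A sketch of the "Pearson formula applied to ranks" recipe above, checked against SciPy's Spearman implementation (the data are made up):

```python
# Rank correlation: rank the observations, then apply the Pearson formula
# to the ranks. The result matches scipy.stats.spearmanr.
import numpy as np
from scipy.stats import rankdata, spearmanr

x = np.array([10, 20, 30, 40, 1000])      # an outlier barely affects ranks
y = np.array([1, 3, 2, 5, 4])

r_ranks = np.corrcoef(rankdata(x), rankdata(y))[0, 1]
print(round(r_ranks, 3), round(spearmanr(x, y).correlation, 3))   # 0.8 0.8
```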
Ratio Scale: A ratio scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, and where "0" on the scale represents the absence of the thing being measured. Thus a…
Reciprocal Averaging: Reciprocal averaging is a widely used algorithm for correspondence analysis . The correspondence analysis itself is sometimes also called reciprocal averaging. The initial data set is a two-way contingency table , representing the frequency of particular combinations of values of two categorical variables…
Rectangular Filter: The rectangular filter is the simplest linear filter; it is usually used as a smoother. The output of the rectangular filter at time moment t is the arithmetic mean of the input values corresponding to the moments of time close to t…
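An illustrative moving-average sketch of this kind of smoother (series and window length are made up):

```python
# A rectangular (moving-average) smoother: each output value is the
# arithmetic mean of the input values in a window around that time point.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 9.0, 3.0, 5.0, 8.0])   # made-up noisy series
window = 3

smoothed = np.convolve(x, np.ones(window) / window, mode="valid")
print(smoothed)    # each value is the mean of 3 consecutive inputs
```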
Recursive Filter: In recursive filters, the output at time moment t is a function of the output values at the previous moments and, possibly, of the input values. A major advantage of recursive filters over nonrecursive filters is that they are computationally simpler. For example,…
Regression Analysis: Regression analysis provides a "best-fit" mathematical equation for the relationship between the dependent variable (response) and independent variable(s) (covariates). There are two major classes of regression - parametric and non-parametric. Parametric regression requires choice of the regression equation with one or a greater…
Regression Trees: Regression trees are one of the CART techniques. The main distinction from classification trees (another CART technique) is that the dependent variable is continuous.
Regularization: Regularization refers to a wide variety of techniques used to bring structure to statistical models in the face of data size, complexity and sparseness. Advances in digital processing, storage and retrieval have led to huge and growing data sets ("Big Data"). Regularization is…
Relative Efficiency (of tests): The relative efficiency of two tests is a measure of the relative power of two tests. Suppose tests 1 and 2 are tests for the same null hypothesis and at the same significance level "alpha" (probability of type I error).
Relative Frequency Distribution: A relative frequency distribution is a tabular summary of a set of data showing the relative frequency of items in each of several non-overlapping classes. The relative frequency is the fraction or proportion of the total number of items belonging…