Relative Frequency Distribution

Relative Frequency Distribution: A relative frequency distribution is a tabular summary of a set of data showing the relative frequency of items in each of several non-overlapping classes. The relative frequency is the fraction or proportion of the total number of items belonging…
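As a minimal sketch (Python, hypothetical data), a relative frequency table for categorical classes can be built by counting the items in each class and dividing by the total number of items. Numeric data would first have to be binned into non-overlapping intervals, which this example does not do.

```python
from collections import Counter

def relative_frequency_table(values):
    """Map each class label to the fraction of all items that fall in it."""
    if not values:
        raise ValueError("need at least one item")
    counts = Counter(values)   # absolute frequency per class
    total = len(values)
    return {cls: n / total for cls, n in counts.items()}

grades = ["A", "B", "B", "C", "A", "B", "D", "C"]  # hypothetical data
table = relative_frequency_table(grades)
for cls in sorted(table):
    print(cls, table[cls])
```

The relative frequencies of all classes sum to 1 (up to floating-point rounding).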

Reliability

Reliability: Reliability characterises the capability of a device, unit, or procedure to perform without fault. Reliability is quantified in terms of probability. This probability is related either to an elementary act or to an interval of time or another continuous variable. Because the probability of failure…

Reliability (in Survey Analysis)

Reliability (in Survey Analysis): In survey analysis, e.g. in psychometrics, reliability is a measure of the reproducibility of the survey instrument or test. In other words, reliability is a measure of precision, i.e. it describes the random error of the survey instrument. There are…

Repeatability

Repeatability: Repeatability is the variation of outcomes of an experiment carried out under the same conditions, e.g. by the same operator in the same laboratory. For example, repeatability of measurements of precise mechanical scales is the variation of weight values reported for a given constant…

Repeated Measures Data

Repeated Measures Data: Repeated measures (or repeated measurements) data are usually obtained from multiple measurements of a response variable. Such multiple measurements are carried out for each experimental unit over time (as in a longitudinal study) or under multiple conditions. An essential statistical peculiarity…

Replicate

Replicate: A replicate is the outcome of an experiment or observation obtained in the course of its replication. In applied statistics, a set of replicates obtained in a series of replications of the experiment or observations is regarded as a sample from a much bigger…

Replication

Replication: In statistics, replication is the repetition of an experiment or observation under the same or similar conditions. Replication is important because it adds information about the reliability of the conclusions or estimates to be drawn from the data. The statistical methods that assess that reliability…

Reproducibility

Reproducibility: Reproducibility is the variation of outcomes of an experiment carried out under conditions varying within a typical range, e.g. when measurement is carried out with the same device by different operators, in different laboratories, etc. For example, reproducibility of measurements of mechanical scales is…

Resampling

Resampling: See bootstrapping and permutation tests.

Residuals

Residuals: Residuals are the differences between the observed values and the values predicted by some model. Analysis of residuals allows you to assess the adequacy of a model for particular data; it is widely used in regression analysis.

Resistance

Resistance: Resistance, used with respect to sample estimators, refers to the sensitivity of the estimator to extreme observations. Estimators that do not change much with the addition or deletion of extreme observations are said to be resistant. The median is a resistant estimator…
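A small illustration, using made-up numbers and Python's standard `statistics` module: adding a single extreme observation barely moves the median but drags the mean far away from the bulk of the data.

```python
import statistics

data = [12, 14, 15, 15, 16, 18, 19]   # hypothetical observations
with_outlier = data + [500]           # the same data plus one extreme value

mean_before, median_before = statistics.mean(data), statistics.median(data)
mean_after, median_after = statistics.mean(with_outlier), statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)      # jumps by more than 60
print("median:", median_before, "->", median_after)  # 15 -> 15.5
```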

Response

Response: In the design of experiments, the response is the dependent variable. Its values are measured for all subjects, and the question of primary interest is how the factors affect the response. See Variables (in design of experiments) for an explanatory example.

Response Variable

See dependent and independent variables.

RMS

RMS: See Root Mean Square.

RMSE

RMSE: RMSE is root mean squared error. In predicting a numerical outcome with a statistical model, predicted values rarely match actual outcomes exactly. The difference between predicted and actual is the error (or residual). To calculate RMSE, square each error, take the average,…
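A minimal sketch of the RMSE calculation in Python, assuming the procedure finishes by taking the square root of the averaged squared errors, as the name implies. The function name `rmse` and the data are illustrative only.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of predictions against actual outcomes."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]  # error (residual) per record
    mean_sq = sum(e * e for e in errors) / len(errors)    # average of squared errors
    return math.sqrt(mean_sq)                              # back to the original units

actual = [3.0, 5.0, 2.5, 7.0]      # hypothetical outcomes
predicted = [2.5, 5.0, 4.0, 8.0]   # hypothetical model predictions
print(rmse(actual, predicted))     # sqrt(3.5 / 4), about 0.935
```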

Robust Filter

Robust Filter: A robust filter is a filter that is not sensitive to input noise values of extremely large magnitude (e.g. those arising from anomalous measurement errors). The median filter is an example of a robust filter. Linear filters are not robust…
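A minimal median-filter sketch, assuming a centered window of `2 * half_width + 1` samples that is simply truncated at the ends of the series (edge handling differs between implementations). On the hypothetical spiky input below the spike disappears entirely, whereas a 3-point moving average (a linear filter) would turn it into three outputs of 34.

```python
import statistics

def median_filter(x, half_width=1):
    """Replace each sample with the median of a centered window of
    2*half_width + 1 samples; the window is truncated at the series ends."""
    n = len(x)
    out = []
    for i in range(n):
        window = x[max(0, i - half_width): min(n, i + half_width + 1)]
        out.append(statistics.median(window))
    return out

spiky = [1.0, 1.0, 1.0, 100.0, 1.0, 1.0, 1.0]  # one anomalous measurement
print(median_filter(spiky))  # the spike is removed: every value is 1.0
```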

Robustness

Robustness: Many statistical methods (particularly classical inference methods) rely upon assumptions about the distribution of the population the sample is drawn from. The robustness of a statistical method is its insensitivity to departures from these assumptions. The less sensitive a method is to departures from…

Root Mean Square

Root Mean Square: Root mean square (RMS) of a set of values $x_i$, $i = 1, \ldots, N$ is the square root of the mean of the squares of the values:

$$\mathrm{RMS}(x_1, \ldots, x_N) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2}$$

RMS is a statistical measure of departure from the null value.
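The RMS definition translates directly into code; a minimal Python sketch follows. Because values are squared before averaging, positive and negative departures from zero cannot cancel, unlike with the plain mean.

```python
import math

def rms(values):
    """Square root of the mean of the squared values."""
    if not values:
        raise ValueError("need at least one value")
    return math.sqrt(sum(v * v for v in values) / len(values))

print(rms([1.0, -1.0]))  # 1.0, although the plain mean of these values is 0
print(rms([3.0, -4.0]))  # sqrt((9 + 16) / 2) = sqrt(12.5)
```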

Sample

Sample: A sample is a portion of the elements of a population. A sample is chosen to make inferences about the population by examining or measuring the elements in the sample.

Sample Size Calculations

Sample Size Calculations: Sample size calculations typically arise in significance testing, in the following context: how big a sample size do I need to identify a significant difference of a certain size? The analyst must specify three things: 1) How big a difference is being…

Sample Space

Sample Space: The set of all possible outcomes of a particular experiment is called the sample space for the experiment. If a coin is tossed twice, the sample space is {HH, HT, TH, TT}, where TH, for example, means getting tails on the first toss…
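The two-toss sample space can be enumerated as a Cartesian product; a short Python sketch is below. The probability of the example event is computed as a count ratio, which assumes the four outcomes are equally likely (a fair coin).

```python
from itertools import product

# Each toss is H or T; two tosses give the Cartesian product {H, T} x {H, T}
sample_space = ["".join(outcome) for outcome in product("HT", repeat=2)]
print(sample_space)  # ['HH', 'HT', 'TH', 'TT']

# An event is a subset of the sample space, e.g. "at least one head"
at_least_one_head = [s for s in sample_space if "H" in s]
print(len(at_least_one_head) / len(sample_space))  # 0.75, assuming a fair coin
```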

Sample Survey

Sample Survey: In a sample survey, a sample of units drawn from the population of interest is analyzed. A related concept is the census survey. The main advantage of the sample survey (as compared to the census survey) is that its implementation…

Sampling

Sampling: Sampling is the process of drawing a sample from a population. Sampling may be performed from both real and hypothetical populations. Examples of sampling from a real population are opinion polls (when a finite number of individuals is chosen from a much bigger…

Sampling Distribution

Sampling Distribution: When a sample is drawn, some summary value (called a statistic) is usually computed. For example, the sample mean and the sample variance are two statistics. The value of the statistic changes with the sample we have. The probability distribution of the statistic…
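A sampling distribution can be approximated by simulation: draw many samples, compute the statistic for each, and look at how the values are distributed. The sketch below does this for the sample mean, using a hypothetical population of the integers 1 to 100 and simple random sampling without replacement; the sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

def sampling_distribution_of_mean(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples and record the sample mean of each."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

population = list(range(1, 101))   # hypothetical population: 1..100
means = sampling_distribution_of_mean(population, sample_size=10, n_samples=5000)
print(statistics.mean(means))   # close to the population mean, 50.5
print(statistics.stdev(means))  # spread of the statistic from sample to sample
```

The simulated means cluster around the population mean and vary much less than individual population values do.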

Sampling Frame

Sampling Frame: Sampling frame (synonyms: "sample frame", "survey frame") is the actual set of units from which a sample has been drawn: in the case of a simple random sample, all units from the sampling frame have an equal chance to be drawn and to…

Scale Invariance (of Measures)

Scale Invariance (of Measures): Scale invariance is a property of descriptive statistics. If a statistic is scale-invariant, it has the following property for any sample and any non-negative value : (1) or, in mathematically equivalent form In other words, if a statistic…

Scatter Graphs

Scatter Graphs: A scatter graph shows the joint distribution of observed values of two variables. Each pair of values is shown as a point on the X-Y plane with coordinates (Xi, Yi), where Xi and Yi are the values of the first and the second variable. Scatter…

Seasonal Adjustment

Seasonal Adjustment: Seasonal adjustment is used in time series analysis to remove a periodic component with a known period from the observed time series. This adjustment is normally performed through seasonal decomposition of the time series, followed by subtraction of the seasonal component…

Seasonal Decomposition

Seasonal Decomposition: The seasonal decomposition is a method used in time series analysis to represent a time series as a sum (or, sometimes, a product) of three components - the linear trend, the periodic (seasonal) component, and random residuals. The seasonal decomposition is useful in…

Seemingly Unrelated Regressions (SUR)

Seemingly Unrelated Regressions (SUR): Seemingly unrelated regressions (SUR) is a class of multivariate regression (multiple regression) models, normally belonging to the sub-class of linear regression models. A distinctive feature of SUR models is that they consist of several unrelated systems of…

Self-Controlled Design

Self-Controlled Design: In randomized trials, a self-controlled design is one in which results are measured in each subject before and after treatment. Both parallel designs and crossover designs can also include a self-controlled feature.

Sensitivity

Sensitivity: Sensitivity (of a medical diagnostic test for a disease) is the probability that the test is positive for a person with the disease. Sensitivity itself is not sufficient to characterize a test. For example, a test reporting all subjects who take the test as…
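In terms of counts from a study, sensitivity is the number of true positives divided by the number of people who have the disease (true positives plus false negatives). A minimal Python sketch with hypothetical counts follows. Note that a test calling everyone positive has no false negatives and therefore a sensitivity of 1, which is why sensitivity alone cannot characterize a test.

```python
def sensitivity(true_positives, false_negatives):
    """P(test positive | disease) = TP / (TP + FN)."""
    diseased = true_positives + false_negatives
    if diseased == 0:
        raise ValueError("no diseased subjects")
    return true_positives / diseased

# Hypothetical counts among 100 people who have the disease
print(sensitivity(true_positives=90, false_negatives=10))  # 0.9
```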

Sequential Analysis

Sequential Analysis: In sequential analysis, decisions about sample size and the type of data to be collected are made and modified as the study proceeds, incorporating information learned at earlier stages. One major application of sequential analysis is in clinical trials in medicine, where successful…

Sequential Icon Plots

Sequential Icon Plots: Sequential icon plots (or column icon plots) are a category of icon plots. For each unit, variables are represented as a sequence of bars with the height reflecting the value of the corresponding variable. Besides the order in the…

Serial Correlation

Serial Correlation: In the analysis of time series, the Nth order serial correlation is the correlation between the current value and the Nth previous value of the same time series. For this reason, serial correlation is often called "autocorrelation".
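A sketch of one common estimator of the Nth order serial correlation: the sum of products of deviations (from the overall mean) that are N steps apart, divided by the sum of squared deviations. This is an assumption about the estimator; it differs slightly from a plain Pearson correlation computed on the lagged pairs. The alternating series is hypothetical.

```python
def serial_correlation(x, lag):
    """Lag-`lag` autocorrelation estimate: sum of products of deviations
    `lag` steps apart, divided by the sum of squared deviations."""
    n = len(x)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(x) - 1")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    den = sum(d * d for d in dev)
    return num / den

alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(serial_correlation(alternating, 1))  # -5/6: neighbors have opposite signs
print(serial_correlation(alternating, 2))  # 4/6: every second value repeats
```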

Shift Invariance (of Measures)

Shift Invariance (of Measures): Shift invariance is a property of descriptive statistics. If a statistic is shift-invariant, it possesses the following property for any data set : or, in equivalent form In other words, if a statistic is shift-invariant, then addition of an arbitrary…

Sign Test

Sign Test: The sign test is a nonparametric test used with paired replicates to test for the difference between the 1st and the 2nd measurement in a group of "subjects". For each pair, you assign a "1" if the 1st measurement has the larger value,…
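A minimal sketch of an exact two-sided sign test in Python. It assumes that tied pairs are discarded and that the p-value is twice the smaller tail of a Binomial(n, 0.5) distribution, capped at 1; this is one common convention, not necessarily the one the entry goes on to describe. The paired measurements are hypothetical.

```python
from math import comb

def sign_test(first, second):
    """Two-sided exact sign test for paired measurements (ties dropped)."""
    plus = sum(1 for a, b in zip(first, second) if a > b)   # "1": first is larger
    minus = sum(1 for a, b in zip(first, second) if a < b)  # second is larger
    n = plus + minus
    if n == 0:
        return 1.0
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

before = [10, 12, 9, 15, 11, 14, 13, 16]   # hypothetical 1st measurements
after = [8, 11, 9, 12, 10, 13, 12, 14]     # hypothetical 2nd measurements
p = sign_test(before, after)
print(p)  # 7 pairs favor the 1st measurement, 1 tie: 2 * (1/2)**7 = 0.015625
```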

Signal

Signal: The signal is the component of the observed data (e.g. of a time series) that carries useful information. The complementary (opposite) concept is noise. In a narrower sense (e.g. in signal processing) signals are functions of time, as opposed to fields…

Signal Processing

Signal Processing: Signal processing is a branch of applied statistics concerned with the analysis of functions of time that take on scalar or vector values. The functions are normally mixtures of a signal and noise. A broad range of topics is considered in signal…

Similarity Matrix

Similarity Matrix: A similarity matrix is the opposite concept to the distance matrix. The elements of a similarity matrix measure pairwise similarities of objects: the greater the similarity of two objects, the greater the value of the measure. For example, the correlation matrix often may…

Simple Linear Regression

Simple Linear Regression: The simple linear regression is aimed at finding the "best-fit" values of two parameters, A and B, in the following regression equation:

$$Y_i = A X_i + B + E_i, \quad i = 1, \ldots, N$$

where $Y_i$, $X_i$, and $E_i$ are the values of the dependent variable, of the independent variable, and of the…
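A minimal Python sketch, assuming "best fit" means ordinary least squares (the usual criterion, though the entry is truncated before saying so). With $\bar{X}$ and $\bar{Y}$ the means, the closed-form estimates are A = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and B = Ȳ − A·X̄. The data are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least-squares estimates of A (slope) and B (intercept)
    in Y_i = A X_i + B + E_i."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b

x = [1, 2, 3, 4, 5]               # hypothetical independent variable
y = [2.1, 3.9, 6.2, 8.1, 9.8]     # hypothetical dependent variable
a, b = fit_simple_linear_regression(x, y)
residuals = [yi - (a * xi + b) for xi, yi in zip(x, y)]  # estimates of E_i
print(a, b)  # about 1.96 and 0.14
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding).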

Simulation

Simulation: In general, simulation is the modelling of a process or phenomenon. In statistics, Monte Carlo simulation is often used to model outcomes of a random experiment. This kind of simulation rests on the generation of pseudo-random numbers - that is, numbers which behave like truly random…

Single Linkage Clustering

Single Linkage Clustering: The single linkage clustering method (or the nearest neighbor method) is a method of calculating distance between clusters in hierarchical cluster analysis. The linkage function specifying the distance between two clusters is computed as the minimal object-to-object distance, where objects…
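A minimal Python sketch of the single linkage function and of one agglomerative step that picks the pair of clusters to merge next. Euclidean distance (`math.dist`, Python 3.8+) is an assumption here; any object-to-object distance could be substituted. The points are hypothetical.

```python
import math

def single_linkage_distance(cluster_a, cluster_b):
    """Smallest distance between any object in cluster_a and any in cluster_b."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def closest_clusters(clusters):
    """Return (distance, i, j) for the pair of clusters that a
    single-linkage agglomerative step would merge next."""
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = single_linkage_distance(clusters[i], clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best

clusters = [[(0, 0), (1, 0)], [(4, 0), (10, 0)], [(3, 5)]]  # hypothetical 2-D points
print(closest_clusters(clusters))  # (3.0, 0, 1): (1, 0) and (4, 0) are nearest
```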

Singularity

Singularity: In regression analysis, singularity is the extreme form of multicollinearity: a perfect linear relationship exists between variables or, in other terms, the correlation coefficient is equal to 1.0 or -1.0. Such absolute multicollinearity could arise when independent variables are linearly…

Six-Sigma

Six-Sigma: Six sigma means literally six standard deviations. The phrase refers to the limits drawn on statistical process control charts used to plot statistics from samples taken regularly from a production process. Consider the process mean. A process is deemed to be "in control" at…

Skewness

Skewness: Skewness measures the lack of symmetry of a probability distribution. A curve is said to be skewed to the right (or positively skewed) if it tails off toward the high end of the scale (right tail longer than the left). A curve is skewed…
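Several sample skewness formulas exist; the sketch below uses the moment coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments computed with divisor n. Other estimators (e.g. bias-adjusted ones) give slightly different values. The data are hypothetical.

```python
def skewness(x):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (divisor n)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

right_tailed = [1, 1, 1, 2, 10]   # hypothetical data with a long right tail
print(skewness(right_tailed))     # positive: skewed to the right
print(skewness([1, 2, 3]))        # 0.0: symmetric
```

Mirroring the data (negating every value) flips the sign of the skewness.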

Smoother (Example)

Smoother (Example): A simple example of a smoother is the moving average procedure. It is based on averaging elements closest in time to the current time. Mathematically this can be expressed by the following simple formula: where is the input of the smoother, the original…
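The entry's formula did not survive extraction, so the sketch below assumes one common form: a centered moving average that replaces each value with the mean of the values within `half_width` steps of it, with the window truncated at the ends of the series. Trailing (one-sided) windows are another common variant.

```python
def moving_average(x, half_width):
    """Centered moving average: mean of x[i-k .. i+k], truncated at the ends."""
    n = len(x)
    out = []
    for i in range(n):
        window = x[max(0, i - half_width): min(n, i + half_width + 1)]
        out.append(sum(window) / len(window))
    return out

noisy = [1, 3, 2, 4, 3, 5, 4, 6]   # hypothetical zig-zagging upward series
print(moving_average(noisy, 1))    # [2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0]
```

The zig-zag is smoothed away while the upward trend is preserved.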

Smoother (Smoothing Filter)

Smoother (Smoothing Filter): Smoothers, or smoothing filters, are algorithms for time-series processing that reduce abrupt changes in the time-series and make it look smoother. Smoothers constitute a broad subclass of filters. Like all filters, smoothers may be subdivided into linear and nonlinear. Linear filters reduce…

Smoothing

Smoothing: Smoothing is a class of time series processing which is intended to reduce noise and to preserve the signal itself. The origin of this term is related to the visual appearance of the time series - it looks smoother after this sort of processing…

Social Network Analytics

Social Network Analytics: Network analytics applied to connections among humans. Recently it has also come to encompass the analysis of web sites and internet services like Facebook.

Social Space Analysis

Social Space Analysis: Social space analysis is a synonym for Correspondence analysis.

Spark

Spark: Spark is a second generation computing environment that sits on top of a Hadoop system, supporting the workflows that leverage a distributed file system. It improves on the performance of the initial Hadoop computational paradigm, MapReduce, via fast functional programming capabilities and the use…

Spatial Field

Spatial Field: A spatial field is a function of spatial variables (x, y), or (x, y, z) in 3D cases. A spatial field is called a "scalar field" if the function takes on scalar values. For example, the concentration of a toxic substance in the soil at points with…

Specificity

Specificity: Specificity (of a medical diagnostic test for a disease) is the probability that the test will come out negative for a person without the disease. Specificity itself is not sufficient to characterize a test. For example, a test reporting all subjects who take the…
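In terms of study counts, specificity is the number of true negatives divided by the number of people without the disease (true negatives plus false positives). A minimal Python sketch with hypothetical counts:

```python
def specificity(true_negatives, false_positives):
    """P(test negative | no disease) = TN / (TN + FP)."""
    healthy = true_negatives + false_positives
    if healthy == 0:
        raise ValueError("no disease-free subjects")
    return true_negatives / healthy

# Hypothetical counts among 900 people without the disease
print(specificity(true_negatives=855, false_positives=45))  # 0.95
```

A test that reports every subject as negative has no false positives, so its specificity is 1 regardless of how useless it is for detecting disease.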

Spectral Analysis

Spectral Analysis: Spectral analysis is concerned with estimation of the spectrum of a stationary random process or a stationary time series from the observed realization(s) of the process (or series). Methods and concepts of spectral analysis play an important role in time series analysis and…

Spectrum

Spectrum: See Fourier spectrum and power spectrum.

Spline

Spline: A spline is a continuous function which coincides with a polynomial on every subinterval of the whole interval on which it is defined. In other words, splines are functions which are piecewise polynomial. The coefficients of the polynomial differ from interval to interval, but the…

Split-Halves Method

Split-Halves Method: In psychometric surveys, the split-halves method is used to measure the internal consistency reliability of survey instruments, e.g. psychological tests. The idea is to split the items (questions) related to the same construct to be measured, e.g. the anxiety level, and…

SQL

SQL: SQL stands for structured query language, a high-level language for querying relational databases and extracting information. For example, SQL provides the syntax rules that can translate a query like this into a form that can be submitted to the database: "Find all sales of…

Standard Deviation

Standard Deviation: The standard deviation is a measure of dispersion. It is the positive square root of the variance. An advantage of the standard deviation (as compared to the variance) is that it expresses dispersion in the same units as the original values in the…

Standard error

Standard error: The standard error measures the variability of an estimator (or sample statistic) from sample to sample. There are two approaches to estimating standard error: 1. The bootstrap. With the bootstrap, you take repeated simulated samples (usually resamples from the observed data, of the…
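A minimal sketch of the bootstrap approach in Python, assuming resamples of the same size as the observed data drawn with replacement, and taking the standard deviation of the statistic across resamples as the standard error. The data, number of resamples, and seed are arbitrary. For comparison, the familiar formula for the standard error of the mean, s/√n, is computed alongside.

```python
import random
import statistics

def bootstrap_standard_error(data, statistic, n_resamples=2000, seed=0):
    """Standard deviation of `statistic` across bootstrap resamples
    (same size as the data, drawn with replacement)."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(replicates)

data = [12.1, 9.8, 11.4, 10.2, 13.5, 10.9, 12.7, 9.5, 11.8, 10.6]  # hypothetical
se_boot = bootstrap_standard_error(data, statistics.mean)
se_formula = statistics.stdev(data) / len(data) ** 0.5  # textbook SE of the mean
print(se_boot, se_formula)  # similar magnitudes
```

The advantage of the bootstrap is that `statistic` can be swapped for something without a simple formula, such as `statistics.median`.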

Standard Normal Distribution

Standard Normal Distribution: The standard normal distribution is the normal distribution where the mean is zero and the standard deviation is one.

Standard Score

Standard Score: The standard score of an observation is the number of standard deviation units it is above or below the mean. The standard score of an observation is calculated by subtracting the mean from the observation, then dividing by the standard deviation.
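The calculation is a one-liner; the Python sketch below applies it to hypothetical data. Whether to use the population standard deviation (`statistics.pstdev`, used here) or the sample standard deviation (`statistics.stdev`) is a choice the entry does not specify.

```python
import statistics

def standard_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
mu = statistics.mean(values)        # 5
sigma = statistics.pstdev(values)   # 2.0 (population SD)
print([standard_score(v, mu, sigma) for v in values])
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```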

Standardized Mean Difference

Standardized Mean Difference: The standardized mean difference is the difference between two normalized means - i.e. the mean values divided by an estimate of the within-group standard deviation. The standardized mean difference is used for comparison of data obtained at different scales.

Stanine

Stanine: A stanine is a "standard ninth," an interval used in dividing school test results into (more or less) ninths. Stanine 1 includes all scores less than 1.75 standard deviations below the mean, and stanine 9 all scores more than 1.75 standard deviations…
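A minimal Python sketch mapping a standard score z to a stanine, assuming the conventional half-standard-deviation bands for stanines 2 to 8 (stanine 5 spans z from -0.25 to +0.25). A score falling exactly on a cut point is assigned to the higher stanine; that boundary rule is a choice made here, not taken from the entry.

```python
from bisect import bisect_right

# Cut points in standard-deviation units separating stanines 1|2, 2|3, ..., 8|9
CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]

def stanine(z):
    """Stanine (1-9) for a standard score z."""
    return bisect_right(CUTS, z) + 1  # number of cut points at or below z, plus 1

print([stanine(z) for z in (-2.0, -1.0, 0.0, 0.5, 2.0)])  # [1, 3, 5, 6, 9]
```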

Star Icon Plots

Star Icon Plots: Star icon plots are a subclass of circular icon plots in which the rays tend to form a star.

State Space

State Space: State space is an abstract space representing possible states of a system. A point in the state space is a vector of the values of all relevant parameters of the system. It is often assumed that the system is dynamic -…

Stationary time series

Stationary time series: A time series x(t), t = 1, ..., is called stationary if its statistical properties do not depend on time t. A time series may be stationary with respect to one characteristic, e.g. the mean, but not stationary with respect to another,…

Statistic

Statistic: 1. A number measuring something. 2. A measure calculated from a sample of data. Contrast "statistic" (drawn from a sample) with "parameter," which is a characteristic of a population. For example, the sample mean is a statistic; the population mean is a parameter of…

Statistical Significance

Statistical Significance: Outcomes of an experiment or repeated events are statistically significant if they differ from what chance variation might produce. For example, suppose n people are given a medication. If their response to the medication lies outside the range of how samples of…

Statistical Test

Statistical Test: A statistical test is a procedure for statistical hypothesis testing. The outcome of a statistical test is a decision to reject or accept the null hypothesis for a given probability of type I error. The outcome is frequently reported as a p-value -…

Statistics

Statistics: 1. A collection of numerical data that measure something. 2. The science of recording, organizing, analyzing and reporting quantitative information. See also: statistic

Stemming

Stemming: In processing unstructured text, stemming is the process of converting multiple forms of the same word into one stem, to simplify the task of analyzing the processed text. For example, in the previous sentence, "processing," "process," and "processed" would all be converted to the…

Step-wise Regression

Step-wise Regression: Step-wise regression is one of several computer-based iterative variable-selection procedures. Variables are added one-by-one based on their contribution to R-squared, but first, at each step we determine whether any of the variables (already included in the model) can be removed. If none of…

Stochastic Process

Stochastic Process: Stochastic process is a synonym for random process.

Stratified Sampling

Stratified Sampling: Stratified sampling is a method of random sampling. In stratified sampling, the population is first divided into homogeneous groups, also called strata. Then, elements from each stratum are selected at random in one of two ways: (i) the number of elements…
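A minimal Python sketch of stratified sampling, assuming proportional allocation (each stratum's share of the sample equals its share of the population), which is one common allocation rule; the entry is truncated before naming its two options. Within each stratum, units are drawn by simple random sampling. The strata and sizes are hypothetical.

```python
import random

def stratified_sample(strata, total_size, seed=None):
    """Proportional allocation: each stratum gets a (rounded) share of the
    sample equal to its share of the population, drawn without replacement."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(total_size * len(units) / population_size)
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical population of 1000 unit IDs split into three strata
strata = {
    "urban": list(range(0, 600)),
    "suburban": list(range(600, 900)),
    "rural": list(range(900, 1000)),
}
sample = stratified_sample(strata, total_size=10, seed=42)
print({name: len(chosen) for name, chosen in sample.items()})
# {'urban': 6, 'suburban': 3, 'rural': 1}
```

Because each stratum's allocation is rounded separately, the stratum sizes may not add up exactly to `total_size` for other population splits.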

Strip transect

Strip transect: A strip transect is a small subsection of a geographically-defined study area, typically chosen randomly. For example, Manly (Introduction to Ecological Sampling, CRC) discusses using randomly selected strips 3 meters wide and 20 meters long which are carefully examined and the number of…

Structural Equation Modeling

Structural Equation Modeling: Structural equation modeling includes a broad range of multivariate analysis methods aimed at finding interrelations among the variables in linear models by examining variances and covariances of the variables. Path analysis, for example, is a method of structural equation modeling. Structural…

Structured vs. unstructured data

Structured vs. unstructured data: Structured data is data that is in a form that can be used to develop statistical or machine learning models (typically a matrix where rows are records and columns are variables or features). Or data that is in a form that…

Sufficient Statistic

Sufficient Statistic: Suppose X is a random vector with probability distribution (or density) P(X | V), where V is a vector of parameters, and Xo is a realization of X. A statistic T(X) is called a sufficient statistic if the conditional probability (density) P(X |…

Sun Ray Plots

Sun Ray Plots: Sun ray plots are a subclass of circular icon plots in which the rays tend to form a circle.

Support Vector Machines

Support Vector Machines: Support vector machines are used in data mining (predictive modeling, to be specific) for classification of records, by learning from training data. Support vector machines use decision surfaces that separate records. They rely on optimization techniques to maximize the margins separating classes,…

Survey

Survey: Statistical surveys are general methods for gathering quantitative information about a particular population. "Population" here does not necessarily mean a set of human beings, but may consist of other types of units: firms, households, universities, hospitals, etc. While there are types and forms…

Survival Analysis

Survival Analysis: Survival analysis is concerned with "time-to-event" data. In medical statistics, the data are often in the form of "time-to-death". In the analysis of production or industrial data, "time-to-failure" is a typical application. However, the event of interest need not be either failure or…

Survival Function

Survival Function: In medical statistics, the survival function is a relationship between a proportion and time. The proportion is the proportion of subjects who are still surviving at time "t." The term can also be applied in fields other than medicine, referring to "units still…

Systematic Error

Systematic Error: Systematic error is the error that is constant in a series of repetitions of the same experiment or observation. Usually, systematic error is defined as the expected value of the overall error. An example of systematic error is an electronic scale…

Systematic Sampling

Systematic Sampling: Systematic sampling is a method of random sampling. The elements to be sampled are selected at a uniform interval that is measured in time, order, or space.
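A minimal Python sketch of systematic sampling over an ordered list, assuming the usual scheme of a random starting position among the first `step` units followed by every `step`-th unit; the random start is what makes it a random sampling method. The list of units and the seed are hypothetical.

```python
import random

def systematic_sample(units, step, seed=None):
    """Pick a random start among the first `step` units, then take every
    `step`-th unit after it."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    rng = random.Random(seed)
    start = rng.randrange(step)
    return units[start::step]

units = list(range(100))                      # hypothetical ordered list of units
chosen = systematic_sample(units, step=10, seed=7)
print(chosen)                                 # 10 units, evenly spaced 10 apart
```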

t-distribution

t-distribution: A continuous distribution with a single-peaked, bell-shaped probability density that is symmetrical around the null value. The t-distribution is specified completely by one parameter, the number of degrees of freedom. If X and Y are independent random variables, X has the standard normal…

t-statistic

t-statistic: The t-statistic is a statistic whose sampling distribution is a t-distribution. Often, the term "t-statistic" is used in a narrower sense, as the standardized difference between a sample mean $\bar{x}$ and a population mean $\mu$, where N is the sample size:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{N}}$$

where $\bar{x}$ and $s$ are the…
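A minimal Python sketch of the one-sample t-statistic, assuming the usual form t = (x̄ − μ) / (s / √N) with s the sample standard deviation (divisor N − 1); under normality it follows a t-distribution with N − 1 degrees of freedom. The sample and the hypothesized mean are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """t = (sample mean - mu) / (s / sqrt(N)), with s the sample SD."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)   # divisor N - 1
    return (mean - mu) / (s / math.sqrt(n))

sample = [5.1, 4.9, 5.3, 5.2, 5.0]   # hypothetical measurements
t = one_sample_t(sample, mu=5.0)
print(t)  # about 1.414, with 4 degrees of freedom
```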

t-test

t-test: A t-test is a statistical hypothesis test based on a test statistic whose sampling distribution is a t-distribution. Various t-tests, strictly speaking, are aimed at testing hypotheses about populations with normal probability distribution. However, statistical research has shown that t-tests often provide quite adequate…

Target Variable

See dependent and independent variables.

Test Set

Test Set: A test set is a portion of a data set used in data mining to assess the likely future performance of a single prediction or classification model that has been selected from among competing models, based on its performance with the validation set.…

Test-Retest Reliability

Test-Retest Reliability: The test-retest reliability of a survey instrument, like a psychological test, is estimated by performing the same survey with the same respondents at different moments of time. The closer the results, the greater the test-retest reliability of the survey instrument. The correlation coefficient…

The Tukey Mean-Difference Plot

The Tukey Mean-Difference Plot: The Tukey mean-difference plot is a scatter graph produced not for the (x, y) values themselves, but for the modified coordinates (X, Y):

$$X = \frac{x + y}{2}, \qquad Y = y - x$$

Such a plot is useful, for example, to analyze data with strong…
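The coordinate transform itself is simple; a minimal Python sketch follows, using hypothetical paired measurements. The resulting (X, Y) pairs can then be passed to any scatter-plot routine.

```python
def tukey_mean_difference(x, y):
    """Convert paired values into Tukey mean-difference coordinates:
    X = (x + y) / 2 and Y = y - x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [((a + b) / 2, b - a) for a, b in zip(x, y)]

method_1 = [10, 20, 30]   # hypothetical paired measurements
method_2 = [12, 19, 33]
print(tukey_mean_difference(method_1, method_2))  # [(11.0, 2), (19.5, -1), (31.5, 3)]
```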