# Statistical Word of the Week

Azure is the Microsoft Cloud Computing Platform and Services. ML stands for Machine Learning, and is one of the services. Like other cloud computing services, you purchase it on a metered basis - as of 2015, there was a per-prediction charge, and a compute time charge. As of October 2015, it featured a couple dozen algorithms and could be used in studio mode for piloting, or via API for production.

**Promoting better understanding of statistics throughout the world.**

The Institute for Statistics Education offers an extensive glossary of statistical terms, available to all for reference and research. We will provide a statistical term every week, delivered directly to your inbox. To improve your own statistical knowledge, sign up here.

Rather not have more email? Bookmark our "Stats Word of the Week" page.

Categorical variables are non-numeric "category" variables, e.g. color. Ordered categorical variables are category variables that have a quantitative dimension that can be ordered but is not on a regular scale. Doctors rate pain on a scale of 1 to 10 - a "2" has no particular numeric content, nor does a "3," but we can say that 3 represents more pain than 2 (but probably not 50% more, nor, necessarily, the same increment that moving from 9 to 10 represents). Ordered categorical variables can often be successfully used in statistical modeling, even though they may not meet the strict requirements associated with a particular method.

**Promoting better understanding of statistics throughout the world.**

The Institute for Statistics Education offers an extensive glossary of statistical terms, available to all for reference and research. We will provide a statistical term every week, delivered directly to your inbox. To improve your own statistical knowledge, sign up here.

Rather not have more email? Bookmark our "Stats Word of the Week" page.

Bimodal literally means "two modes" and is typically used to describe distributions of values that have two centers. For example, the distribution of heights in a sample of adults might have two peaks, one for women and one for men.

**Promoting better understanding of statistics throughout the world.**

The Institute for Statistics Education offers an extensive glossary of statistical terms, available to all for reference and research. We will provide a statistical term every week, delivered directly to your inbox. To improve your own statistical knowledge, sign up here.

Rather not have more email? Bookmark our "Stats Word of the Week" page.

HDFS is the Hadoop Distributed File System. It is designed to accommodate parallel processing on clusters of commodity hardware, and to be fault tolerant.

**Promoting better understanding of statistics throughout the world.**

Rather not have more email? Bookmark our "Stats Word of the Week" page.

**Promoting better understanding of statistics throughout the world.**

Rather not have more email? Bookmark our "Stats Word of the Week" page.

**Promoting better understanding of statistics throughout the world.**

Rather not have more email? Bookmark our "Stats Word of the Week" page.

**Promoting better understanding of statistics throughout the world.**

Rather not have more email? Bookmark our "Stats Word of the Week" page.

In psychometric surveys, the split-halves method is used to measure the internal consistency reliability of survey instruments, e.g. psychological tests.

The idea is to split the items (questions) related to the same construct to be measured, e.d. the anxiety level, and to compare the results obtained from the two resulting subsets of items. The closer the results - i.e. the scores of the construct being measured (e.g the anxiety level), the greater the internal consistency reliability of this survey instrument.

The correlation coefficient between the two sets of measurements is often used as a quantitative measure of the internal consistency reliability of a survey instrument.

**Promoting better understanding of statistics throughout the world.**

Rather not have more email? Bookmark our "Stats Word of the Week" page.

**Promoting better understanding of statistics throughout the world.**

Rather not have more email? Bookmark our "Stats Word of the Week" page.

**Promoting better understanding of statistics throughout the world.**

Rather not have more email? Bookmark our "Stats Word of the Week" page.

**Promoting better understanding of statistics throughout the world.**

Rather not have more email? Bookmark our "Stats Word of the Week" page.

**Promoting better understanding of statistics throughout the world.**

Rather not have more email? Bookmark our "Stats Word of the Week" page.

_{0}) and the alternative hypothesis ( H

_{a}, often denoted as H

_{1}). Hypothesis testing, in a formal logic sense, rests on the presumption of validity of the null hypothesis - that is, the null hypothesis is rejected only if the data at hand testify strongly enough against it.

In a classic statistical experiment, treatment(s) and placebo are applied to randomly assigned subjects, and, at the end of the experiment, outcomes are compared.

Classification and regression trees, applied to data with known values for an outcome variable, derive models with rules like "If taxable income <$80,000, if no Schedule C income, if standard deduction taken, then no-audit."

In computer science, MapReduce is a procedure that prepares data for parallel processing on multiple computers.

Latent variable models postulate some relationship between the statistical properties of observable variables.

Survival analysis is a set of methods used to model and analyze survival data, also called time-to-event data.

The probability distribution for X is the possible values of X and their associated probabilities. With two separate discrete random variables, X and Y, the joint probability distribution is the function f(x,y)

With a sample of size N, the jackknife involves calculating N values of the estimator, with each value calculated on the basis of the entire sample less one observation.

In the interim monitoring of clinical trials, multiple looks are taken at the accruing patient results - say, response to a medication.

The geometric mean of n values is determined by multiplying all n values together, then taking the nth root of the product. It is useful in taking averages of ratios.

Hierarchical linear modeling is an approach to analysis of hierarchical (nested) data - i.e. data represented by categories, sub-categories, ..., individual units (e.g. school -> classroom -> student).

*O´Brien-Fleming multiple testing procedure*) is a simple multiple testing procedure for comparing two treatments when the response to treatment is dichotomous . This procedure...

Error is the deviation of an estimated quantity from its true value, or, more precisely,

Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called independent variables.

In predictive modeling, ensemble methods refer to the practice of taking multiple models and averaging their predictions.

Exact tests are hypothesis tests that are guaranteed to produce Type-I error at or below the nominal alpha level of the test when conducted on samples drawn from a null model.

Data mining is concerned with finding latent patterns in large databases.

Longitudinal data records multiple observations over time for a set of individuals or units. A typical..

When probabilities are quoted without specification of the sample space, it could result in ambiguity when the sample space is not self-evident.

In regression analysis , collinearity of two variables means that strong correlation exists between them, making it difficult or impossible to estimate their individual regression coefficients reliably.

A cohort study is a longitudinal study that identifies a population or large group (a "cohort") then draws a sample from the population at various points in time and records data for the sample.

The centroid is a measure of center in multi-dimensional space.

*r*successes.

**Washington Post**.

*Ronald F. Abler Distinguished Service Honors*at the upcoming annual meeting next week.

*Teaching Geographic Information Science and Technology in Higher Education, *2012 (Wiley)

The story of the prospective Facebook IPO, and prior IPO's from LinkedIn, Pandora, and Groupon all involve "data scientists". Read an interview with Monica Rogati - Senior Data Scientist at LinkedIn to see the connection.

A neurosurgeon, pathologist and epidemiologist are each told to examine a can of sardines on a table in a closed room, and present a report.

*America Speaks*,

Want to be notified of future courses?

Yes
## MORE COMMENTS...