Glossary of statistical terms
Dependent and Independent Variables: Statistical models normally specify how one set of variables, called dependent variables, depends functionally on another set of variables, called independent variables. While analysts typically specify variables in a model to reflect their understanding or theory of "what causes what," setting up a model in this way, and validating it through various metrics, does not, by itself, confirm causality. The term "(in)dependent" reflects only the functional relationship between variables within a model. Several models based on the same set of variables may differ in how the variables are divided into dependent and independent variables. Alternative names for independent variables (especially in data mining and predictive modeling) are input variables, predictors or features. Dependent variables are also called response variables, outcome variables, target variables or output variables. The terms "dependent" and "independent" here have no direct relation to the concept of statistical dependence or independence of events.
For example, a simple linear regression model states a linear relationship between the body weight $W$ and body height $H$, and the weight is considered the dependent variable:
$$W = a + bH$$
where $a$ and $b$ are parameters of the model.
At the same time, another reasonable model may consider body height as the dependent variable and the weight as the independent variable:
$$H = c + dW$$
where $c$ and $d$ are parameters of the second model.
In other words, the models explain the value of the dependent variable by values of the independent variables. Therefore, independent variables are often called predictor variables or explanatory variables.
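The weight and height example can be sketched in Python as follows. This is a minimal illustration, not part of the original entry: the five data points are made up, and the use of NumPy's `np.polyfit` for the least-squares fit is an assumption of this sketch. It fits the same data twice, once with weight as the dependent variable and once with height as the dependent variable.

```python
import numpy as np

# Hypothetical data for illustration only: heights in cm, weights in kg.
height = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weight = np.array([52.0, 58.0, 69.0, 77.0, 84.0])

# Model 1: weight is the dependent variable, height the independent variable.
# np.polyfit with deg=1 returns the least-squares [slope, intercept].
slope_w_on_h, intercept_w_on_h = np.polyfit(height, weight, deg=1)

# Model 2: the roles are swapped; height is now the dependent variable.
slope_h_on_w, intercept_h_on_w = np.polyfit(weight, height, deg=1)

# Squared correlation between the two variables, for comparison below.
r_squared = np.corrcoef(height, weight)[0, 1] ** 2

print("weight on height: slope, intercept =", slope_w_on_h, intercept_w_on_h)
print("height on weight: slope, intercept =", slope_h_on_w, intercept_h_on_w)
print("product of slopes:", slope_w_on_h * slope_h_on_w, "r^2:", r_squared)
```

The two fitted lines are generally not algebraic inverses of each other: the product of the two slopes equals the squared correlation, which is 1 only when the points lie exactly on a line. Which variable is treated as dependent is therefore a modeling decision, not something the data dictate.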
In general, statistical models state some functional relationship between dependent variables and independent variables in the following form:
$$Y_i = f_i(X_1, \ldots, X_m), \quad i = 1, \ldots, n$$
where
$Y_1, \ldots, Y_n$ are dependent variables;
$X_1, \ldots, X_m$ are independent variables;
$f_1, \ldots, f_n$ are functions of the independent variables, usually including random terms simulating statistical uncertainty.
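As a rough sketch of the general form, the Python snippet below simulates one dependent variable as a function of two independent variables plus a random term, then recovers the parameters by least squares. Everything specific here is an assumption made for illustration: the function `model_function`, its coefficients (2.0, 1.5, -0.7), the normal noise and the sample size are not taken from the original entry.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def model_function(x1, x2, rng):
    """Hypothetical model: a deterministic part plus a random term."""
    noise = rng.normal(loc=0.0, scale=0.5, size=np.shape(x1))
    return 2.0 + 1.5 * x1 - 0.7 * x2 + noise

# Independent variables (inputs), drawn arbitrarily for the illustration.
x1 = rng.uniform(0.0, 10.0, size=1000)
x2 = rng.uniform(0.0, 10.0, size=1000)

# Dependent variable (output) generated by the model.
y = model_function(x1, x2, rng)

# Estimate the parameters from the data by ordinary least squares.
# The column of ones corresponds to the intercept.
design = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print("estimated intercept and coefficients:", coef)
```

Because of the random term, the estimates land close to the values used in the simulation but do not match them exactly.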