The chi-square test (or χ²-test) is a statistical test of the null hypothesis that the distribution of a discrete random variable coincides with a given distribution. It is one of the most popular goodness-of-fit tests.
For example, in a supermarket, the relative frequencies of purchases of 4 brands of tea were 0.1, 0.4, 0.2, and 0.3 during the last year; during the last week, the numbers of packets sold were 31, 41, 22, and 18 for the 4 brands, respectively. Has the preference changed, i.e. do the current purchase probabilities differ from last year's average preferences, or are the deviations in the observed relative frequencies caused by chance alone?
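The tea example can be worked through numerically. The sketch below is only an illustration: the function name `chi_square_statistic` is made up for this example, and the 5% critical value of 7.815 for 3 degrees of freedom is taken from standard chi-square tables rather than from the text above.

```python
def chi_square_statistic(observed, probs):
    """Sum of (observed - expected)^2 / expected over all categories."""
    n = sum(observed)
    expected = [n * p for p in probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Data from the supermarket example.
last_year_probs = [0.1, 0.4, 0.2, 0.3]
last_week_counts = [31, 41, 22, 18]

stat = chi_square_statistic(last_week_counts, last_year_probs)

# Assumption: 5% significance level, 4 - 1 = 3 degrees of freedom.
CRITICAL_VALUE_5PCT_DF3 = 7.815
reject_null = stat > CRITICAL_VALUE_5PCT_DF3

print(f"chi-square statistic = {stat:.2f}, reject null: {reject_null}")
```

With these numbers the statistic is about 42.58, far above the critical value, so under these assumptions the data suggest that the preferences have changed. Most of the discrepancy comes from the first brand (31 packets sold against 11.2 expected).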
Besides discrete variables, the chi-square test is often applied to problems involving continuous random variables. In this case, the values of the continuous variable are transformed into a discrete variable with a finite number of values: e.g. the whole range of possible values is split into a finite number of intervals, and every such interval is treated as a discrete value (e.g. age groups "20...29", "30...39", etc.). The chi-square test is then applied to the new discrete variable.
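The discretization step can be sketched as follows. The ages and interval edges are hypothetical values chosen for illustration, and `bin_counts` is a helper written for this example, not a library function.

```python
from bisect import bisect_right

def bin_counts(values, edges):
    """Count how many values fall into each interval [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Values outside the overall range are ignored in this sketch.
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts

# Hypothetical ages, grouped as "20...29", "30...39", "40...49", "50...59".
ages = [21, 25, 29, 30, 34, 38, 39, 41, 47, 52, 58, 23]
edges = [20, 30, 40, 50, 60]

counts = bin_counts(ages, edges)
print(counts)
```

The resulting counts play the role of the observed frequencies of the new discrete variable; the expected frequencies would come from the probability that the hypothesized continuous distribution assigns to each interval.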
For small samples, the classical chi-square test is not very accurate, because the sampling distribution of the test statistic differs from the chi-square distribution. In such cases, Monte Carlo simulation is a more reasonable approach. In many cases such a simulation can be carried out by creating an artificial sample with the given proportions of values and applying a resampling procedure to this sample. Besides the one-sample chi-square test, there are variants of the test for comparing the distributions of two or more samples. For these variants, a permutation version of the test is more accurate when at least one sample is small. See more on the use of resampling and permutation in short online courses and in the online book Resampling: The New Statistics.
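A Monte Carlo version of the one-sample test might look like the sketch below. It draws simulated samples directly from the hypothesized probabilities, which amounts to resampling with replacement from an artificial sample with those proportions. The function names, the number of simulations, the seed, and the "+1" adjustment in the p-value are choices made for this illustration.

```python
import random

def chi_square_statistic(observed, probs):
    """Sum of (observed - expected)^2 / expected over all categories."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def monte_carlo_p_value(observed, probs, n_sim=2000, seed=0):
    """Estimate the p-value by simulating samples under the null hypothesis."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = sum(observed)
    k = len(probs)
    stat_obs = chi_square_statistic(observed, probs)
    hits = 0
    for _ in range(n_sim):
        # Draw n category labels with the hypothesized probabilities.
        draws = rng.choices(range(k), weights=probs, k=n)
        counts = [0] * k
        for d in draws:
            counts[d] += 1
        if chi_square_statistic(counts, probs) >= stat_obs:
            hits += 1
    # Count the observed sample itself, so the estimate is never exactly 0.
    return (hits + 1) / (n_sim + 1)

p_value = monte_carlo_p_value([31, 41, 22, 18], [0.1, 0.4, 0.2, 0.3])
print(p_value)
```

The estimated p-value is the share of simulated samples whose statistic is at least as large as the observed one, so no reference to the chi-square distribution is needed. This is what makes the approach usable when the sample is small.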
The chi-square test is based on the chi-square statistic.
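For the one-sample case, the statistic is usually written as shown below. The notation is an assumption of this sketch: O_i is the observed count in category i, p_i is the hypothesized probability of that category, n is the sample size, and k is the number of categories.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = n\,p_i
```

When the null hypothesis holds, the hypothesized distribution is fully specified, and the sample is large, this statistic approximately follows a chi-square distribution with k - 1 degrees of freedom.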