Data Partition

Data partitioning in data mining is the division of the available data into two or three non-overlapping sets: the training set, the validation set, and the test set. If the data set is very large, often only a portion of it is selected for the partitions. Partitioning is normally used when the model for the data at hand is being chosen from a broad set of models. The basic idea of data partitioning is to keep a subset of the available data out of the analysis and to use it later to verify the model.
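The division into non-overlapping sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `partition`, the 60/20/20 proportions, and the fixed seed are hypothetical choices, not part of the definition. Random shuffling also assumes the records are exchangeable; for time-ordered data a chronological split would usually be more appropriate.

```python
import random

def partition(records, train_frac=0.6, valid_frac=0.2, seed=0):
    """Split records into non-overlapping training, validation, and test lists.

    The fractions and the seed are illustrative assumptions; whatever is
    left after the training and validation sets becomes the test set.
    """
    if train_frac <= 0 or valid_frac < 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, reproducibly
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# Usage: partition 100 record identifiers.
train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

Because the three lists are slices of one shuffled copy, every record lands in exactly one set.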

For example, suppose a researcher has developed a method for predicting a time series of stock prices. The parameters of the model have been fitted to the available data, and the model demonstrates high prediction accuracy on these data. This does not necessarily mean that the model will predict new data equally well, because the model has been tuned to the characteristics (including random chance aspects) of the data used to fit it. Data partitioning is used to avoid such overly optimistic estimates of the model's precision.
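The optimism described here can be seen in a small experiment. The sketch below does not model stock prices; it assumes synthetic data (y = 2x plus Gaussian noise) and a deliberately over-flexible model, a 1-nearest-neighbor predictor that simply memorizes the points it was fitted to. All names (`predict_1nn`, `mse`, `fit_set`, `holdout`) are hypothetical.

```python
import random

rng = random.Random(42)

# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
data = [(i / 10, 2 * (i / 10) + rng.gauss(0, 1)) for i in range(200)]
rng.shuffle(data)
fit_set, holdout = data[:150], data[150:]  # holdout is kept out of the fit

def predict_1nn(x, memory):
    """Return the y of the memorized point whose x is closest to the query."""
    nearest = min(memory, key=lambda point: abs(point[0] - x))
    return nearest[1]

def mse(points, memory):
    """Mean squared prediction error of the 1-NN model on the given points."""
    return sum((y - predict_1nn(x, memory)) ** 2 for x, y in points) / len(points)

fit_error = mse(fit_set, fit_set)      # error on the data used for fitting
holdout_error = mse(holdout, fit_set)  # error on data the model never saw
print(fit_error, holdout_error)
```

The memorizing model reproduces every fitted point exactly, so its error on the fitting data is zero, while the held-out points reveal the noise it has absorbed.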

Data partitioning is normally used in supervised learning techniques in data mining, where a predictive model is chosen from a set of models, using their performance on the validation set as the basis for the choice. Some examples of such techniques are classification trees, regression trees, neural networks, and nonlinear variants of discriminant analysis.
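To make the selection step concrete, here is a hedged sketch in which each candidate model is fitted on the training set, the candidates are compared on the validation set, and the test set is used once at the end to estimate the chosen model's error. The candidates are k-nearest-neighbor regressors with different k, picked only because they need no external libraries; the data are synthetic and every name is an assumption for illustration.

```python
import random

def knn_predict(x, train, k):
    """Average the y values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(points, train, k):
    """Mean squared error of the k-NN model on the given points."""
    return sum((y - knn_predict(x, train, k)) ** 2 for x, y in points) / len(points)

rng = random.Random(7)
# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
data = [(i / 10, 2 * (i / 10) + rng.gauss(0, 1)) for i in range(300)]
rng.shuffle(data)
train, valid, test = data[:180], data[180:240], data[240:]

candidate_ks = [1, 3, 5, 10, 20]  # the "set of models" to choose from
valid_errors = {k: mse(valid, train, k) for k in candidate_ks}
best_k = min(valid_errors, key=valid_errors.get)  # chosen on validation data

# The test set was not used for fitting or selection, so this estimate
# is not biased by the choice of k.
test_error = mse(test, train, best_k)
print(best_k, valid_errors[best_k], test_error)
```

Keeping the test set out of both fitting and selection is what lets the final error figure serve as a verification of the chosen model.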
