sets: the training set (used to fit the model), the validation set (used to assess how well the model does), and the test set (used at the completion of the model fitting and assessment process, to predict how well the model will perform with new data). Note that the value of the target variable is known for all three partitions (e.g. buy/don’t buy, churn/don’t churn). For new data, it is not.
Week #1 – Data Partitioning
- December 27, 2012
- , 3:06 pm
In predictive modeling, data partitioning is the division of the data available for analysis into two or three non-overlapping