The training data are a subset of all the data that you have available, and are used to fit one or more candidate models. The fitted models are then applied to one or more other subsets of the same data, and predicted values of the outcome variable are calculated. These predictions are compared to the actual values to produce measures of model performance, which are then used to compare the models.
Week # 29 – Training data
Also called the training sample, training set, or calibration sample. The context is predictive modeling (also called supervised data mining), where you have data with multiple predictor variables and a single known outcome or target variable.
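
The fit, predict, and compare workflow can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, a single predictor is used for brevity, the 70/30 split is arbitrary, and the helper names (`split_data`, `fit_mean_model`, `fit_linear_model`, `rmse`) are hypothetical rather than part of any library.

```python
import random
from statistics import mean


def split_data(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into a training set and a holdout set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_mean_model(train):
    """Baseline model: always predict the mean outcome seen in training."""
    y_bar = mean(y for _, y in train)
    return lambda x: y_bar


def fit_linear_model(train):
    """Least-squares line with one predictor: y = a + b * x."""
    x_bar = mean(x for x, _ in train)
    y_bar = mean(y for _, y in train)
    sxx = sum((x - x_bar) ** 2 for x, _ in train)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def rmse(model, rows):
    """Root mean squared error of the model's predictions on the given rows."""
    return (sum((model(x) - y) ** 2 for x, y in rows) / len(rows)) ** 0.5


# Synthetic data (assumption): the outcome depends linearly on x, plus noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 + 3.0 * x + rng.gauss(0, 1)))

# Fit every candidate model on the training data only.
train, holdout = split_data(data)
fitters = {"mean_only": fit_mean_model, "linear": fit_linear_model}
models = {name: fit(train) for name, fit in fitters.items()}

# Compare predicted and actual outcomes on the subset not used for fitting.
scores = {name: rmse(model, holdout) for name, model in models.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: holdout RMSE = {score:.3f}")
print("best model:", best)
```

The important point is that the performance measure is computed on rows the models never saw during fitting, so the comparison reflects how well each model predicts new cases rather than how closely it matches the training data.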