Predictive modeling is the process of using a statistical or machine learning model to predict the value of a target variable (e.g., default or no-default) on the basis of a set of predictor variables (e.g., income, house value, outstanding debt). Many of the techniques used (e.g., regression, logistic regression, discriminant analysis) have been used for nearly a century in statistical research. However, in predictive modeling the emphasis is on predicting values in new data, rather than on explaining an existing data set. Prediction can be quite effective even if the relationships between the predictor variables and the target variable are not understood. Hence, traditional metrics that measure how well a model fits the data it was trained on (e.g., R-squared or goodness-of-fit) matter less in predictive modeling. What matters is how well the model predicts, and this is typically measured by applying the model to a hold-out sample where the value of the target variable is known.
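The hold-out idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the 70/30 split is arbitrary, the helper names (fit_line, rmse) are hypothetical, and root mean squared error stands in for whatever accuracy measure a real project would choose. The point is only that the model is fit on one part of the data and judged on records it has not seen.

```python
import random

def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def rmse(xs, ys, intercept, slope):
    """Root mean squared error of the fitted line on the given records."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5

# Synthetic data (an assumption, not real figures): one predictor and a
# noisy linear target.
rng = random.Random(0)
xs = [rng.uniform(20, 100) for _ in range(200)]
ys = [5.0 + 0.3 * x + rng.gauss(0, 4) for x in xs]

# Shuffle the record indices, then keep 30% aside as the hold-out sample.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, holdout_idx = indices[:140], indices[140:]

train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
holdout_x = [xs[i] for i in holdout_idx]
holdout_y = [ys[i] for i in holdout_idx]

# Fit on the training records only.
intercept, slope = fit_line(train_x, train_y)

# Error on the training data measures fit; error on the hold-out sample
# measures how well the model predicts records it was not fit to.
train_rmse = rmse(train_x, train_y, intercept, slope)
holdout_rmse = rmse(holdout_x, holdout_y, intercept, slope)

print(f"training RMSE: {train_rmse:.2f}")
print(f"hold-out RMSE: {holdout_rmse:.2f}")
```

With a simple model and plenty of data the two error figures tend to be close; a hold-out error much larger than the training error is the usual sign that a model fits its own data better than it predicts new data.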
This course introduces the basic concepts of predictive analytics, shows how to visualize and explore data, and covers the two core paradigms that account for most business applications of predictive modeling: classification and prediction.