The technique produces rules that predict the value of an outcome (target) variable from known values of predictor (explanatory) variables. The predictors may be a mixture of categorical and continuous variables.
Consider the goal of predicting loan default from variables such as income, debt, and credit score. The first step is to split the data into two groups according to whether a chosen predictor variable (say, income) is above or below a chosen value. The variable and the split value are selected to maximize the homogeneity of the target variable within each group. The same procedure is then applied recursively to each resulting group, yielding a set of rules like “If income is low, debt is high, and credit score is below 500, then classify as likely default.”
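As a rough illustration of a single splitting step, the sketch below searches every predictor and every candidate threshold for the split that leaves the two groups as homogeneous as possible. It makes several assumptions not stated in the text: homogeneity is measured by Gini impurity (one common choice among several), all predictors are numeric, and the function names `gini` and `best_split` and the toy loan data are made up for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) for the best binary split.

    rows   -- list of equal-length lists of numeric predictor values
    labels -- list of class labels, one per row
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            # Impurity of the two groups, weighted by group size.
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical toy data: [income, debt], both in thousands of dollars.
rows = [[20, 30], [25, 5], [30, 40], [60, 10], [80, 35], [90, 5]]
labels = ["default", "default", "default", "repay", "repay", "repay"]

feature, threshold, impurity = best_split(rows, labels)
print(feature, threshold, impurity)  # 0 45.0 0.0 (split on income at 45)
```

In this toy data, income separates the two classes perfectly, so the search picks it over debt. Building a full tree would amount to calling `best_split` again on the rows in each of the two groups until the groups are homogeneous enough or too small to split further.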