Progress in data science is largely driven by the ever-improving predictive performance of increasingly complex black-box models. However, these predictive gains have come at the cost of the ability to interpret the relationships a model derives between its predictors and targets, leading to misapplication and public controversy. These drawbacks reveal that interpretability is an ethical issue: data scientists should strive to apply interpretability methods that preserve the predictive performance of complex models while minimizing the harms of their opacity.
Any examination of the scholarly or popular literature on “AI” or “data science” makes apparent the profound importance placed upon maximizing predictive performance. After all, recent breakthroughs in model design and the resulting gains in predictive performance have produced models that exceed doctors’ performance at detecting multiple medical issues and surpass human reading comprehension. These breakthroughs have been made possible by transitioning from linear models to black-box models such as deep neural networks (DNNs) and gradient-boosted trees (e.g., XGBoost). Instead of using linear transformations of features to generate predictions, these black-box models employ complex, nonlinear feature transformations to produce higher-fidelity predictions.
Because of the complex mathematics underlying them, these black-box models assume the role of oracle, producing predictions without providing human-interpretable explanations for their outputs. While these predictions are often more accurate than those of linear models, giving up the built-in interpretability of linear models can pose challenges. For example, the inability to inspect a model's decision rules can make it harder to gain the trust of users, clients, and regulators, even for models that are otherwise well designed and effective.
Forgoing model interpretability also presents an ethical dilemma for the sciences. In improving our ability to predict the state of the world, black-box models have traded away part of their ability to help us understand the reasoning behind those predictions. Entire subfields of economics, medicine, and psychology have predicated their existence on successfully translating linear-model interpretations into policy prescriptions. For these tasks, predictive performance is often secondary to exploring the relationships a model identifies between its predictors and targets. Focusing solely on predictive performance would have stunted our understanding in these fields and may prevent future discoveries that more transparent models would otherwise have drawn out.
Outside of public policy and science, forgoing model interpretability has posed more direct challenges. Misapplied black-box models within healthcare, the legal system, and corporate hiring processes have unintentionally harmed both the people and the organizations they were built to serve. In these cases, the predictions of the black boxes were clearly inaccurate; however, debugging and detecting potential issues prior to deployment was difficult or impossible given the nature of the models. Such cases have understandably led to public controversy about the ethics of data science, as well as to calls for stronger regulation around algorithmic data collection, transparency, and fairness.
Balancing model complexity and interpretability is clearly a challenge. Fortunately, several interpretability methods allow data scientists to understand, to an extent, the inner workings of complex black-box models that would otherwise remain opaque. Applying these methods makes it possible to retain the improved predictive performance of an arbitrary black-box model while regaining much of the interpretability lost by moving away from linear models.
Individual interpretability methods can serve a wide variety of functions. For example, global methods like Partial Dependence Plots (PDPs) provide diagnostic visualizations of the average impact of features on predictions. These plots depict quantitative relationships between a black-box model's input features and its predictions, and they support human interpretation in much the same way the coefficients of a linear model might. Local methods like Shapley values can explain the impact of specific feature values on individual predictions, increasing user trust by showing how the model relies on particular features. The added insight these methods provide also simplifies model debugging, revealing opportunities to improve even black-box models that already perform well.
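To make the idea concrete, a partial dependence curve can be computed directly from its definition: sweep a single feature over a grid of values, holding all other features at their observed values, and average the model's predictions at each grid point. The sketch below is illustrative only; the synthetic data, the helper name `pdp_curve`, and the choice of scikit-learn's gradient-boosted trees as the black box are all assumptions, not something prescribed by the text.

```python
# Minimal sketch of a hand-rolled partial dependence curve for a black-box model.
# The data, model choice, and helper name are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic data: the target depends strongly and positively on feature 0.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

def pdp_curve(model, X, feature, grid):
    """Average the model's predictions while sweeping one feature over a grid,
    holding all other features at their observed values."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value                   # force the feature to a fixed value
        curve.append(model.predict(X_mod).mean())   # average prediction over the dataset
    return np.array(curve)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
curve = pdp_curve(model, X, feature=0, grid=grid)
# For this data, the curve for feature 0 rises across the grid, mirroring
# the positive relationship the model learned between feature 0 and the target.
```

Plotting `curve` against `grid` yields the PDP itself; in practice one would typically use a library routine such as scikit-learn's `sklearn.inspection.partial_dependence`, which implements the same averaging.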
Ethical data science surely encompasses more than the ability to interpret a model's inner workings and outputs. However, the case for making model interpretability part of ethical best practice is compelling. Data scientists who integrate interpretability methods into their black-box models improve the ethical due diligence of their work; doing so maintains interpretability while still leveraging the great potential of black-box models.