Explore Courses | Elder Research | Contact | LMS Login

Statistics.com Logo
  • Courses
    • See All Courses
    • Calendar
    • Intro stats for college credit
    • Faculty
    • Group training
    • Credit & Credentialing
    • Teach With Us
  • Programs/Degrees
    • Certificates
      • Analytics for Data Science
      • Biostatistics
      • Programming For Data Science – Python (Experienced)
      • Programming For Data Science – Python (Novice)
      • Programming For Data Science – R (Experienced)
      • Programming For Data Science – R (Novice)
      • Social Science
    • Undergraduate Degree Programs
    • Graduate Degree Programs
    • Massive Open Online Courses (MOOC)
  • Partnerships
    • Higher Education
    • Enterprise
  • Resources
    • About Us
    • Blog
    • Word Of The Week
    • News and Announcements
    • Newsletter signup
    • Glossary
    • Statistical Symbols
    • FAQs & Knowledge Base
    • Testimonials
    • Test Yourself

Blog

Ethical Data Science

Guest Blog – Grant Fleming, Data Scientist, Elder Research

Progress in data science is largely driven by the ever-improving predictive performance of increasingly complex black-box models. However, these predictive gains have come at the cost of interpretability: we lose the ability to explain the relationships a model derives between its predictors and target(s), which has led to misapplication and public controversy. These drawbacks reveal that interpretability is itself an ethical issue; data scientists should strive to adopt interpretability methods that preserve the predictive performance of complex models while minimizing the harms of their opacity.

Any examination of the scholarly or popular literature on “AI” or “data science” makes apparent the profound importance placed upon maximizing predictive performance. After all, recent breakthroughs in model design and the resulting improvements to predictive performance have led to models exceeding doctors’ performance at detecting multiple medical issues and surpassing human reading comprehension. These breakthroughs have been made possible by transitioning from linear models to black-box models like deep neural networks (DNNs) and gradient-boosted trees (e.g., XGBoost). Instead of using linear transformations of features to generate predictions, these black-box models employ complex, nonlinear feature transformations to produce higher-fidelity predictions.

Because of the complex mathematics underlying them, these black-box models assume the role of oracle, producing predictions without providing human-interpretable explanations for their outputs. While these predictions are often more accurate than those of linear models, abandoning the built-in interpretability of linear models poses challenges. For example, the inability to inspect a model’s decision rules can make it harder to gain the trust of users, clients, and regulators, even for models that are otherwise well designed and effective.

Forgoing model interpretability also presents an ethical dilemma for the sciences. In improving our ability to predict the state of the world, black-box models have traded away part of their ability to help us understand the reasoning behind those predictions. Entire subfields of economics, medicine, and psychology have predicated their existence on successfully translating linear model interpretations into policy prescriptions. For these tasks, predictive performance is often secondary to exploring the relationships the model captures between its predictors and targets. Focusing solely on predictive performance would have stunted our understanding in these fields and may prevent future discoveries that more transparent models would otherwise have yielded.

Outside of public policy and science, forgoing model interpretability has posed more direct challenges. Misapplied black-box models within healthcare, the legal system, and corporate hiring processes have unintentionally harmed both the people and the organizations that they were built to serve. In these cases, the predictions of the black boxes were clearly inaccurate; however, debugging and detecting potential issues prior to deployment was either difficult or impossible given the nature of the models. Such cases have understandably led to public controversy about the ethics of data science, as well as calls for stronger regulation around algorithmic data collection, transparency, and fairness.

Balancing complexity and model interpretability is clearly a challenge. Fortunately, several interpretability methods allow data scientists to understand, to an extent, the inner workings of otherwise opaque black-box models. Applying these methods preserves the improved predictive performance of arbitrary black-box models while recovering much of the interpretability lost in the move away from linear models.

Individual interpretability methods serve a wide variety of functions. For example, global methods like Partial Dependence Plots (PDPs) provide diagnostic visualizations of the average impact of features on predictions. These plots depict quantitative relationships between the inputs and outputs of black-box models, supporting human interpretation much as coefficients from a linear model would. Local methods like Shapley values explain the impact of specific feature values on individual predictions, increasing user trust by showing how the model relies on particular features. The added insight also simplifies model debugging, revealing opportunities to improve the performance even of black-box models that already perform well.
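To make these two ideas concrete, both a one-dimensional partial dependence curve and exact Shapley values can be computed by brute force for a small model. The sketch below is a minimal, hypothetical illustration — the `predict` function, data, and baseline are invented for this example rather than taken from any particular library, and the exact Shapley computation is only feasible for tiny feature sets:

```python
from itertools import combinations
from math import factorial

def partial_dependence(predict, X, feature_idx, grid):
    """Average prediction as feature `feature_idx` sweeps over `grid`,
    holding every other feature at its observed values (a 1-D PDP)."""
    curve = []
    for v in grid:
        # Replace the feature of interest with the grid value v in every row
        preds = [predict(row[:feature_idx] + [v] + row[feature_idx + 1:])
                 for row in X]
        curve.append(sum(preds) / len(preds))
    return curve

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance `x`, where 'absent' features
    are set to `baseline`. Exponential cost -- illustration only."""
    n = len(x)

    def value(subset):
        # Prediction with features in `subset` taken from x, others from baseline
        return predict([x[j] if j in subset else baseline[j] for j in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for S in combinations(others, size):
                # Classic Shapley weighting over all subsets not containing i
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi += weight * (value(set(S) | {i}) - value(set(S)))
        phis.append(phi)
    return phis

# Toy stand-in for a black-box model: nonlinear in x0, linear in x1
def predict(row):
    x0, x1 = row
    return x0 * x0 + 0.5 * x1

X = [[1.0, 2.0], [2.0, 0.0], [3.0, 4.0]]
pdp = partial_dependence(predict, X, feature_idx=0, grid=[0.0, 1.0, 2.0])
phi = shapley_values(predict, x=[2.0, 4.0], baseline=[0.0, 0.0])
```

For this toy model, the PDP curve traces the squared relationship in the first feature, and the two Shapley values sum to the difference between the prediction at `x` and at the baseline — an exact property of Shapley values that makes them useful for auditing individual predictions.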

Ethical data science surely encompasses more than the ability to interpret a model’s inner workings and outputs. However, the case for making model interpretability part of ethical best practice is compelling. Data scientists who integrate interpretability methods into their black-box workflows improve the ethical due diligence of their work: they retain interpretability while still leveraging the great potential of black-box models.

Recent Posts

  • Oct 6: Ethical AI: Darth Vader and the Cowardly Lion
  • Oct 19: Data Literacy – The Chainsaw Case

About Statistics.com

Statistics.com offers academic and professional education in statistics, analytics, and data science at beginner, intermediate, and advanced levels of instruction. Statistics.com is a part of Elder Research, a data science consultancy with 25 years of experience in data analytics.

 The Institute for Statistics Education is certified to operate by the State Council of Higher Education for Virginia (SCHEV)

Our Links

  • Contact Us
  • Site Map
  • Explore Courses
  • About Us
  • Management Team

Social Networks

Facebook Twitter Youtube Linkedin

Contact

The Institute for Statistics Education
2107 Wilson Blvd
Suite 850 
Arlington, VA 22201
(571) 281-8817

ourcourses@statistics.com

© Copyright 2023 - Statistics.com, LLC | All Rights Reserved | Privacy Policy | Terms of Use
