
Recidivism, and the Failure of AUC

On average, 40% – 50% of convicted criminals in the U.S. go on to commit another crime (“recidivate”) after they are released.  For nearly 20 years, court systems have used statistical and machine learning algorithms to predict the probability of recidivism, and to guide sentencing decisions, assignment to substance abuse treatment programs, and other aspects of prisoner case management. One of the most popular systems is the COMPAS software from Equivant (formerly Northpointe), which uses 173 variables to predict whether a defendant or prisoner will commit a further crime within two years.  

Racial Bias Alleged 

In 2016, ProPublica published a critique of COMPAS, which could be summed up in its title: 

“Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks.”

Specifically, according to a ProPublica review of experience with COMPAS in Florida (a study of 7000 defendants, as cited in Science Advances):

  • Black defendants who did not recidivate were incorrectly predicted to reoffend (False Alarm) at a rate of 44.9%, whereas the same error rate for White defendants was only 23.5%.
  • White defendants who did recidivate were incorrectly predicted to not reoffend (False Dismissal) at a rate of 47.7%, nearly twice the rate of their Black counterparts at 28.0%. 

Bottom line:  COMPAS seems to be biased against Black defendants.
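To make the two error types concrete, here is a minimal sketch in Python of how a false-alarm rate and a false-dismissal rate are computed from confusion-matrix counts.  The counts below are made up for illustration; they are not ProPublica’s Florida data.

    def error_rates(tp, fp, tn, fn):
        """Two error rates from confusion-matrix counts."""
        false_alarm = fp / (fp + tn)       # non-re-offenders wrongly predicted to re-offend
        false_dismissal = fn / (fn + tp)   # re-offenders wrongly predicted not to re-offend
        return false_alarm, false_dismissal

    # Hypothetical counts, for illustration only
    fa, fd = error_rates(tp=300, fp=200, tn=500, fn=150)
    print(f"false alarm rate: {fa:.1%}, false dismissal rate: {fd:.1%}")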

COMPAS Defended

Subsequently, a three-person team of researchers (Flores et al.) published a rejoinder that defended the COMPAS system.  Their 36-page report delved deep into different theories of test bias, but running through their analysis were two key points:

  • The “Area Under the Curve” (AUC) was a healthy 0.71 overall, indicating the COMPAS model has good predictive power.
  • The AUC was about the same for White defendants and Black defendants, indicating no unfairness or bias.

What is AUC?

The curve in “Area Under the Curve” is the Receiver Operating Characteristic (ROC) curve.  The steeper it is, the better, so it became common to use the area under that curve as a measure of how well a statistical or machine learning model (or a medical diagnostic procedure) can distinguish between two classes, say 1’s and 0’s: for example, defendants who re-offend (1’s) and those who don’t (0’s).  The ROC curve plots two quantities:

  • Sensitivity (also called recall in machine learning):  The proportion of 1’s (re-offenders) the model correctly identifies; plotted on the y-axis
  • Specificity:  The proportion of 0’s (non-re-offenders) the model correctly identifies (plotted on the x-axis, in reverse: 1 on the left and 0 on the right)

Specifically, the model ranks all the records by probability of being a 1, with the most probable 1’s ranked highest.  To plot the curve, proceed through the ranked records and, at each record, calculate cumulative sensitivity and specificity to that point.  A well-performing model will catch lots of 1’s before it starts misidentifying 0’s as 1’s; its curve will climb steeply and hug the upper-left corner of the plot.  Misidentifying 0’s as 1’s flattens the curve and pulls the ROC toward the straight diagonal line; so does misidentifying 1’s as 0’s.
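As a rough illustration of that procedure, the Python sketch below ranks a tiny set of made-up scores (not COMPAS output), accumulates sensitivity and specificity record by record, and measures the area under the resulting curve with the trapezoid rule.

    import numpy as np

    def roc_points(y_true, scores):
        """Cumulative sensitivity and specificity, scanning from the highest score down."""
        order = np.argsort(-np.asarray(scores))      # most probable 1's first
        y = np.asarray(y_true)[order]
        tp = np.cumsum(y)                            # 1's caught so far
        fp = np.cumsum(1 - y)                        # 0's misidentified as 1's so far
        sensitivity = tp / y.sum()                   # y-axis
        specificity = 1 - fp / (len(y) - y.sum())    # x-axis, plotted in reverse
        return sensitivity, specificity

    # Toy data: 1 = re-offended, 0 = did not; scores are made-up model probabilities
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
    sens, spec = roc_points(y_true, scores)

    x = 1 - spec                                             # false positive rate
    auc = np.sum(np.diff(x) * (sens[1:] + sens[:-1]) / 2)    # trapezoid rule
    print(round(auc, 4))                                     # 0.6875 on this toy data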


Figure 1:  Receiver Operating Characteristic (ROC) curve.

The closer the ROC curve lies to the upper left corner, the closer the AUC is to 1, and the greater the discriminatory power.  The diagonal line represents a completely ineffective model, no better than random guessing.  It has an AUC of 0.5.

AUC is perhaps the most commonly used metric of a model’s discriminatory power.
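As a quick check, a standard library gives the same number; for example, scikit-learn’s roc_auc_score (assuming scikit-learn is installed) reproduces the hand-computed area on the toy data above.

    from sklearn.metrics import roc_auc_score

    # Same toy y_true and scores as in the sketch above
    print(roc_auc_score(y_true, scores))    # 0.6875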

Resolving the Puzzle – All Errors are Not Equal

How could you end up with the bias uncovered by ProPublica when the model performs equally well for both Black and White defendants, at least according to AUC?  The answer is that there are two types of error:  (1) predicting a defendant will re-offend when they don’t, and (2) predicting they won’t re-offend when they do.  AUC treats them the same, considering them just generic “errors.”

For Black defendants, COMPAS made more of the first error and fewer of the second.  For White defendants, COMPAS made more of the second error and fewer of the first.  The two patterns largely offset each other in terms of total errors, resulting in AUCs that were roughly the same for White and Black defendants.
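One way to see how this can happen is a small simulation, a sketch with synthetic scores rather than COMPAS data: give two hypothetical groups equally good score separation between re-offenders and non-re-offenders, but shift one group’s scores upward, and then apply a single “high risk” cutoff to everyone.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    def simulate_group(shift, n=20000):
        """Synthetic group: outcome y plus a score with the same separation from y
        in every group; 'shift' moves the whole group's scores up or down."""
        y = rng.integers(0, 2, n)                     # 1 = re-offended, 0 = did not
        score = rng.normal(loc=y + shift, scale=1.0)  # same ranking quality either way
        return y, score

    cutoff = 1.0                                      # one decision threshold for everyone

    for name, shift in [("Group A", 0.0), ("Group B", 0.7)]:
        y, score = simulate_group(shift)
        flagged = score >= cutoff
        false_alarm = flagged[y == 0].mean()          # non-re-offenders flagged high risk
        false_dismissal = (~flagged)[y == 1].mean()   # re-offenders flagged low risk
        print(name,
              "AUC", round(roc_auc_score(y, score), 2),
              "false alarm", round(false_alarm, 2),
              "false dismissal", round(false_dismissal, 2))

On a run like this, both groups show roughly the same AUC (about 0.76), yet the shifted group is falsely flagged far more often and falsely dismissed far less often, the same qualitative pattern ProPublica reported.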

Summary

Assessing model performance with a single numerical metric, AUC, concealed the fact that the model erred in different ways for Black and White defendants, to the great disadvantage of Black defendants.  It could be argued that the model is so bad that defendants, at least Black defendants, might have been better off with no model at all.
