
The False Alarm Conundrum

December 14, 2018

False alarms are one of the most poorly understood problems in applied statistics and biostatistics. The fundamental problem arises when a statistical or diagnostic test is applied widely in search of something that is relatively rare. Consider the Apple Watch’s new feature that detects atrial fibrillation (afib).

Among people with irregular heartbeats, Apple claims a 97% success rate in identifying the condition. This sounds good, but consider all the people who do not have atrial fibrillation. In a test, 20% of the Watch’s afib alarms were not confirmed by an EKG patch. Apple claims that most of those cases did, in fact, have some heartbeat irregularity that required attention, so let’s assume a false positive rate of only 1%.

But very few Apple Watch wearers, a relatively young crowd, have atrial fibrillation. The vast majority do not, and they are the ones at risk for false alarms. Specifically, less than 0.2% of the population under 55 has atrial fibrillation. Consider 1000 people. Two (0.2%) are likely to have afib, and the Apple Watch is almost certain to catch them. Unfortunately, even using the very low false alarm rate estimate of 1%, there will be about 10 false alarms: healthy people whom the Watch flags as having afib. Of the 12 alarms, 10 (83.3%) are false. Put another way, if your watch tells you you are at risk for afib, the probability is 0.83 that it’s a false alarm.
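The arithmetic can be checked with a short calculation. The numbers below are the article’s illustrative assumptions (0.2% prevalence, essentially perfect detection of true cases, a 1% false positive rate), not Apple’s published figures:

```python
# False-alarm probability for a rare condition, using the article's
# illustrative numbers (assumptions, not Apple's published figures).
n = 1000            # people wearing the watch
prevalence = 0.002  # 0.2% have afib
sensitivity = 1.0   # assume the watch catches essentially every true case
fpr = 0.01          # assumed 1% false positive rate

true_positives = n * prevalence * sensitivity   # 2 real alarms
false_positives = n * (1 - prevalence) * fpr    # ~10 false alarms

# Of all alarms raised, what fraction are false?
p_false_alarm = false_positives / (true_positives + false_positives)
print(round(p_false_alarm, 2))  # ~0.83: most alarms are false
```

This is just Bayes’ rule in disguise: the posterior probability of actually having afib, given an alarm, is dragged down by the tiny prior.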

The false alarm problem is thus mainly a function of

  • the base rate of the phenomenon of interest, and

  • the model’s specificity (its accuracy on true negatives).

If the phenomenon is very rare, then even a very good discriminator will produce many false alarms for each true positive, since true positives are so rare and normal cases are so plentiful. And if the model has poor specificity, many true negatives will be mislabeled as positives. It is noteworthy that the model’s overall accuracy, which is usually the first performance metric people look at, says very little about the false positive problem.
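To see why overall accuracy is a poor guide, compare it with the fraction of alarms that are false in the same hypothetical afib scenario (again using the article’s illustrative numbers):

```python
# Overall accuracy vs. false-alarm fraction for a rare condition
# (illustrative numbers from the afib example, not real clinical data).
n = 1000
prevalence = 0.002   # 2 true cases per 1000
sensitivity = 1.0    # assumed: every true case is caught
specificity = 0.99   # assumed: 1% of healthy people are falsely flagged

tp = n * prevalence * sensitivity           # true positives
fn = n * prevalence - tp                    # missed cases (0 here)
fp = n * (1 - prevalence) * (1 - specificity)  # false alarms
tn = n * (1 - prevalence) - fp              # correctly cleared

accuracy = (tp + tn) / n                    # ~0.99: looks excellent
false_alarm_fraction = fp / (tp + fp)       # ~0.83: most alarms are false
```

The model is 99% accurate overall, yet five of every six alarms it raises are false: accuracy is dominated by the plentiful negatives it gets right.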

What are the consequences of excessive false alarms? In this case, increased anxiety, certainly. Increased costs of additional unnecessary testing. And in a few cases, if a false positive somehow survives additional testing, increased risk from more invasive tests or treatment: more people undergoing blood-thinning treatment with warfarin (a drug therapy for afib), or heart catheterization for diagnosis.

The problem also crops up in predictive models for identifying financial fraud, malicious activity in networks, employees who are likely to leave, loan defaults, and a host of similar applications. Because these events are relatively rare among all the cases under consideration, the predictive model typically uses a discriminator with a low bar: the estimated probability of being a defaulter, violator, fraudster, etc. does not have to be very high for the model to attach a positive score to the person or entity. The model may still be very useful for sorting, but a naive user may overestimate the probability that a flagged person really is a fraudster or violator. This can result in poor decisions and can harm individuals who are mistakenly labeled.
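A quick hypothetical sketch (the numbers here are invented for illustration) shows how a low-bar discriminator can be valuable for sorting even though most of its positive flags are wrong. Suppose fraud has a 1% base rate, and a model flags the top-scoring 10% of cases, capturing 90% of the actual fraudsters:

```python
# Precision and lift of a low-bar discriminator on a rare event
# (hypothetical numbers for illustration, not a real fraud model).
base_rate = 0.01   # 1% of all cases are fraud
flag_rate = 0.10   # the model flags the top-scoring 10% of cases
recall = 0.90      # those flags capture 90% of the actual fraud

# Fraction of flagged cases that really are fraud (precision):
precision = base_rate * recall / flag_rate   # ~0.09

# How much more likely a flagged case is to be fraud vs. a random case:
lift = precision / base_rate                 # ~9x
```

Only about 9% of flagged cases are actually fraud, so treating every flag as a confirmed fraudster would be a serious mistake. But a flagged case is roughly nine times likelier than average to be fraud, which makes the model very useful for prioritizing investigation.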

Medical societies, insurance companies and public health agencies have been revising guidance about routine screening exams as a result of this false positive problem, resulting in conflicting guidance from different organizations. Take mammograms, for example. The National Comprehensive Cancer Network recommends a mammogram annually starting at age 40 (the traditional guidance), while the U.S. Preventive Services Task Force (a panel of experts reporting to the U.S. Dept. of Health and Human Services) recommends a mammogram every two years starting at age 50.

Under-reaction to alarms is also a problem. If there are many alarms and most turn out to be false, as with the boy who cried wolf, they may be ignored. This was a major problem in the early days of using statistical and machine learning algorithms to detect malicious network activity.

The problem of false alarms produced by statistical and machine learning algorithms diminishes over time for two reasons:

  1. As research proceeds, algorithms get better (often by tweaking, tuning and testing existing algorithms)

  2. As data accumulates, the algorithms have better information to work with

Of course, the data must remain accessible – when my airline-affiliated credit card changed banks, the rate of false credit card fraud alarms skyrocketed. Either the new bank’s algorithms needed time to train on the data, or some historical data was unavailable to the new bank, or both.

 
