The False Alarm Conundrum

False alarms are one of the most poorly understood problems in applied statistics and biostatistics. The fundamental problem is the wide application of a statistical or diagnostic test in search of something that is relatively rare. Consider the Apple Watch’s new feature that detects atrial fibrillation (afib).

Among people with irregular heartbeats, Apple claims a 97% success rate in identifying the condition. This sounds good, but consider all the people who do not have atrial fibrillation. In a test, 20% of the Watch’s afib alarms were not confirmed by an EKG patch. Apple claims that most of those cases did, in fact, have some heartbeat irregularity that required attention, so let’s assume a false positive rate of only 1%.

But very few Apple Watch wearers, a relatively young crowd, have atrial fibrillation. The vast majority do not, and are at risk for false alarms. Specifically, less than 0.2% of the population under 55 has atrial fibrillation. Consider 1,000 wearers. Two (0.2%) are likely to have afib, and the Apple Watch is almost certain to catch them. Unfortunately, even using the very low false alarm rate estimate of 1%, about 10 of the remaining 998 healthy people will be signaled by the Watch as having afib. Of the 12 alarms, 10 (83.3%) are false. Put another way, if your watch tells you you are at risk for afib, the probability is about 0.83 that it’s a false alarm.
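Here is a minimal sketch of that arithmetic. The population size, base rate, sensitivity, and false positive rate are the figures used in the example above; the Python itself is purely illustrative.

```python
# Worked example from the text: 1,000 wearers, 0.2% base rate of afib,
# 97% sensitivity (true cases caught), 1% false positive rate.
population = 1_000
base_rate = 0.002       # share of wearers who truly have afib
sensitivity = 0.97      # P(alarm | afib)
false_pos_rate = 0.01   # P(alarm | no afib)

true_cases = population * base_rate                            # ~2 people
true_alarms = true_cases * sensitivity                         # ~1.9 alarms
false_alarms = population * (1 - base_rate) * false_pos_rate   # ~10 alarms

share_false = false_alarms / (true_alarms + false_alarms)
print(f"Expected alarms: {true_alarms + false_alarms:.1f}")
print(f"Probability an alarm is false: {share_false:.2f}")     # ~0.84
```

The exact figure (about 0.84) differs slightly from the rounded 10-of-12 calculation above, but the message is the same: the great majority of alarms are false.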

The false alarm problem is thus mainly a function of

  • the base rate of the phenomenon of interest, and

  • the model’s accuracy with true negatives.

If the phenomenon is very rare, then even a very good discriminator will produce many false alarms for each true positive, since true cases are so rare and normal cases so plentiful. And if the model is poor at ruling out the negative cases, many of them will be mislabeled as positives. It is noteworthy that the model’s overall accuracy, which is usually the first performance metric people look at, is not very relevant to the problem of false positives.
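The short sketch below illustrates that point by holding a (hypothetical) sensitivity and false positive rate fixed and varying only the base rate; the specific values are chosen purely for illustration.

```python
# Share of alarms that are false, and overall accuracy, as the base rate varies.
# Sensitivity and false positive rate are held fixed at illustrative values.
sensitivity = 0.97
false_pos_rate = 0.01

for base_rate in (0.20, 0.05, 0.01, 0.002):
    true_alarms = base_rate * sensitivity
    false_alarms = (1 - base_rate) * false_pos_rate
    false_share = false_alarms / (true_alarms + false_alarms)
    accuracy = base_rate * sensitivity + (1 - base_rate) * (1 - false_pos_rate)
    print(f"base rate {base_rate:>5.1%}: "
          f"false alarms are {false_share:.0%} of all alarms, "
          f"overall accuracy {accuracy:.1%}")
```

As the base rate falls from 20% to 0.2%, the share of alarms that are false climbs from a few percent to over 80%, while overall accuracy stays near 99% throughout – which is why accuracy alone tells you little about the false alarm problem.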

What are the consequences of excessive false alarms? In this case, increased anxiety, certainly. Increased costs from additional unnecessary testing. And in a few cases, if a false alarm is not resolved by follow-up testing, increased risk from more invasive tests or treatment – more people undergoing blood-thinning treatment with warfarin (a drug therapy for afib), or heart catheterization for diagnosis.

The problem also crops up in predictive models for identifying financial fraud, malicious activity in networks, employees who are likely to leave, loan defaults, and a host of similar applications. Because these events are relatively rare (out of all the cases under consideration), the predictive model typically uses a discriminator with a low bar – the estimated probability of being a defaulter, violator, fraudster, etc. does not have to be very high for the model to attach a positive score to the person or entity. The model may still be very useful for sorting, but a naive user may overestimate the probability that a flagged person really is a fraudster, violator, etc. This can result in poor decisions and can harm individuals who are mistakenly labeled.
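A back-of-the-envelope illustration of this point, with entirely hypothetical numbers: suppose fraud occurs in 1% of cases, and a model flags the highest-scoring 5% of cases, capturing 80% of the true fraud.

```python
# Hypothetical fraud-screening example: a model that concentrates most of the
# fraud into a small flagged group still produces mostly false alarms.
base_rate = 0.01     # 1% of cases are actually fraudulent
flag_rate = 0.05     # model flags the top-scoring 5% of cases
recall = 0.80        # 80% of true fraud cases fall in the flagged group

p_fraud_given_flag = base_rate * recall / flag_rate
print(f"P(fraud | flagged) = {p_fraud_given_flag:.0%}")   # 16%
```

Under these assumptions a flagged case is 16 times more likely than average to be fraudulent – excellent for prioritizing investigations – yet roughly five out of six flagged cases are still false alarms.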

Medical societies, insurance companies, and public health agencies have been revising their recommendations about routine screening exams as a result of this false positive problem, and different organizations now offer conflicting guidance. Take mammograms, for example. The National Comprehensive Cancer Network recommends a mammogram annually starting at age 40 (the traditional guidance), while the U.S. Preventive Services Task Force (an appointed panel of experts reporting to the U.S. Department of Health and Human Services) recommends a mammogram every two years starting at age 50.

Under-reaction to alarms is also a problem. If there are many alarms and most turn out to be false, as with the boy who cried wolf, they may be ignored. This was a major problem in the early days of using statistical and machine learning algorithms to detect malicious network activity.

The problem of false alarms produced by statistical and machine learning algorithms diminishes over time for two reasons:

  1. As research proceeds, algorithms get better (often by tweaking, tuning and testing existing algorithms)

  2. As data accumulates, the algorithms have better information to work with

Of course, the data must remain accessible – when my airline-affiliated credit card changed banks, the rate of false credit card fraud alarms skyrocketed. Either the new bank’s algorithms needed time to train on the data, or some historical data was unavailable to the new bank, or both.
