Statistics.com
Blog

Alaskan Generosity

February 15, 2019, 3:57 pm

People in Alaska are extraordinarily generous: that's what a predictive model showed when applied to a charitable organization's donor list. A closer examination revealed a flaw. Although the original data covered all 50 states, the model's training data for Alaska included donors but excluded non-donors. The reason?

The data was 99% non-donors, and predictive models tend to work better with more balanced data. (This example is from the excellent blogs at Elder Research.) So the analyst used a standard technique: keep all the donors, but downsample the prevalent non-donor class so that it is more in line with the number of donors. However, rather than downsampling randomly, the analyst sorted the list by zip code and moved down it, selecting every nth case until the quota was filled. The quota filled up before the end of the list, so non-donors were sampled from every state but one: the selection never reached Alaska, whose zip codes all begin with 99 and therefore sit at the very end of the sorted list. As a result, the data contained no Alaskan non-donors, only Alaskan donors.

The CART (decision tree) algorithm thus found an excellent rule for separating donors from non-donors: if the person is from Alaska, classify as donor. In case you were wondering, the Chronicle of Philanthropy, in 2012, ranked Alaska #28 in charitable giving.
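The sampling failure is easy to reproduce. The sketch below is a hypothetical illustration, not the analyst's actual code: it assumes 100,000 non-donors with synthetic zip codes (one per value from 00000 to 99999, already sorted), and it treats the 99 prefix as a stand-in for Alaska. The helper names `systematic_quota_sample`, `random_downsample` and `count_alaska` are made up for this example. Taking every 90th record fills a 1,000-record quota at row 89,910, so every zip code above that, including all the 99s, is never touched. A uniform random draw of the same size has no such blind spot.

```python
import random

# Hypothetical stand-in for the non-donor file: 100,000 records with
# synthetic 5-digit zip codes, already sorted by zip code.
non_donors = [f"{z:05d}" for z in range(100_000)]


def systematic_quota_sample(sorted_records, step, quota):
    """Walk down the sorted list taking every `step`-th record, stopping once `quota` is reached."""
    return sorted_records[::step][:quota]


def random_downsample(records, quota, seed=0):
    """Draw `quota` records uniformly at random, without replacement."""
    return random.Random(seed).sample(records, quota)


def count_alaska(zips):
    """Count zip codes with the 99 prefix (used here as a rough proxy for Alaska)."""
    return sum(z.startswith("99") for z in zips)


systematic = systematic_quota_sample(non_donors, step=90, quota=1_000)
print(systematic[-1])            # 89910: the walk stops long before the 99 zips
print(count_alaska(systematic))  # 0: no Alaskan non-donors in the training data

randomized = random_downsample(non_donors, quota=1_000)
print(count_alaska(randomized))  # about 1% of the sample on average
```

If the sorted order must be kept, a step of `len(records) // quota` (here 100) spreads the picks across the whole list. Random sampling avoids the dependence on sort order altogether.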

This is an obvious example of bias in the selection of a sample, and it is easily corrected. But selection bias makes its way into analysis in far more subtle, unconscious and sometimes far-reaching ways. It is one of the factors behind the reproducibility crisis in scientific research discussed by John Ioannidis in his aptly named paper "Why Most Published Research Findings Are False."

The problem arises when the classic order of knowledge discovery is reversed. In a classical experiment, we have:

form hypothesis > collect data > confirm or reject hypothesis

But setting up an experiment and then collecting data is difficult, and, in some cases, impossible. An alternative is to consult pre-existing data, which often leads to a reversal of the order:

explore data > find something interesting > form and confirm hypothesis

With some reflection on how prevalent random variability and noise are, you can see that it is easy to fool yourself with this approach: if you look often and hard enough, you can find all manner of seemingly interesting phenomena lurking in the chance patterns of the data. Or, in the words of the economist Ronald Coase:

“If you torture the data long enough, it will confess.”
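The "look often and hard enough" effect can be demonstrated with pure noise. The following is a minimal sketch, not something from the original post: it correlates a randomly generated outcome with 200 randomly generated features and counts how many clear an approximate 5% significance cutoff of |r| > 1.96/√n. The sizes, the seed and the helper name `torture_the_data` are arbitrary choices for illustration. Since there is no real relationship anywhere, every "hit" is a chance pattern, and about 5% of the features (roughly 10 of 200) should be expected to pass.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def torture_the_data(n_rows=100, n_features=200, seed=1):
    """Search pure-noise features for 'significant' correlations with a pure-noise outcome."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    # Large-sample approximation to the two-sided 5% critical value of r
    # when the true correlation is zero.
    cutoff = 1.96 / math.sqrt(n_rows)
    hits = []
    for j in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson_r(feature, outcome)
        if abs(r) > cutoff:
            hits.append((j, r))
    return hits


hits = torture_the_data()
print(f"{len(hits)} of 200 pure-noise features look 'significant'")
for j, r in hits:
    print(f"feature {j}: r = {r:.3f}")
```

Reported on its own, any one of these hits could pass for an interesting finding. Taken together, they are just the false discoveries that 200 looks at noise and a 5% threshold predict.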

About Statistics.com

Statistics.com offers academic and professional education in statistics, analytics, and data science at beginner, intermediate, and advanced levels of instruction. Statistics.com is a part of Elder Research, a data science consultancy with 25 years of experience in data analytics.

Contact

The Institute for Statistics Education
4075 Wilson Blvd, 8th Floor
Arlington, VA 22203
(571) 281-8817

ourcourses@statistics.com

© Copyright 2021 - Statistics.com, LLC | All Rights Reserved
