Statistics.com

OVERFIT

February 14, 2018, 3:20 pm
As applied to statistical models, "overfit" means that the model fits the data used to build it too closely, capturing noise rather than signal. For example, the complex polynomial curve in the figure fits the data with no error, but you would not want to rely on it to predict accurately for new data.
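
The effect shown in the figure can be reproduced in a few lines. The sketch below is a minimal illustration under assumed conditions: a hypothetical population where y = 2x plus Gaussian noise, 10 training points, and a degree-9 polynomial built by Lagrange interpolation so that it passes exactly through every training point. The function names `lagrange_poly`, `mse` and `sample` are illustrative, not from any library.

```python
import random

def lagrange_poly(xs, ys):
    """Return the unique polynomial of degree len(xs) - 1 through every (x, y) point."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)  # Lagrange basis factor
            total += term
        return total
    return p

def mse(f, xs, ys):
    """Mean squared error of predictions f(x) against observed y."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)

def sample(n):
    # Hypothetical population: y = 2x plus Gaussian noise (sd 0.3), x uniform on [0, 1]
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

train_x, train_y = sample(10)   # data the curve is fit to
new_x, new_y = sample(200)      # new data from the same population

curve = lagrange_poly(train_x, train_y)  # degree-9 curve through all 10 points
true_line = lambda x: 2 * x              # the signal without the noise

train_mse_curve = mse(curve, train_x, train_y)
new_mse_curve = mse(curve, new_x, new_y)
new_mse_line = mse(true_line, new_x, new_y)
print(f"training MSE, polynomial: {train_mse_curve:.6f}")
print(f"new-data MSE, polynomial: {new_mse_curve:.3f}")
print(f"new-data MSE, true line:  {new_mse_line:.3f}")
```

The polynomial's training error is zero by construction, yet on new data it does worse than the plain line y = 2x, because its extra wiggles are shaped entirely by the noise in the 10 training points.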

Standard linear regression is less prone to overfitting: its rigid linear form does not allow the model to “bend” to accommodate noise. Even so, the fitted line (or multi-dimensional relationship) is tailored to give the best fit to the data it was fit to, so it will probably not do as well on another set of data drawn from the same population.
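
A quick simulation makes the last point concrete. This is a minimal sketch assuming a hypothetical population y = 1 + 3x + N(0, 1) noise and samples of 15 points; the helper names `fit_line`, `mse` and `sample` are illustrative. It repeatedly fits an ordinary least-squares line, then compares the error on the data the line was fit to with the error on a fresh sample from the same population.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def mse(a, b, xs, ys):
    """Mean squared error of the line a + b*x on the given data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(42)

def sample(n):
    # Hypothetical population: y = 1 + 3x plus N(0, 1) noise, x uniform on [0, 10]
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1 + 3 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

reps, n = 2000, 15
train_total = new_total = 0.0
for _ in range(reps):
    tx, ty = sample(n)   # data the line is fit to
    nx, ny = sample(n)   # fresh data from the same population
    a, b = fit_line(tx, ty)
    train_total += mse(a, b, tx, ty)
    new_total += mse(a, b, nx, ny)

avg_train, avg_new = train_total / reps, new_total / reps
print(f"average MSE on the fitting data: {avg_train:.3f}")
print(f"average MSE on fresh data:       {avg_new:.3f}")
```

Since the noise variance is 1, a perfect model would average about 1 on either dataset. For a fitted line the expected residual sum of squares is (n − 2)σ², so the fitting-data average should land near 13/15 ≈ 0.87, while the fresh-data average comes out above 1.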

Flexible machine learning algorithms such as neural nets and decision trees are especially vulnerable to overfitting. Left to their own devices, they can fit every point in the data, just like the curve in the figure, modeling all of the noise along with the signal. So the question of how to stop such algorithms at an appropriate stage looms large in their implementation.
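
To see how a decision tree "left to its own devices" fits every point, here is a minimal hand-rolled 1-D regression tree; `grow_tree`, `predict` and the depth-limit stopping rule are illustrative assumptions, not any library's API, and the data come from a hypothetical population y = x² plus noise. With `max_depth=None` the tree keeps splitting until each leaf holds a single point, so its training error drops to zero.

```python
import random

def sse(ys):
    """Sum of squared deviations from the mean (0 for a single value)."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def grow_tree(points, max_depth=None, depth=0):
    """Grow a 1-D regression tree on (x, y) pairs by greedy binary splitting.

    max_depth=None means no stopping rule: splitting continues until every
    leaf holds points sharing one x value (one point each if x values are distinct).
    """
    xs = sorted({x for x, _ in points})
    if len(xs) < 2 or (max_depth is not None and depth >= max_depth):
        return ("leaf", sum(y for _, y in points) / len(points))

    def split(t):
        return ([p for p in points if p[0] <= t], [p for p in points if p[0] > t])

    def cost(t):
        left, right = split(t)
        return sse([y for _, y in left]) + sse([y for _, y in right])

    # Candidate thresholds: every distinct x except the largest, so both sides are non-empty
    t = min(xs[:-1], key=cost)
    left, right = split(t)
    return ("split", t,
            grow_tree(left, max_depth, depth + 1),
            grow_tree(right, max_depth, depth + 1))

def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's mean."""
    while tree[0] == "split":
        _, t, left, right = tree
        tree = left if x <= t else right
    return tree[1]

def mse(tree, points):
    return sum((predict(tree, x) - y) ** 2 for x, y in points) / len(points)

rng = random.Random(7)
# Hypothetical population: y = x^2 plus Gaussian noise (sd 0.5), x uniform on [-2, 2]
x_values = [rng.uniform(-2, 2) for _ in range(40)]
data = [(x, x * x + rng.gauss(0, 0.5)) for x in x_values]

depths = [1, 2, 3, 5, None]
train_mse = {d: mse(grow_tree(data, d), data) for d in depths}
for d in depths:
    label = "no limit" if d is None else d
    print(f"max_depth={label}: training MSE = {train_mse[d]:.4f}")
```

Training error can only fall as the tree is allowed to grow deeper, which is exactly why training error alone cannot tell you where to stop.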

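A key tool for keeping overfitting under control is a holdout sample: a dataset drawn from the same population as the training data but not used to fit the model. Because the model never saw the holdout data, its performance there gives an honest estimate of how it will do on new data. This is a common procedure, particularly with problems involving plentiful (“big”) data, where setting part of the data aside costs little. (Learn more in our online course Predictive Analytics 1.)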
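
One way a holdout sample can decide "when to stop" is sketched below, under illustrative assumptions: a hypothetical population y = x² plus noise, a 100/50 split between training and holdout data, and a hand-rolled regression tree (`grow_tree` is hypothetical, not a library function) whose depth limit is the stopping rule. Each candidate tree is fit only on the training portion and scored on the holdout portion; the depth with the lowest holdout error is kept.

```python
import random

def sse(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def grow_tree(points, max_depth, depth=0):
    """Greedy 1-D regression tree; max_depth is the stopping rule being tuned."""
    xs = sorted({x for x, _ in points})
    if len(xs) < 2 or depth >= max_depth:
        return ("leaf", sum(y for _, y in points) / len(points))

    def split(t):
        return ([p for p in points if p[0] <= t], [p for p in points if p[0] > t])

    def cost(t):
        return sum(sse([y for _, y in side]) for side in split(t))

    t = min(xs[:-1], key=cost)  # best threshold among distinct x values
    left, right = split(t)
    return ("split", t,
            grow_tree(left, max_depth, depth + 1),
            grow_tree(right, max_depth, depth + 1))

def predict(tree, x):
    while tree[0] == "split":
        _, t, left, right = tree
        tree = left if x <= t else right
    return tree[1]

def mse(tree, points):
    return sum((predict(tree, x) - y) ** 2 for x, y in points) / len(points)

rng = random.Random(11)
# Hypothetical population: y = x^2 plus Gaussian noise (sd 0.5), x uniform on [-2, 2]
x_values = [rng.uniform(-2, 2) for _ in range(150)]
data = [(x, x * x + rng.gauss(0, 0.5)) for x in x_values]
rng.shuffle(data)
train, holdout = data[:100], data[100:]  # illustrative 100/50 split

candidate_depths = range(1, 9)
holdout_mse = {}
for d in candidate_depths:
    tree = grow_tree(train, d)            # fit on the training portion only
    holdout_mse[d] = mse(tree, holdout)   # score on data the tree never saw
    print(f"max_depth={d}: holdout MSE = {holdout_mse[d]:.4f}")

best_depth = min(holdout_mse, key=holdout_mse.get)
print(f"depth chosen by holdout: {best_depth}")
```

Training error keeps falling as the depth limit grows, but holdout error typically bottoms out at a moderate depth and then rises again as the tree starts fitting noise; that turning point is a reasonable place to stop.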

© Copyright 2021 - Statistics.com, LLC. All Rights Reserved.