Statistics.com: Data Science, Analytics & Statistics Courses
Blog

BOOTSTRAP

July 3, 2018, 5:04 pm

I used the term in my message about bagging and several people asked for a review of the bootstrap. Put simply, to bootstrap a dataset is to draw a resample from the data, randomly and with replacement.

For the original sample, a statistic or estimate is calculated, then that same statistic or estimate is recalculated for each bootstrap resample. The resamples need not be smaller than the original sample, as in the diagram; often they are the same size. The distribution of bootstrapped statistics or estimates is then compared to the original and is used to assess potential error in the estimate or statistic derived from the original sample.
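To make the procedure concrete, here is a minimal sketch in Python using only the standard library. The data values, the function name `bootstrap_statistics`, and the choice of the mean as the statistic are all made up for illustration; any statistic could be substituted.

```python
import random
import statistics

def bootstrap_statistics(sample, stat=statistics.mean, n_resamples=1000, seed=0):
    """Recompute `stat` on each of `n_resamples` bootstrap resamples."""
    rng = random.Random(seed)
    n = len(sample)
    results = []
    for _ in range(n_resamples):
        # Draw n values at random, with replacement, from the original sample.
        resample = rng.choices(sample, k=n)
        results.append(stat(resample))
    return results

# Hypothetical data, for illustration only.
sample = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.9, 12.5, 9.4, 11.0]

original_estimate = statistics.mean(sample)
boot_means = bootstrap_statistics(sample)

# The spread of the bootstrapped means is one way to gauge the
# potential error in the original estimate.
boot_se = statistics.stdev(boot_means)
print(f"original mean = {original_estimate:.2f}, bootstrap SE = {boot_se:.2f}")
```

Here each resample is the same size as the original sample, which is the common choice mentioned above.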

An especially common use of the bootstrap is in machine learning prediction methods. In one application, predictions from multiple bootstrap samples are aggregated, e.g. by averaging.
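As a rough illustration of aggregating predictions across bootstrap samples, the sketch below fits a simple least-squares line to each resample and averages the resulting predictions. The straight-line model, the helper names `fit_line` and `bagged_predict`, and the data are assumptions chosen to keep the example short; in practice the same idea is usually applied to less stable models such as decision trees.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # Every x in this resample is identical: fall back to a flat line.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def bagged_predict(points, x_new, n_models=200, seed=0):
    """Average the predictions of models fit to bootstrap resamples."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Resample whole (x, y) rows, with replacement.
        resample = rng.choices(points, k=len(points))
        a, b = fit_line(resample)
        predictions.append(a + b * x_new)
    return sum(predictions) / len(predictions)

# Hypothetical (x, y) data, for illustration only.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 12.2)]
print(bagged_predict(data, x_new=7))
```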

 

The bootstrap is about to enter its second half-century; the first published bootstrap example was in 1969.

Julian Simon (left), the University of Maryland economist and demographer, included a bootstrap sample size illustration in his 1969 text Basic Research Methods for Social Science, among a compendium of Monte Carlo techniques for inference.

The bootstrap was given its name, and full statistical foundations, in 1979, by the Stanford statistician Bradley Efron (right). Of course, only with the widespread availability of computing power did the bootstrap gain popularity. 

For many mathematical minds, the crude simplicity of the bootstrap was offensive. Why does it work? Consider a slightly modified version of the bootstrap algorithm:

 

  1. Replicate the original sample (say) thousands of times. Now you have a “population” to draw from. Although synthetic, it also embodies everything you know about the population that gave rise to your sample.
  2. Draw lots of samples from this synthetic population without replacement.
  3. For each such sample, recalculate the statistic or estimate of interest.
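The three steps above can be sketched in Python and run side by side with ordinary with-replacement resampling. The function names, the number of copies, the number of draws, and the data are illustrative assumptions, and the sketch draws samples of the same size as the original.

```python
import random
import statistics

def replicated_population_stats(sample, copies=1000, n_draws=2000, seed=0):
    """Steps 1-3: replicate the sample, draw without replacement, recompute."""
    rng = random.Random(seed)
    population = list(sample) * copies          # step 1: synthetic population
    n = len(sample)
    return [
        statistics.mean(rng.sample(population, n))  # steps 2 and 3
        for _ in range(n_draws)
    ]

def with_replacement_stats(sample, n_draws=2000, seed=1):
    """Ordinary bootstrap: draw directly from the sample, with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_draws)]

# Hypothetical data, for illustration only.
sample = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.9, 12.5, 9.4, 11.0]

se_replicated = statistics.stdev(replicated_population_stats(sample))
se_bootstrap = statistics.stdev(with_replacement_stats(sample))
print(f"replicated population SE = {se_replicated:.3f}")
print(f"with-replacement SE      = {se_bootstrap:.3f}")
```

With many copies, drawing a handful of values without replacement from the synthetic population is nearly indistinguishable from drawing with replacement from the original sample, so the two standard errors should come out close.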

Some reflection will show that this is functionally equivalent to the bootstrap. But how does it compare to classical formula-based inference that relies on normal approximations?

 

The latter, instead of replicating the original sample thousands of times, substitutes an infinite population with a normal distribution, based on the sample mean and standard deviation.

Where the data are normally distributed and well-behaved, the classical approach works well, is less “lumpy” than the bootstrap, and provides better coverage in the extremes.

 

Much real-world data, though, is far from normally distributed, and the bootstrap works much better in such cases.
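For a feel of how the two approaches differ on skewed data, the sketch below computes a 95% interval for the mean both ways. The data are made up, the function names are hypothetical, and the percentile method used for the bootstrap interval is just one of several ways to form one, chosen here for simplicity.

```python
import random
import statistics

def normal_interval(sample, level=0.95):
    """Classical interval: mean +/- z * s / sqrt(n), assuming normality."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se

def percentile_interval(sample, level=0.95, n_resamples=5000, seed=0):
    """Bootstrap percentile interval for the mean (an illustrative choice)."""
    rng = random.Random(seed)
    n = len(sample)
    boot = sorted(
        statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return boot[lo_idx], boot[hi_idx]

# Hypothetical right-skewed data, for illustration only.
skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 45]

print("normal approximation:", normal_interval(skewed))
print("bootstrap percentile:", percentile_interval(skewed))
```

The normal-approximation interval is symmetric around the sample mean by construction, while the bootstrap interval is free to follow the shape of the resampled distribution.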

 


About Statistics.com

Statistics.com offers academic and professional education in statistics, analytics, and data science at beginner, intermediate, and advanced levels of instruction. Statistics.com is a part of Elder Research, a data science consultancy with 25 years of experience in data analytics.


© Copyright 2021 - Statistics.com, LLC | All Rights Reserved | Privacy Policy | Terms of Use
