BOOTSTRAP

I used the term in my message about bagging and several people asked for a review of the bootstrap. Put simply, to bootstrap a dataset is to draw a resample from the data, randomly and with replacement.

For the original sample, a statistic or estimate is calculated; that same statistic or estimate is then recalculated for each bootstrap resample. The resamples need not be smaller than the original sample (as in the diagram); often they are the same size. The distribution of the bootstrapped statistics or estimates is then compared to the original value and used to assess the potential error in the estimate or statistic derived from the original sample.
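The procedure above can be sketched in a few lines of Python. The data and the choice of the median as the statistic are hypothetical, purely for illustration:

```python
import random
import statistics

random.seed(1)  # for reproducibility

# Hypothetical original sample (not from the post).
sample = [12, 15, 9, 22, 17, 14, 30, 11, 16, 19]
orig_median = statistics.median(sample)

# Draw 1,000 bootstrap resamples, each the same size as the original,
# randomly and with replacement, recalculating the median for each.
boot_medians = sorted(
    statistics.median(random.choices(sample, k=len(sample)))
    for _ in range(1000)
)

# The spread of the bootstrapped medians gauges the potential error in
# the original estimate, e.g. via a rough 95% percentile interval.
lo, hi = boot_medians[25], boot_medians[975]
```

`random.choices` samples with replacement, which is the defining move of the bootstrap; the sorted tail cutoffs give a simple percentile interval.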

An especially common use of the bootstrap is in machine learning prediction methods. In one application, bagging, predictions from multiple bootstrap samples are aggregated, e.g. by averaging.

The bootstrap is about to enter its second half-century; the first published bootstrap example was in 1969.

Julian Simon (left), the University of Maryland economist and demographer, included a bootstrap sample size illustration in his 1969 text Basic Research Methods for Social Science, among a compendium of Monte Carlo techniques for inference.

The bootstrap was given its name, and full statistical foundations, in 1979, by the Stanford statistician Bradley Efron (right). Of course, only with the widespread availability of computing power did the bootstrap gain popularity.

For many mathematical minds, the crude simplicity of the bootstrap was offensive. Why does it work? Consider a slightly modified version of the bootstrap algorithm:

  1.  Replicate the original sample (say) thousands of times. Now you have a “population” to draw from. Although synthetic, it also embodies everything you know about the population that gave rise to your sample.
  2.  Draw lots of samples from this synthetic population without replacement.
  3. For each such sample, recalculate the statistic or estimate of interest.
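The three steps above can be sketched directly in Python (sample values and replication counts are illustrative assumptions). Sampling without replacement from a huge pool of copies behaves almost identically to sampling with replacement from the original, which is the functional equivalence claimed below:

```python
import random
import statistics

random.seed(3)  # for reproducibility

sample = [4, 8, 15, 16, 23, 42]  # hypothetical original sample
n = len(sample)

# Step 1: replicate the original sample (say) 10,000 times to form a
# synthetic "population".
population = sample * 10_000

# Steps 2-3: draw many samples of size n WITHOUT replacement from the
# synthetic population, recalculating the statistic (here, the mean).
means = [
    statistics.mean(random.sample(population, n))
    for _ in range(1000)
]
```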

Some reflection will show that this is functionally equivalent to the bootstrap. But how does it compare to classical formula-based inference based on normal approximations?

The latter, instead of replicating the original sample thousands of times, substitutes an infinite population with a normal distribution, based on the sample mean and standard deviation.
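For contrast, the classical normal-approximation interval needs only the sample mean and standard deviation (again using hypothetical data):

```python
import statistics

# Hypothetical sample (not from the post).
sample = [12, 15, 9, 22, 17, 14, 30, 11, 16, 19]
n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / n ** 0.5  # estimated standard error of the mean

# Classical 95% confidence interval from the normal approximation:
lo, hi = mean - 1.96 * se, mean + 1.96 * se
```

No resampling at all: the normal curve, parameterized by the sample, plays the role the synthetic replicated population played above.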

Where the data are normally distributed and well-behaved, the classical approach works well, is less “lumpy” than the bootstrap, and provides better coverage in the extremes.

Much real-world data, though, is far from normally distributed; the bootstrap works much better in such cases.


About Statistics.com

Statistics.com offers academic and professional education in statistics, analytics, and data science at beginner, intermediate, and advanced levels of instruction. Statistics.com is a part of Elder Research, a data science consultancy with 25 years of experience in data analytics.

 The Institute for Statistics Education is certified to operate by the State Council of Higher Education for Virginia (SCHEV)


© Copyright 2023 - Statistics.com, LLC | All Rights Reserved | Privacy Policy | Terms of Use
