Word of the Week – Label Spreading

A common problem in machine learning is the “rare case” situation. In many classification problems, the class of interest (fraud, purchase by a web visitor, death of a patient) is rare enough that a data sample may not have enough instances to generate useful predictions. One way to deal with this problem is, in essence, data fabrication: attaching synthetic class labels to cases where we don’t know the actual label.

This is called label propagation or label spreading, and at first glance it sounds bogus. However, it has worked in test cases. The idea is as follows:

1. Start with a small number of cases where the label (class) is known. (We have only a small number of 1’s, the class of interest, because the class occurs only rarely.)
2. Identify additional cases where the label is unknown but the case is very similar to the known 1’s in other respects.
3. Label those cases as 1’s.
4. Combine the real 1’s with the artificial 1’s and use them as the training data for a model.
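
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-predictor data points are made up, "very similar" is taken to mean a Euclidean distance within a hypothetical `max_distance` cutoff, and the helper name `spread_labels` is ours rather than part of any library.

```python
import math

def spread_labels(known_positives, unlabeled, max_distance):
    """Return the unlabeled cases within max_distance of at least one known 1.

    Similarity is assumed to be Euclidean distance on the predictor values;
    max_distance is a hypothetical cutoff chosen for illustration.
    """
    synthetic = []
    for case in unlabeled:
        # Distance from this case to its closest known 1.
        nearest = min(math.dist(case, pos) for pos in known_positives)
        if nearest <= max_distance:
            synthetic.append(case)
    return synthetic

# Step 1: a small number of cases known to be 1's (made-up predictor values).
known_positives = [(1.0, 1.0), (1.2, 0.9)]
# Cases whose label is unknown.
unlabeled = [(1.1, 1.0), (5.0, 5.0), (0.9, 1.1), (4.8, 5.2)]

# Steps 2 and 3: find the look-alikes and treat them as 1's.
synthetic_positives = spread_labels(known_positives, unlabeled, max_distance=0.5)

# Step 4: combine real and synthetic 1's into the positive training cases.
training_positives = known_positives + synthetic_positives
print(training_positives)
```

In practice the cutoff matters: set it too loosely and the synthetic 1's add more labeling error than the larger sample removes.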

Granted, a source of error is introduced: we are only guessing at the synthetic labels. Simulations, though, have shown that this can be more than offset by the reduction in another type of error: small sample error. Label spreading takes advantage of the information contained in the predictor values for the similar cases. It is analogous to imputing missing data, which also allows us to use more of the data in fitting a model.

Label spreading is typically applied to graph data; i.e., data that describe the links (edges) between cases (nodes) in a network. Nodes with unknown labels can take the label that predominates in the nearby network community.
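
For graph data, the idea of taking the label that predominates nearby can be sketched as a simple majority vote among a node's labeled neighbors, repeated until no more nodes can be labeled. This is a simplified illustration, not the exact algorithm used by any particular library; the node names, edges, and the `propagate` function are hypothetical.

```python
from collections import Counter

def propagate(edges, labels, max_rounds=10):
    """Give each unlabeled node the most common label among its labeled neighbors.

    Known labels are never overwritten. Ties go to the label seen first
    among the node's neighbors in sorted order.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = dict(labels)  # work on a copy of the known labels
    for _ in range(max_rounds):
        updates = {}
        for node in sorted(neighbors):
            if node in labels:
                continue
            votes = Counter(
                labels[n] for n in sorted(neighbors[node]) if n in labels
            )
            if votes:
                updates[node] = votes.most_common(1)[0][0]
        if not updates:
            break  # nothing new could be labeled in this round
        labels.update(updates)
    return labels

# Two small communities joined by one bridge edge (c-g); made-up example.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("e", "f"), ("e", "g"), ("f", "g"), ("c", "g")]
known = {"a": 1, "b": 1, "e": 0, "f": 0}

result = propagate(edges, known)
print(result)
```

Here node d has no labeled neighbor at first and only receives a label in the second round, after its neighbor c has been labeled, which is the sense in which labels spread through the network.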

About Statistics.com

Statistics.com offers academic and professional education in statistics, analytics, and data science at beginner, intermediate, and advanced levels of instruction. Statistics.com is a part of Elder Research, a data science consultancy with 25 years of experience in data analytics.

© Copyright 2022 - Statistics.com, LLC | All Rights Reserved | Privacy Policy | Terms of Use
