Statistics.com

“Defiant” Supervision

March 1, 2019, 10:50 pm

How did the phrase “defiantly recommend”, as in “I defiantly recommend this product,” come into common usage on the internet? The answer offers a good look inside the workings of supervised learning.

Supervision, generally from humans, is instrumental in much of statistical and machine learning. Google’s precise search algorithms are not public, but the general approach is to return to the user a set of links that are statistically “close” to the search string. A similar approach is used in spell-check. If the user types something that is not in the dictionary, the spell-checker provides legitimate dictionary words that are close to the misspelled term.

Big Data Enables Supervision

Next comes the supervision part. Users choose the link, or the word, that matches what they are looking for. They do this again and again: Google processes over 40,000 search queries per second. Over time, for each search string and each misspelling, Google observes which of its suggestions receives the most votes, and moves it to the top of the list.

“Defiant” recommendations arose out of an initial misspelling: users meant “definitely recommend” but some typed “definatly recommend.” Recognizing that “definatly” was not a correct spelling, Google suggested alternatives. In the early days, without any supervision to go on, Google listed “defiantly” ahead of “definitely” because it was closer to “definatly” in its spelling. It has the same set of letters, with just two adjacent letters (“n” and “a”) swapped. “Definitely” needs more changes (lose the “a”, and add an “e” and an “i”), so early on it was listed second.
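The spelling comparison above can be made concrete with an edit distance. Google's actual closeness measure is not public, so the sketch below is only an assumption: it uses the optimal string alignment distance (a restricted form of Damerau-Levenshtein distance), in which an insertion, a deletion, a substitution, or a swap of two adjacent letters each counts as one change. The function name `osa_distance` and the two-word candidate list are illustrative, not anything a real spell-checker exposes.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Insertions, deletions, substitutions, and swaps of two adjacent
    letters each cost 1. This is an assumed stand-in for whatever
    closeness measure a real spell-checker uses.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i letters of a
    for j in range(cols):
        d[0][j] = j  # insert all j letters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Swap of two adjacent letters, e.g. "na" -> "an"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


typed = "definatly"
candidates = ["definitely", "defiantly"]  # illustrative dictionary words

# Rank suggestions from closest to farthest, with no user feedback yet.
ranked = sorted(candidates, key=lambda word: osa_distance(typed, word))
for word in ranked:
    print(word, osa_distance(typed, word))
```

Under this measure “defiantly” is one change away from “definatly” and “definitely” is two, which reproduces the unsupervised ordering described above.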

If supervision were working properly, users would correct the error by choosing the correct spelling. Unfortunately, the early human supervisors were lazy. They simply OK’d the first option on offer, “defiantly,” and Google learned that this was the proper correction for “definatly.”
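The feedback loop described here can be sketched as a small, deterministic simulation. Everything in it is a simplifying assumption rather than a description of Google's system: suggestions are ranked purely by vote count, ties keep the initial spelling-based order, a “lazy” user accepts whatever is listed first, and a “careful” user always picks the intended word. The names `rank`, `simulate`, and `INTENDED` are hypothetical.

```python
from collections import Counter

INTENDED = "definitely"  # what users actually meant to type


def rank(suggestions, votes):
    """Order suggestions by votes received, most first.

    sorted() is stable, so tied words keep their initial
    (spelling-based) order.
    """
    return sorted(suggestions, key=lambda word: -votes[word])


def simulate(suggestions, user_is_lazy):
    """Feed a sequence of users through the suggestion list.

    user_is_lazy is a list of booleans, one per user: a lazy user
    accepts the top suggestion, a careful user picks INTENDED.
    """
    votes = Counter()
    for lazy in user_is_lazy:
        shown = rank(suggestions, votes)
        choice = shown[0] if lazy else INTENDED
        votes[choice] += 1
    return rank(suggestions, votes), votes


initial = ["defiantly", "definitely"]  # initial order from spelling closeness

# Seven lazy users arrive first, followed by three careful ones.
lazy_result, lazy_votes = simulate(initial, [True] * 7 + [False] * 3)

# Ten careful users.
careful_result, careful_votes = simulate(initial, [False] * 10)

print(lazy_result, dict(lazy_votes))
print(careful_result, dict(careful_votes))
```

With careful users the votes promote “definitely” to the top. With enough early lazy users the wrong suggestion accumulates votes and stays first, which is the failure the post describes.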

Supervised learning, in its first appearances, was a modified form of traditional statistical modeling, in which a model is fit to a set of data in order to describe and, hopefully, explain the relationship between predictor variables and an outcome. Supervised learning adds two elements:

  • The primary purpose becomes predicting outcomes for new records

  • The model is validated and adjusted using out-of-sample data (a sample withheld from the original model fit)
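The two elements above can be illustrated with a minimal sketch of out-of-sample validation. The data are synthetic and the model is a one-predictor least-squares line, both chosen only for illustration; the 80/20 split and the names `fit_line` and `mse` are assumptions, not part of any particular library.

```python
import random

random.seed(0)  # fixed seed so the split is reproducible

# Synthetic records: outcome is roughly 2*x + 1 plus noise (illustrative only).
xs = [random.uniform(0, 10) for _ in range(100)]
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 1)) for x in xs]

# Withhold 20% of the records from the model fit.
random.shuffle(data)
train, holdout = data[:80], data[80:]


def fit_line(records):
    """Least-squares slope and intercept for (x, y) records."""
    n = len(records)
    mean_x = sum(x for x, _ in records) / n
    mean_y = sum(y for _, y in records) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in records)
    sxx = sum((x - mean_x) ** 2 for x, _ in records)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(records, slope, intercept):
    """Mean squared prediction error on a set of records."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in records) / len(records)


# Fit on the training records only, then predict the withheld records.
slope, intercept = fit_line(train)
train_error = mse(train, slope, intercept)
holdout_error = mse(holdout, slope, intercept)

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"train MSE={train_error:.2f} holdout MSE={holdout_error:.2f}")
```

The holdout error is the figure that matters for the predictive purpose: it estimates how the model performs on records it has not seen, and a model that fits the training data much better than the holdout data is a candidate for adjustment.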

Predicting credit scores using logistic regression was an early (and continuing) application of supervised learning. The same paradigm was applied to data-centric algorithms that do not impose a linear or other structural model, such as

  • Nearest neighbor algorithms (label new cases as other, similar, cases are labeled)

  • Tree algorithms (repeatedly split the data according to predictors that do a good job of separating the outcomes)

  • Naive Bayes (identify the most probable outcome, given the predictors)

  • Neural nets (repeatedly pass the cases through a set of weights applied to predictors, iteratively adjusting the weights to improve predictions)
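As one illustration of the algorithms listed above, here is a minimal sketch of a nearest neighbor classifier: a new case receives the label held by the majority of its closest labeled cases. The two-predictor records, the “repaid” and “defaulted” labels, and the choice of k = 3 are hypothetical, made up for the example.

```python
import math
from collections import Counter


def knn_predict(labeled, new_point, k=3):
    """Label new_point by majority vote among its k nearest labeled cases.

    labeled is a list of (predictors, label) pairs; distance is
    ordinary Euclidean distance between predictor tuples.
    """
    nearest = sorted(labeled, key=lambda rec: math.dist(rec[0], new_point))[:k]
    label_counts = Counter(label for _, label in nearest)
    return label_counts.most_common(1)[0][0]


# Hypothetical labeled cases: two predictors per case, plus a known outcome.
labeled = [
    ((1.0, 1.0), "repaid"),
    ((1.5, 2.0), "repaid"),
    ((2.0, 1.5), "repaid"),
    ((6.0, 6.5), "defaulted"),
    ((7.0, 6.0), "defaulted"),
    ((6.5, 7.5), "defaulted"),
]

print(knn_predict(labeled, (1.8, 1.2)))
print(knn_predict(labeled, (6.8, 6.8)))
```

No linear or other structural model is fit here; the prediction comes directly from the labeled data, which is what makes the quality of those labels (the supervision) so important.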

 

The term “artificial intelligence” has somewhat eclipsed “machine learning” in the popular imagination, but supervised learning remains a core function in data science.
