

Words of the Week – Inference and Confidence

An often-overlooked basic part of learning new things is vocabulary: if you don’t fully understand the meaning of terms, you are handicapped. Worse, if you think you do understand, but that understanding is wrong, you are deprived of the ability to identify the gap in your understanding. This can happen in data science, where different communities (statisticians, IT engineers, computer scientists) may have different meanings for the same word. A couple of weeks ago, we looked at multiple meanings of the term bias. This week we look at two more: inference and confidence.

Inference
In machine learning, inference refers to the process of operationalizing a trained model by applying it to new data and making predictions. This process is also called scoring, and the output (the prediction) is the score. Inference is the final phase of modeling and does not incorporate the earlier processes of training different models, assessing their performance, selecting the best model and tuning its parameters.
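To make the split between training and inference concrete, here is a minimal sketch in plain Python. The function names `train` and `score`, and the toy data, are hypothetical and chosen only for illustration; a real project would typically use a modeling library. The point is that scoring only applies an already-fitted model to new records and does no fitting of its own.

```python
# Training phase (happens earlier, once): fit y = intercept + slope * x
# by ordinary least squares. Hypothetical helper, for illustration only.
def train(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"intercept": intercept, "slope": slope}


# Inference (scoring) phase: apply the frozen model to new data.
# No parameters are estimated or tuned here.
def score(model, new_xs):
    return [model["intercept"] + model["slope"] * x for x in new_xs]


# Made-up training data that follows y = 1 + 2x exactly.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = train(train_x, train_y)      # modeling work ends here
scores = score(model, [5.0, 6.0])    # inference: predictions for new records
print(scores)
```

Each value in `scores` is the "score" in the machine learning sense: the model's prediction for one new record.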

In statistics, inference is something different and more complex: it is the process of (1) estimating quantities of interest in a population by using a sample from that population, then (2) quantifying the uncertainty around these population estimates, specifically uncertainty caused by random variation. The variation can occur in sampling, or in assignment of subjects to a treatment, or both. Inferential statistics is a full branch of statistics, incorporating an initial phase of making estimates on the basis of samples, and a second phase of quantifying possible uncertainty from random variation, using confidence intervals and p-values. With resampling (bootstrapping and permutation), quantifying uncertainty is fairly straightforward. Classical (pre-computer) statistics, by contrast, built a complex and (to those new to statistics) intimidating structure, much of which remains in place due to the force of inertia.
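As an illustration of the resampling approach, here is a minimal sketch of a one-sided permutation test for a difference in group means. The function name `permutation_p_value` and the treatment and control values are made up for this example. The idea: if assignment to treatment were the only source of difference, shuffling the group labels should produce differences as large as the observed one fairly often; the p-value estimates how often.

```python
import random


def mean(values):
    return sum(values) / len(values)


# Hypothetical helper: one-sided permutation test for
# mean(treatment) - mean(control).
def permutation_p_value(treatment, control, n_permutations=10_000, seed=0):
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)              # random re-assignment of group labels
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if diff >= observed:
            at_least_as_large += 1
    # Proportion of shuffles with a difference at least as large as observed.
    return observed, at_least_as_large / n_permutations


# Made-up measurements for two groups of six subjects each.
treatment = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [11.2, 12.0, 12.8, 11.9, 13.1, 12.4]

observed, p_value = permutation_p_value(treatment, control)
print(f"observed difference: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value says that random assignment alone rarely produces a difference this large. This is the "quantifying uncertainty" phase of statistical inference, done without any distributional formulas.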

Confidence
In machine learning, the term confidence often refers to the estimated probability of an event or item of interest. For example, the output of some predictive learning algorithms might say “the confidence [i.e. estimated probability] is 80% that this record belongs to class A.” But confidence is also used to refer to a conditional prevalence. For example, in association rules for transactions (used in affinity analysis and recommender systems), confidence for a rule like “if A is purchased, so is B” quantifies the proportion of transactions with A that also include B. It is one of several metrics that measure the strength of transaction rules. It is based on actual counts of items purchased, though you will see the terminology of probability (P(B|A), the probability of B given A) in software output.
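
The association-rule sense of confidence is just a ratio of counts, which a short sketch can show. The function name `rule_confidence` and the five transactions below are hypothetical, invented for illustration.

```python
# Confidence of the rule "if antecedent is purchased, so is consequent":
# (transactions containing both) / (transactions containing the antecedent).
def rule_confidence(transactions, antecedent, consequent):
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return None                      # rule is undefined if A never occurs
    with_both = sum(1 for t in with_antecedent if consequent in t)
    return with_both / len(with_antecedent)


# Made-up market baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter"},
    {"bread", "butter", "jam"},
]

# Four baskets contain bread; three of those also contain butter.
conf = rule_confidence(transactions, "bread", "butter")
print(conf)  # 0.75
```

Software would typically report this as P(butter | bread) = 0.75, even though it is a count-based proportion rather than a modeled probability.
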
In statistics, the term confidence is used primarily in relation to the concept of a confidence interval, which is a range that encloses a measurement or estimate. It reflects the uncertainty in the estimate due to sampling error. For example, after a random survey of Twitter users, you might say that their average age, which was 34 in the survey, lies between 31 and 37, with 90% confidence. Technically, this means that 90% of the samples drawn from a population that is well represented by the original sample and has a mean of 34 will have a sample mean that lies between 31 and 37. This convoluted definition is of limited practical value, so most people interpret the result as “the probability is 90% that the average age of Twitter users is between 31 and 37.” To paraphrase George Box (“all models are wrong, but some are useful”), this interpretation is not strictly correct but it is useful.
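
The technical definition above maps directly onto a bootstrap percentile interval: treat the original sample as a stand-in for the population, draw many resamples from it, and find the range that holds the central 90% of the resample means. The sketch below assumes a made-up sample of 15 ages whose mean is 34; the function name `bootstrap_mean_interval` is hypothetical, and this is one simple way to build such an interval, not the only one.

```python
import random


# Hypothetical helper: bootstrap percentile interval for the mean.
def bootstrap_mean_interval(sample, level=0.90, n_resamples=10_000, seed=0):
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    n = len(sample)
    # Resample with replacement from the original sample, recording each mean.
    means = sorted(
        sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples)
    )
    tail = (1 - level) / 2               # 5% in each tail for a 90% interval
    lower = means[int(tail * n_resamples)]
    upper = means[int((1 - tail) * n_resamples) - 1]
    return lower, upper


# Made-up survey ages; their mean is 34.
ages = [22, 25, 27, 28, 30, 31, 33, 34, 35, 36, 38, 40, 41, 45, 45]

lower, upper = bootstrap_mean_interval(ages)
print(f"sample mean: {sum(ages) / len(ages):.1f}")
print(f"90% bootstrap interval: {lower:.1f} to {upper:.1f}")
```

About 90% of the resample means fall between `lower` and `upper`, which is the sense in which the interval carries "90% confidence."
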


About Statistics.com

Statistics.com offers academic and professional education in statistics, analytics, and data science at beginner, intermediate, and advanced levels of instruction. Statistics.com is a part of Elder Research, a data science consultancy with 25 years of experience in data analytics.

 The Institute for Statistics Education is certified to operate by the State Council of Higher Education for Virginia (SCHEV)

© Copyright 2023 - Statistics.com | All Rights Reserved | Privacy Policy | Terms of Use
