Statistics.com

Blog

Confusing Terms in Data Science – A Look at Homonyms and more

  • April 8, 2019, 6:18 pm

To a statistician, a sample is a collection of observations (cases). To a machine learner, it’s a single observation. Modern data science has its origins in several different fields, which leads to potentially confusing homonyms like these:

Homonyms (words with multiple meanings):

Bias:  To a lay person, bias refers to an opinion about something that is formed in advance of the specific facts.  As consideration of ethical issues in data science grows, this meaning has crept into discussion of the fairness or social worth of machine learning algorithms.  But the term has a narrower definition in statistics: it refers to the tendency of an estimation procedure, or a model, to arrive at estimates or predictions that are, on balance, off target.
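
The statistical meaning can be made concrete with a small sketch. The population, the sample size, and the choice of estimator (a variance estimate that divides by n versus one that divides by n - 1) are all assumptions chosen for illustration; they are not from any particular dataset. The code enumerates every possible sample, so no random numbers are involved.

```python
from itertools import product
from statistics import mean

# Hypothetical tiny population, chosen only for illustration.
population = [1, 2, 3, 4]
n = 2  # sample size (assumption)


def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / divisor


# The target: the variance of the whole population.
true_variance = variance(population, len(population))

# Every possible sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Average estimate over all samples, for each version of the estimator.
avg_divide_by_n = mean(variance(s, n) for s in samples)
avg_divide_by_n_minus_1 = mean(variance(s, n - 1) for s in samples)

# Bias = average estimate minus the true value.
bias_n = avg_divide_by_n - true_variance
bias_n_minus_1 = avg_divide_by_n_minus_1 - true_variance

print("true variance:", true_variance)
print("bias when dividing by n:", bias_n)
print("bias when dividing by n - 1:", bias_n_minus_1)
```

Under these assumptions, dividing by n gives estimates that are too low on average, while dividing by n - 1 gives estimates that are on target on average. That is bias in the statistician's sense: a property of the procedure, not of any one estimate.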

Confidence:  To a statistician, confidence expresses how reliable an estimate drawn from a sample is (for example, we are 95% confident that the average blood sugar in the group lies between X and Y, based on a sample of N patients).  To a machine learner, confidence can refer to a metric used in association rules (“what goes with what” in market basket transactions), one of several measures of the strength of a rule.
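
For the association-rule meaning, the usual definition is that the confidence of a rule "A implies B" is the share of transactions containing A that also contain B. Here is a minimal sketch of that calculation; the baskets and the function name `rule_confidence` are made up for illustration.

```python
# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def rule_confidence(antecedent, consequent, baskets):
    """Confidence of the rule antecedent -> consequent.

    Computed as (# baskets with antecedent and consequent)
    divided by (# baskets with antecedent).
    """
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        raise ValueError("no basket contains the antecedent")
    with_both = [b for b in with_antecedent if consequent <= b]
    return len(with_both) / len(with_antecedent)


conf_bread_milk = rule_confidence({"bread"}, {"milk"}, transactions)
print(conf_bread_milk)  # 3 of the 4 bread baskets also contain milk -> 0.75
```

Note that this number says nothing about sampling error, which is what the statistician's confidence is about.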

Decision Trees:   To statisticians and machine learners, “decision trees,” also called “classification and regression trees” (CART), is a term for a class of algorithms that progressively partition data into chunks that are more and more homogeneous with respect to the outcome variable.  The result is a branching set of rules applied to predictor variables to predict the outcome. To an operations research specialist, “decision trees” are a representation of progressive decisions and possible outcomes, with probabilities, plus costs/benefits, attached to the outcomes.  The path ending in the highest expected value then guides decisions.
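
The operations research meaning can be sketched in a few lines. The decision options, probabilities, and payoffs below are invented for illustration, and the tree is kept to a single decision followed by chance outcomes; a real decision tree may chain several such stages.

```python
# Hypothetical one-stage decision tree: each option leads to
# chance outcomes given as (probability, payoff) pairs.
decisions = {
    "launch product": [(0.6, 100_000), (0.4, -50_000)],
    "license design": [(0.9, 30_000), (0.1, 10_000)],
    "do nothing": [(1.0, 0)],
}


def expected_value(outcomes):
    """Probability-weighted average payoff of one branch."""
    return sum(p * payoff for p, payoff in outcomes)


expected = {name: expected_value(o) for name, o in decisions.items()}

# The branch with the highest expected value guides the decision.
best_option = max(expected, key=expected.get)

for name, value in expected.items():
    print(f"{name}: {value:,.0f}")
print("choose:", best_option)
```

This has little in common with a CART-style tree, which is fitted to data to predict an outcome rather than drawn up by an analyst to weigh choices.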

Graph:  To a lay person, a graph usually means a visual representation of data, which statisticians more often refer to as plots and charts.  To a computer scientist, a graph is a data structure made up of entities and the links between them.  Speaking of graphs, Wikipedia has an interesting Venn-style diagram of homonyms, synonyms, homographs and their cousins (right).
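
A minimal sketch of a graph in the computer science sense follows. The names and links are hypothetical, and an adjacency list (each entity mapped to the set of entities it is linked to) is just one common way to store such a structure.

```python
from collections import defaultdict

# Hypothetical undirected links between entities (e.g. people who know each other).
edges = [
    ("Ann", "Bob"),
    ("Bob", "Cara"),
    ("Cara", "Ann"),
    ("Cara", "Dev"),
]

# Adjacency list: each entity maps to the set of entities it is linked to.
adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

# Degree = number of links attached to each entity.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}

print(sorted(adjacency["Cara"]))  # ['Ann', 'Bob', 'Dev']
print(degree["Dev"])              # 1
```

Nothing here is drawn on screen; the "graph" is purely the stored set of entities and links.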

Normalize:  In statistics and machine learning, to normalize a variable is to rescale it so that it is on the same scale as the other variables to be used in a model.  For example, you might subtract the mean, so that the variable is centered around 0, and then divide by the standard deviation, so that it has a scale consistent with other variables normalized the same way.  In database management, normalization refers to the process of organizing relational databases and their tables so that the data are not redundant and relations among tables are consistent.
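
The subtract-the-mean, divide-by-the-standard-deviation version can be sketched as follows. The data values are made up, and the use of the population standard deviation (rather than the sample version) is an assumption; either is seen in practice.

```python
from statistics import mean, pstdev

# Hypothetical raw values for one variable.
values = [2, 4, 4, 4, 5, 5, 7, 9]


def standardize(xs):
    """Center on 0 and scale to unit standard deviation.

    Uses the population standard deviation (an assumption for this sketch).
    """
    m = mean(xs)
    s = pstdev(xs)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - m) / s for x in xs]


z_scores = standardize(values)
print(z_scores)
```

After this step the variable has mean 0 and standard deviation 1, so it can be compared with other variables rescaled the same way. Database normalization involves no arithmetic of this kind at all.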

Sample:  In statistics, a sample is a collection of observations or records.  In computer science and machine learning, sample often refers to a single record.

About Statistics.com

Statistics.com offers academic and professional education in statistics, analytics, and data science at beginner, intermediate, and advanced levels of instruction. Statistics.com is a part of Elder Research, a data science consultancy with 25 years of experience in data analytics.

Contact

The Institute for Statistics Education
4075 Wilson Blvd, 8th Floor
Arlington, VA 22203
(571) 281-8817

ourcourses@statistics.com

© Copyright 2021 - Statistics.com, LLC | All Rights Reserved | Privacy Policy | Terms of Use
