Week #8 – Homonyms department: Sample

We continue our effort to shed light on potentially confusing usage of terms in the different data science communities.

In statistics, a sample is a collection of observations or records. It is often, but not always, randomly drawn. In matrix form, the rows are records (subjects), the columns are variables, and each cell holds the value of one variable for one subject. The sample is the whole matrix: a collection of rows with their values.

In machine learning and artificial intelligence, a sample might refer to the above, but it also might refer to a single record (row).
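To make the two usages concrete, here is a minimal Python sketch. The data values and variable names (`data`, `sample_size`, `n_samples`, `n_features`) are made up for illustration and do not come from any particular dataset or library. The same small table is described first the way a statistician might and then the way a machine learning practitioner might.

```python
# Hypothetical data: 4 subjects (rows) measured on 3 variables (columns).
data = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 3.4, 5.4],
    [5.9, 3.0, 5.1],
]

# Statistics usage: the whole matrix is ONE sample,
# and the number of rows is the sample size.
sample = data
sample_size = len(sample)

# Machine learning usage: each row may itself be called a sample,
# so the same matrix holds several samples, each with several features.
n_samples = len(data)
n_features = len(data[0])
first_sample = data[0]

print(f"statistics: 1 sample of size {sample_size}")
print(f"machine learning: {n_samples} samples, {n_features} features each")
```

The number 4 is the same in both descriptions. Only the noun attached to it changes: "a sample of size 4" in one community, "4 samples" in the other.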