Week #7 – Homonyms department: Normalization

With this entry, we inaugurate a new effort to shed light on potentially confusing usage of terms in the different data science communities.

In statistics and machine learning, normalizing a variable usually means subtracting its mean and dividing by its standard deviation. When an analysis involves multiple variables, normalization (also called standardization) removes scale as a factor. For example, it ensures that the analysis does not change if a particular distance is measured in feet instead of miles.
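
To make the feet-versus-miles point concrete, here is a minimal Python sketch. The `standardize` helper and the sample distances are made up for illustration, and the choice of the population standard deviation (rather than the sample one) is an assumption; either choice gives the same unit invariance.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical distances, recorded once in miles and once in feet.
miles = [1.0, 2.5, 4.0, 7.5]
feet = [m * 5280 for m in miles]

z_miles = standardize(miles)
z_feet = standardize(feet)

# After standardization the two versions agree up to floating-point rounding.
for zm, zf in zip(z_miles, z_feet):
    print(round(zm, 6), round(zf, 6))
```

Because converting units multiplies both the deviations from the mean and the standard deviation by the same constant, the constant cancels in the ratio.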

In the database community, normalization refers to the process of organizing data into a relational database whose tables reference each other through keys, so that redundancy is minimized.
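
The database sense can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical; the point is only that each customer's details are stored once and the orders refer to them by key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Customer details live in exactly one place.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    -- Orders point at a customer by key instead of repeating name and city.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'London');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'monitor');
""")

# Changing the city touches a single row, however many orders exist.
conn.execute("UPDATE customers SET city = 'Paris' WHERE customer_id = 1")

# Joining on the key reassembles the full view when it is needed.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.city, o.item
    FROM orders AS o
    JOIN customers AS c USING (customer_id)
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

Had the name and city been copied onto every order row, the same update would need to change each copy, which is the kind of redundancy normalization is meant to avoid.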