As variables are added, the data space becomes increasingly sparse, and classification and prediction models fail because the available data are insufficient to support a useful model across so many variables. An important consideration is that the difficulty grows exponentially with the number of variables: each added variable multiplies the size of the space rather than adding to it. One way to think of this intuitively is to consider the location of an object on a chessboard. It has 2 dimensions and 64 squares, or choices. If you expand the chessboard to a cube, you increase the dimensions by 50%, from 2 to 3. However, the number of possible locations increases eightfold (a 700% increase), from 64 to 512 (8 x 8 x 8). The problem is particularly acute in Big Data applications, including genomics, where, for example, an analysis might have to deal with values for thousands of different genes.
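The chessboard arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names `cell_count` and `occupancy` and the sample size of 1,000 observations are assumptions made for the example, not part of the original discussion. It shows how the number of cells grows as dimensions are added to a grid with 8 positions per side, and how small a fraction of those cells a fixed number of observations could cover at best.

```python
def cell_count(side: int, dims: int) -> int:
    """Number of distinct cells in a grid with `side` positions per dimension."""
    return side ** dims


def occupancy(n_points: int, side: int, dims: int) -> float:
    """Upper bound on the fraction of cells that n_points observations can occupy.

    Assumes the best case, where every observation lands in a different cell.
    """
    return min(1.0, n_points / cell_count(side, dims))


if __name__ == "__main__":
    n_points = 1000  # hypothetical sample size, chosen only for illustration
    for dims in range(1, 7):
        cells = cell_count(8, dims)
        share = occupancy(n_points, 8, dims)
        print(f"{dims} dimensions: {cells} cells, at most {share:.4%} occupied")
```

With the sample size held fixed, the best-case share of occupied cells shrinks by a factor of 8 with each added dimension once the cells outnumber the observations, which is the sparsity described above.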
Week #14 – Curse of Dimensionality
The curse of dimensionality is the affliction caused by adding variables to multivariate data models.