The statistician John Tukey is regarded by some as the father, or at least one of the fathers, of data science. Before Tukey, statistics meant inference (p-values, ANOVA, etc.) and models. Tukey brought to the discipline a whole new perspective: exploring the data to see what it is telling us. He coined the term “data analysis,” in many ways a precursor to data science.
Among Tukey’s inventions are the box plot and the methods of exploratory data analysis (the title of his 1977 book). He made significant contributions to fast fourier transforms (FFT’s), and to the jackknife (precursor to the bootstrap and to cross-validation). Tukey is credited with inventing the terms “bit,” and, less definitively, “software.”
Tukey also made a valuable contribution to statistics by reorienting it away from mathematics and back towards empiricism. It is worth remembering that the earliest statisticians were experimenters and empiricists. The complex superstructure of mathematical approximations to sampling distributions was made necessary by the fact that computers were not available to produce these distributions by brute (but accurate) force.
“We need to give up the vain hope that data analysis can be founded upon a logico-deductive system like Euclidean plane geometry (or some form of the propositional calculus) and to face up to the fact that data analysis is intrinsically an empirical science.” – John Tukey