Transformation is the application of a function to each value in a data set, producing a new (transformed) data set. The statistical purpose of transformation is to produce a data set that better conforms to the requirements of a statistical procedure. A typical transformation is to take the log of each value, which requires all values to be positive; for right-skewed data this shortens the long right tail and produces a distribution closer to normal in shape.
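As a rough illustration of how the log shortens a long right tail, the sketch below compares the skewness of a sample before and after transformation. The data are synthetic (drawn from a lognormal distribution with a fixed seed), and the `skewness` helper is a hypothetical function written for this example, not part of any particular package.

```python
import math
import random
from statistics import mean, pstdev


def skewness(values):
    """Moment-based sample skewness: the mean of the cubed z-scores."""
    m = mean(values)
    s = pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


# Synthetic right-skewed data (an assumption made purely for illustration).
rng = random.Random(0)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

# The transformation: take the natural log of each value.
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)

print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_logged:.2f}")
```

A skewness near zero indicates a roughly symmetric distribution, so the second figure should be much closer to zero than the first.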
Note that transformation can change the properties of the data in ways that might invalidate the analysis. For example, in the following data the ordering of the two sample means reverses after transformation:
A: 15.5, 25.4, 10.5, 13.8 Mean: 16.3
B: 15.5, 13.2, 15.3, 18.4 Mean: 15.6
Sample A has a larger mean than sample B.
After transforming using the natural log:
A: 2.741, 3.235, 2.351, 2.625 Mean: 2.738
B: 2.741, 2.580, 2.728, 2.912 Mean: 2.740
Sample B has a larger mean than sample A.
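The reversal can be checked directly. The following is a minimal sketch using only the Python standard library; the variable names are chosen for this illustration.

```python
import math
from statistics import mean

# Data from the example.
a = [15.5, 25.4, 10.5, 13.8]
b = [15.5, 13.2, 15.3, 18.4]

# Natural-log transformation of each value.
log_a = [math.log(x) for x in a]
log_b = [math.log(x) for x in b]

mean_a, mean_b = mean(a), mean(b)
log_mean_a, log_mean_b = mean(log_a), mean(log_b)

print(f"raw means: A = {mean_a:.1f}, B = {mean_b:.1f}")
print(f"log means: A = {log_mean_a:.3f}, B = {log_mean_b:.3f}")
print("A larger before transformation:", mean_a > mean_b)
print("B larger after transformation: ", log_mean_b > log_mean_a)
```

The mean of the logged values is the log of the geometric mean, so a comparison on the log scale is in effect a comparison of geometric means rather than arithmetic means. The single large value in sample A (25.4) raises its arithmetic mean more than its geometric mean, which is why the ordering changes.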
An alternative to transformation is to use non-parametric techniques, which do not require the data to follow a particular distribution. See permutation tests and bootstrap.
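As one possible illustration of the non-parametric route, the sketch below runs an exact two-sided permutation test on the difference in means for samples A and B, without transforming the data. It is a sketch under stated assumptions rather than a recommended analysis: the choice of test statistic (absolute difference in means) and the small floating-point tolerance are assumptions made for this example.

```python
from itertools import combinations
from statistics import mean

# Samples A and B from the example, left untransformed.
a = [15.5, 25.4, 10.5, 13.8]
b = [15.5, 13.2, 15.3, 18.4]

pooled = a + b
observed = abs(mean(a) - mean(b))

# Enumerate every way of relabelling the pooled values into two
# groups of the original sizes, and count how often the difference
# in means is at least as large as the one actually observed.
count = 0
total = 0
for idx in combinations(range(len(pooled)), len(a)):
    chosen = set(idx)
    group1 = [pooled[i] for i in range(len(pooled)) if i in chosen]
    group2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diff = abs(mean(group1) - mean(group2))
    if diff >= observed - 1e-12:  # tolerance guards against rounding
        count += 1
    total += 1

p_value = count / total
print(f"{count} of {total} relabellings are at least as extreme; p = {p_value:.3f}")
```

With only four values per sample all relabellings can be enumerated; for larger samples a random subset of relabellings is normally used instead.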
(Thanks to David Parkhurst, Environmental Science Research Center, Indiana University in Bloomington, for the example of the reversed sample means.)