For the most part, data transformations are used to allow us to apply linear model techniques to the data.
Over time I have settled on the following way of thinking about transformations for this purpose. There are four primary reasons for transforming data, listed here in rough order of desirability from my point of view:
Scientific: The relevant scientific theory, whether physical, chemical, or biological, may already describe a relationship among the variables mathematically. In that case, a suitable transformation of the data may make the relationship linear. For example, first-order decay, C = C0·exp(−kt), becomes a straight line when log C is plotted against t.
Operational: In many settings, sometimes for reasons we understand and sometimes for reasons we do not, a data transformation will linearize the data quite well. For example, when data span several orders of magnitude, most measurement systems naturally show larger variability around larger measurements. Assay systems that use serial dilution are a good example of this.
Statistical: If you assume a probability model, you may find that some transformations do something interesting called “variance stabilization”. (Well, it’s interesting if you are a statistician, anyway.) This matters because linear model methods pretty much all require equal variances in each group (a slight simplification), and a variance-stabilizing transformation can create that situation.
Empirical: Finally, at the bottom of the barrel, you have the bright idea of “Let’s just transform the hell out of the data until we find a transformation that linearizes it!” There is a certain charm to this, but I find the reasoning less convincing. It is usually carried out either through a haphazard “Let’s keep trying transformations till we find a good one” approach or through something dressed up more formally, such as finding the Box-Cox transformation.
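As a minimal sketch of the scientific case, assume first-order decay, C(t) = C0·exp(−kt). Taking logs gives log C = log C0 − kt, a straight line in t, so ordinary least squares on the logged concentrations recovers the rate constant. The function names and the example numbers are hypothetical, chosen only for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_first_order_decay(times, conc):
    """Fit C(t) = C0 * exp(-k * t) by regressing log(C) on t; returns (C0, k)."""
    intercept, slope = fit_line(times, [math.log(c) for c in conc])
    # On the log scale: log C = log C0 - k * t, so the fitted slope is -k.
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Hypothetical noise-free data with C0 = 50 and k = 0.3.
    times = [0, 1, 2, 4, 8]
    conc = [50 * math.exp(-0.3 * t) for t in times]
    c0, k = fit_first_order_decay(times, conc)
    print(f"C0 = {c0:.3f}, k = {k:.3f}")  # C0 = 50.000, k = 0.300
```

A side benefit: if the measurement error is multiplicative, the log also turns it into additive error, which is what least squares expects.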
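To make variance stabilization concrete, here is a small simulation, assuming Poisson-distributed counts. A Poisson variable’s variance equals its mean, so groups with larger means are noisier; the delta method suggests that the square root of the counts should have variance close to 1/4 whatever the mean. The sampler and helper names are my own, written with the standard library only; a real analysis would use a library random-number generator.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count with Knuth's product-of-uniforms method (moderate lam only)."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def variance_table(means, n=4000, seed=1):
    """For each Poisson mean, return (mean, variance of counts, variance of sqrt(counts))."""
    rng = random.Random(seed)
    rows = []
    for lam in means:
        counts = [poisson_draw(lam, rng) for _ in range(n)]
        raw_var = statistics.variance(counts)
        sqrt_var = statistics.variance([math.sqrt(c) for c in counts])
        rows.append((lam, raw_var, sqrt_var))
    return rows


if __name__ == "__main__":
    for lam, raw_var, sqrt_var in variance_table([10, 40, 160]):
        print(f"mean {lam:>3}: var(y) = {raw_var:7.2f}, var(sqrt y) = {sqrt_var:.3f}")
```

The raw variance should track the mean, while the variance of the square root stays near 0.25; the approximation gets worse for very small means.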
I rank the operational reason above the statistical one because, in the operational case, something real is creating the need for a transformation. It is important to stay grounded in reality and avoid drifting into the nether worlds of statistical theory.
In fact, there is a pronounced tendency in statistics to elevate statistical theory to the level of reality. Perhaps there is some deep-seated psychological reason for this. We need not dwell on it; it is enough to recognize that much statistical critique rests on assumptions about the real world that may or may not be warranted.
The Box-Cox transformation is to me an example of putting the statistical cart before the reality horse: it gives primary importance to the supposed existence of normally distributed errors and chooses a power transformation of the data to suit. One might perhaps make a Bayesian argument for this, but in practice this is never done.
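For readers who have not seen it, here is a rough sketch of what the Box-Cox procedure does, assuming a single positive sample with no covariates. It searches the power family (y^λ − 1)/λ, with log y at λ = 0, for the λ that maximizes the normal-theory profile log-likelihood, which includes a Jacobian term. The grid search and the function names are my own simplification; SciPy’s `scipy.stats.boxcox` performs the same maximum-likelihood fit with a proper optimizer.

```python
import math
import random


def boxcox(values, lam):
    """Box-Cox transform of positive data: (y**lam - 1) / lam, or log(y) when lam == 0."""
    logs = [math.log(v) for v in values]
    if lam == 0:
        return logs
    # expm1(lam * log y) / lam equals (y**lam - 1) / lam but stays accurate near lam = 0.
    return [math.expm1(lam * lv) / lam for lv in logs]


def profile_loglik(values, lam):
    """Box-Cox profile log-likelihood for one i.i.d. normal sample (constants dropped)."""
    n = len(values)
    z = boxcox(values, lam)
    mean = sum(z) / n
    sigma2 = sum((v - mean) ** 2 for v in z) / n  # MLE of the error variance
    log_jacobian = (lam - 1) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(sigma2) + log_jacobian


def estimate_lambda(values, grid=None):
    """Return the grid value of lam that maximizes the profile log-likelihood."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]  # lam from -2 to 2 in steps of 0.01
    return max(grid, key=lambda lam: profile_loglik(values, lam))


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical log-normal sample, for which the log (lam = 0) is the "right" answer.
    data = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
    print("estimated lambda:", estimate_lambda(data))
```

Note that the criterion is purely about how normal the transformed errors look; nothing in it refers to the science behind the data.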