2.2.3.3.2 Algorithm for Box-Cox Transformation

Box-Cox Transformation

Box-Cox transformation is a kind of power transformation, and it works only for positive data. The transformed data are given by:


Y'=\left\{
    \begin{array}{ll}
    Y^\lambda&\lambda \neq 0\cr
    \ln(Y)&\lambda = 0
    \end{array}
\right.

Here \lambda lies in the range [-5, 5].
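A minimal Python sketch of the transformation as defined above. The function name `box_cox` is hypothetical, and the input checks simply enforce the positivity and [-5, 5] constraints stated in the text. Note that the \lambda \neq 0 branch is the plain power Y^\lambda; the (Y^\lambda - 1)/\lambda form appears only in the standardized values used to find the optimal \lambda.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform as defined above: Y**lam if lam != 0, ln(Y) if lam == 0."""
    if not -5 <= lam <= 5:
        raise ValueError("lambda must lie in [-5, 5]")
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in y]
    return [v ** lam for v in y]

print(box_cox([1.0, 2.0, 3.0], 2))  # [1.0, 4.0, 9.0]
```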

Optimal \lambda

Origin searches the range [-5, 5] for the optimal \lambda, defined as the value that gives the minimal standard deviation of the transformed data. Because the scale of the transformed data changes with \lambda, the standard deviations obtained with different values of \lambda cannot be compared directly, so the transformed data are first standardized with the following formula:


Z_i=\left\{
    \begin{array}{ll}
\frac{Y_i^\lambda -1}{\lambda G^{\lambda-1}}&\lambda \neq 0\cr
G \ln(Y_i)&\lambda = 0
    \end{array}
\right.

where i indexes the data points (Y_i is the ith data point) and G is the geometric mean of the original data. The standard deviation is then calculated from the Z values.
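The standardized values Z_i can be sketched in Python as follows. The helper names are hypothetical; the geometric mean is computed as the exponential of the mean of the logarithms, which equals the n-th root of the product but avoids overflow for large data sets.

```python
import math

def geometric_mean(y):
    # exp(mean(ln y)) equals the n-th root of the product but avoids overflow
    return math.exp(sum(math.log(v) for v in y) / len(y))

def standardized_box_cox(y, lam):
    """Standardized values Z_i for one candidate lambda (hypothetical helper)."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    g = geometric_mean(y)
    if lam == 0:
        return [g * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * g ** (lam - 1)) for v in y]

# With lambda = 1 the scale factor G**0 is 1, so Z_i = Y_i - 1.
print(standardized_box_cox([2.0, 8.0], 1))  # [1.0, 7.0]
```

As a consistency check, the \lambda \neq 0 branch tends to the \lambda = 0 branch as \lambda approaches 0, since (Y^\lambda - 1)/\lambda \to \ln(Y) and G^{\lambda-1} \to 1/G.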

The optimization uses the golden section search algorithm, with the following steps:

  1. Initialize the search range (here from -5 to 5) and the tolerance used to stop the iteration.
  2. Narrow down the range by the golden ratio:
    GoldenRatio=(\sqrt{5}+1)/2
    LengthOfOldRange=OldLargeEndPoint-OldSmallEndPoint
    NewSmallEndPoint=OldLargeEndPoint-LengthOfOldRange/GoldenRatio
    NewLargeEndPoint=OldSmallEndPoint+LengthOfOldRange/GoldenRatio
    This gives a smaller new range, from NewSmallEndPoint to NewLargeEndPoint, inside the old range.
  3. Take the two end points of the new range as candidate values of \lambda; for each, calculate the Z values and then their standard deviation.
  4. Compare the two standard deviations.
    If the standard deviation at the small end point of the new range is bigger than the one at the large end point of the new range, the minimum cannot lie below the small end point (assuming a single minimum in the range), so update the range to run from the small end point of the new range to the large end point of the old range.
    Otherwise, update the range to run from the small end point of the old range to the large end point of the new range.
  5. Take the range updated in step 4 as the old range and repeat steps 2 to 4 until the length of the old range is smaller than the specified tolerance; that range is the final range.
  6. The midpoint of the final range is taken as the optimal \lambda.
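The steps above can be sketched in Python as follows. This is a hypothetical illustration, not Origin's code: the two interior points are named so that `new_small < new_large`, the search assumes the standard deviation has a single minimum on [-5, 5], and `statistics.stdev` (the ordinary sample standard deviation) stands in for the pooled or moving-range estimates that Origin uses. `z_values` repeats the standardization formula so the block is self-contained.

```python
import math
import statistics

GOLDEN_RATIO = (math.sqrt(5) + 1) / 2

def golden_section_min(f, small, large, tol=1e-4):
    """Golden section search for the minimum of a unimodal f on [small, large]."""
    while large - small >= tol:
        length = large - small
        # Two interior points with new_small < new_large
        new_small = large - length / GOLDEN_RATIO
        new_large = small + length / GOLDEN_RATIO
        if f(new_small) > f(new_large):
            small = new_small  # minimum lies in [new_small, large]
        else:
            large = new_large  # minimum lies in [small, new_large]
    return (small + large) / 2  # midpoint of the final range

def z_values(y, lam):
    # Standardized Box-Cox values; G is the geometric mean of y
    g = math.exp(sum(math.log(v) for v in y) / len(y))
    if lam == 0:
        return [g * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * g ** (lam - 1)) for v in y]

def optimal_lambda(y, tol=1e-4):
    # Assumption: plain sample SD instead of Origin's pooled / moving-range estimate
    def objective(lam):
        return statistics.stdev(z_values(y, lam))
    return golden_section_min(objective, -5.0, 5.0, tol)
```

For data in geometric progression such as [1, 2, 4, 8, 16], the objective is symmetric in \lambda and the search returns a value close to 0, consistent with a log transform making such data evenly spaced. Each iteration here evaluates both interior points, mirroring step 3; because 1/GoldenRatio^2 = 1 - 1/GoldenRatio, one of them always coincides with an interior point of the previous iteration, so an optimized version can reuse that evaluation and compute only one new standard deviation per iteration.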

How is the standard deviation calculated?

  1. For subgroup data (subgroup size greater than 1), the unbiased pooled standard deviation is used.
  2. For individuals data (subgroup size 1), the standard deviation is estimated from the average moving range, using moving ranges of length 2.
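A sketch of the two estimates in Python. The text does not say which unbiasing constants Origin applies; this sketch assumes the constants conventional in statistical process control: c_4(d) with d = \sum(n_i - 1) + 1 for the pooled standard deviation, and d_2 = 2/\sqrt{\pi} \approx 1.128 for moving ranges of length 2. All function and constant names are hypothetical.

```python
import math
import statistics

def c4(n):
    # Assumed unbiasing constant c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)
    return math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def pooled_sd(subgroups, unbiased=True):
    """Pooled SD for subgroup data (every subgroup has more than one value)."""
    ss = sum((len(g) - 1) * statistics.variance(g) for g in subgroups)
    df = sum(len(g) - 1 for g in subgroups)
    sp = math.sqrt(ss / df)
    # Assumed unbiasing step: divide by c4(d) with d = sum(n_i - 1) + 1
    return sp / c4(df + 1) if unbiased else sp

D2_SPAN_2 = 2 / math.sqrt(math.pi)  # assumed d2 for moving ranges of length 2, about 1.128

def moving_range_sd(values):
    """SD estimate for individuals data from the average moving range of length 2."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(moving_ranges) / len(moving_ranges)) / D2_SPAN_2
```

In the \lambda search, either function would be applied to the Z values of each candidate \lambda, grouped in the same way as the original data.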

Origin also provides an option to round the optimal \lambda to the nearest multiple of 0.5: after the optimal \lambda is found, it is replaced by the closest multiple of 0.5 (for example, 0.37 becomes 0.5).
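A one-line sketch of the rounding in Python. How Origin breaks ties (for example \lambda = 0.25) is not stated; this sketch assumes ties round up. It deliberately avoids the built-in `round`, which rounds halves to the nearest even integer. The function name is hypothetical.

```python
import math

def round_to_half(lam):
    """Round lambda to the nearest multiple of 0.5; ties go up (assumed tie rule)."""
    return math.floor(lam * 2 + 0.5) / 2

print(round_to_half(0.37))  # 0.5
print(round_to_half(-0.8))  # -1.0
```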