2.2.3.3.2 Algorithm for Box-Cox Transformation

Box-Cox Transformation

The Box-Cox transformation is a kind of power transformation, and it works only for positive data. The result of the Box-Cox transformation is defined as follows:

$Y'=\left\{ \begin{array}{ll} Y^\lambda&\lambda \neq 0\cr \ln(Y)&\lambda = 0 \end{array} \right.$

Here, $\lambda$ lies in the range $[-5, 5]$.
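The definition above can be sketched in a few lines of Python. This is only an illustration of the formula, not Origin's implementation; the function name `box_cox` and the error handling are assumptions made for the example.

```python
import math


def box_cox(values, lam):
    """Transform positive data: Y**lam when lam != 0, ln(Y) when lam == 0.

    Hypothetical helper for illustration; it mirrors the formula in the text.
    """
    if not -5.0 <= lam <= 5.0:
        raise ValueError("lambda is expected to lie in [-5, 5]")
    if any(y <= 0 for y in values):
        raise ValueError("the Box-Cox transformation needs positive data")
    if lam == 0:
        return [math.log(y) for y in values]
    return [y ** lam for y in values]


# lam = 0.5 is a square-root transform; lam = 0 is the natural logarithm.
print(box_cox([1.0, 4.0, 9.0], 0.5))
print(box_cox([1.0, math.e], 0))
```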

Optimal $\lambda$

Origin estimates the optimal $\lambda$ in the range $[-5, 5]$; the optimal $\lambda$ is the one that gives the minimal standard deviation of the transformed data. Because the scale of the transformed data depends on $\lambda$, the transformed data must be standardized before the standard deviations are compared. The following formula is used for the standardization:

$Z_i=\left\{ \begin{array}{ll} \frac{Y_i^\lambda -1}{\lambda G^{\lambda-1}}&\lambda \neq 0\cr G \ln(Y_i)&\lambda = 0 \end{array} \right.$

where $i$ denotes the $i$th data point and $G$ is the geometric mean of the original data. $Z$ is then used to calculate the standard deviation.
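As a rough sketch of the standardization formula, the following Python computes $G$ and $Z$ for a given $\lambda$. The function names are hypothetical, and the geometric mean is computed through logarithms, which is one common way to do it.

```python
import math


def geometric_mean(values):
    """Geometric mean G of positive data, computed via the mean of the logs."""
    return math.exp(sum(math.log(y) for y in values) / len(values))


def standardized_box_cox(values, lam):
    """Z_i as in the formula above (illustrative helper, not Origin's code)."""
    g = geometric_mean(values)
    if lam == 0:
        return [g * math.log(y) for y in values]
    scale = lam * g ** (lam - 1)
    return [(y ** lam - 1) / scale for y in values]


data = [1.0, 2.0, 4.0]          # geometric mean is 2
print(standardized_box_cox(data, 1))   # lam = 1 reduces to Y - 1
print(standardized_box_cox(data, 0))   # lam = 0 gives G * ln(Y)
```

Dividing by $\lambda G^{\lambda-1}$ puts the transformed values on a comparable scale for every $\lambda$, which is what makes the standard deviations comparable.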

The optimization uses the golden section search algorithm. The detailed steps are:

1. Initialize the range for the optimization (here, from -5 to 5) and the tolerance for stopping the iteration.
2. Narrow the range by the golden ratio, that is,
$GoldenRatio=(\sqrt{5}+1)/2$

$LengthOfOldRange=OldLargeEndPoint-OldSmallEndPoint$

$NewSmallEndPoint=OldSmallEndPoint+LengthOfOldRange/GoldenRatio$

$NewLargeEndPoint=OldLargeEndPoint-LengthOfOldRange/GoldenRatio$

This gives a smaller new range. Note that, with these formulas, $NewSmallEndPoint$ is numerically larger than $NewLargeEndPoint$.
3. Take the two end points of the new range as two values of $\lambda$; for each, calculate the $Z$ values and then their standard deviation.
4. Compare the two standard deviations.
If the standard deviation at the small end point of the new range is greater than that at the large end point of the new range, update the range to run from the small end point of the old range to the small end point of the new range.

Otherwise, update the range to run from the large end point of the new range to the large end point of the old range.
5. Take the updated range from step 4 as the old range, and repeat steps 2 to 4 until the length of the old range is smaller than the specified tolerance. This old range is the final range.
6. The midpoint of the final range is taken as the optimal $\lambda$.
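The steps above can be sketched as a small, generic golden section search. This is a minimal illustration under stated assumptions: the function name, the default tolerance of `1e-6`, and the toy objective are not from Origin. In the real procedure the objective would return the standard deviation of $Z$ for a given $\lambda$.

```python
import math

GOLDEN_RATIO = (math.sqrt(5) + 1) / 2


def golden_section_minimize(objective, lo=-5.0, hi=5.0, tol=1e-6):
    """Shrink [lo, hi] around a minimum of `objective`; return the midpoint.

    `tol` is an assumed default; the text only says a tolerance is specified.
    """
    while hi - lo >= tol:
        step = (hi - lo) / GOLDEN_RATIO
        probe_a = lo + step  # "NewSmallEndPoint" in the text (the larger value)
        probe_b = hi - step  # "NewLargeEndPoint" in the text (the smaller value)
        if objective(probe_a) > objective(probe_b):
            hi = probe_a  # keep [old small end, NewSmallEndPoint]
        else:
            lo = probe_b  # keep [NewLargeEndPoint, old large end]
    return (lo + hi) / 2


# Toy objective with a known minimum at 1.3, standing in for the
# standard deviation of Z as a function of lambda.
best = golden_section_minimize(lambda lam: (lam - 1.3) ** 2)
print(round(best, 4))
```

Each pass keeps about 61.8% of the previous range, so going from a width of 10 to a tolerance of `1e-6` takes a few dozen iterations. The search assumes the objective has a single minimum on the range.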

How is the standard deviation calculated?

For subgroup data, that is, when the subgroup size is greater than 1, the unbiased pooled standard deviation is used as the estimate.
For individuals data, that is, when the subgroup size is 1, the standard deviation is estimated from the average moving range, using a moving range of length 2.
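The two estimates can be sketched as follows. The text does not give Origin's exact formulas, so this example assumes the usual control-chart conventions: the pooled standard deviation is divided by the unbiasing constant $c_4$, and the average moving range is divided by $d_2 = 1.128$ (the constant for a span of 2). All function names are hypothetical.

```python
import math

D2_FOR_SPAN_2 = 1.128  # assumed control-chart constant for moving ranges of length 2


def c4(n):
    """Unbiasing constant c4 for a standard deviation based on n observations."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    )


def pooled_sd(subgroups, unbiased=True):
    """Pooled standard deviation over subgroups, each with more than 1 value."""
    dof = 0
    sum_sq = 0.0
    for group in subgroups:
        mean = sum(group) / len(group)
        sum_sq += sum((y - mean) ** 2 for y in group)
        dof += len(group) - 1
    sp = math.sqrt(sum_sq / dof)
    # Assumption: "unbiased" means dividing by c4(dof + 1).
    return sp / c4(dof + 1) if unbiased else sp


def average_moving_range(values):
    """Mean absolute difference between consecutive values (span of 2)."""
    ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(ranges) / len(ranges)


def moving_range_sd(values):
    """Standard deviation estimate for individuals data (assumed MR-bar / d2)."""
    return average_moving_range(values) / D2_FOR_SPAN_2


print(pooled_sd([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]))
print(moving_range_sd([1.0, 3.0, 2.0, 5.0]))
```

For a fixed data layout, both constants are the same for every $\lambda$, so under these assumptions they rescale the estimate without changing which $\lambda$ gives the minimum.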

Origin also provides an option to round the optimal $\lambda$ to the nearest 0.5; that is, after the optimal $\lambda$ is found, it is rounded to the closest multiple of 0.5.
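Rounding to the closest multiple of 0.5 amounts to doubling, rounding to an integer, and halving. The sketch below uses a hypothetical helper name and rounds exact ties upward, which is an assumption; the text does not say how Origin treats ties.

```python
import math


def round_to_half(lam):
    """Round lam to the closest multiple of 0.5 (ties go up; an assumption)."""
    return math.floor(lam * 2 + 0.5) / 2


for lam in (0.2, 0.3, 1.74, 1.76, -1.3):
    print(lam, "->", round_to_half(lam))
```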
