17.4.1.6 Algorithms (Three-Way ANOVA)

Theory of Three-Way ANOVA

Suppose N observations are associated with three factors, say, factor A with I levels, factor B with J levels and factor C with K levels.

Let y_{hijk}\,\! denote the hth observation at level i of factor A, level j of factor B, and level k of factor C. The three-way ANOVA model can be written as

y_{hijk}=\mu +\alpha _i+\beta _j+\gamma _k+(\alpha\beta)_{ij}+(\alpha\gamma)_{ik}+(\beta\gamma)_{jk}+(\alpha\beta\gamma)_{ijk}+\varepsilon _{hijk}

where \mu \,\! is the overall mean of the response data, \alpha _i\,\! is the deviation at level i of factor A, \beta _j\,\! is the deviation at level j of factor B, \gamma _k\,\! is the deviation at level k of factor C, (\alpha\beta)_{ij}\,\! is the interaction term between factors A and B, (\alpha\gamma)_{ik}\,\! is the interaction term between factors A and C, (\beta\gamma)_{jk}\,\! is the interaction term between factors B and C, (\alpha\beta\gamma)_{ijk}\,\! is the interaction term among factors A, B, and C, and \varepsilon _{hijk}\,\! is the error term.

In three-way ANOVA, users can specify their own model. For example, they can exclude the term (\alpha\beta)_{ij}\,\! (in which case the term (\alpha\beta\gamma)_{ijk}\,\! is automatically excluded as well), and the model becomes:

y_{hijk}=\mu +\alpha _i+\beta _j+\gamma _k+(\alpha\gamma)_{ik}+(\beta\gamma)_{jk}+\varepsilon _{hijk}

The sample variation of a specified model can be obtained through the so-called "design matrix" method. Taking the full model as an example, the procedure is as follows:

The degrees of freedom (DF) for the whole model are df_{Model}=IJK-1. The whole design matrix is X := X_{N\times df_{Model}} = [X_\mu |X_A |X_B |X_C |X_{AB} |X_{AC} |X_{BC} |X_{ABC}], where X_\mu is the sub-design-matrix for \mu, usually a column of all "1"s, and the other sub-design-matrices correspond to the terms their subscripts indicate. Let X_{-*} denote X with the corresponding sub-design-matrix replaced by zeros; for instance, X_{-AB} = [X_\mu |X_A |X_B |X_C |0 |X_{AC} |X_{BC} |X_{ABC}].
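As an illustration only (the actual construction inside Origin is not documented here), the following Python sketch assembles a design matrix of this form for a hypothetical balanced 2x2x2 design, using sum-to-zero (effect) coding so that X has exactly IJK = df_{Model}+1 columns and full rank:

```python
import numpy as np

def effect_code(levels, n_levels):
    """Sum-to-zero (effect) coding: n_levels - 1 columns per factor;
    the last level is coded -1 in every column."""
    levels = np.asarray(levels)
    Z = np.zeros((len(levels), n_levels - 1))
    for col in range(n_levels - 1):
        Z[levels == col, col] = 1.0
    Z[levels == n_levels - 1, :] = -1.0
    return Z

def column_products(A, B):
    """All pairwise products of columns - the interaction sub-matrix."""
    return np.hstack([A[:, [i]] * B for i in range(A.shape[1])])

# Hypothetical balanced 2x2x2 design with 2 replicates per cell (N = 16)
a = np.repeat([0, 1], 8)              # factor A level of each observation
b = np.tile(np.repeat([0, 1], 4), 2)  # factor B
c = np.tile(np.repeat([0, 1], 2), 4)  # factor C

X_mu = np.ones((16, 1))               # column of all "1"s for mu
X_A, X_B, X_C = effect_code(a, 2), effect_code(b, 2), effect_code(c, 2)
X_AB = column_products(X_A, X_B)
X_AC = column_products(X_A, X_C)
X_BC = column_products(X_B, X_C)
X_ABC = column_products(X_AB, X_C)

# X = [X_mu | X_A | X_B | X_C | X_AB | X_AC | X_BC | X_ABC]; IJK = 8 columns
X = np.hstack([X_mu, X_A, X_B, X_C, X_AB, X_AC, X_BC, X_ABC])
print(X.shape)  # (16, 8)
```

With effect coding each factor contributes I-1 (resp. J-1, K-1) columns and the interactions contribute the products of those columns, which is what makes X^T X invertible in the R quantities below.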

Define

R_0 = Y^T X_{\mu}(X_{\mu}^T X_{\mu})^{-1}X_{\mu}^T Y

R_\mu = Y^T Y

R_{Model} = Y^T X(X^T X)^{-1}X^T Y

R_A = Y^T X_{-A}(X_{-A}^T X_{-A})^{-1}X_{-A}^T Y

R_B = Y^T X_{-B}(X_{-B}^T X_{-B})^{-1}X_{-B}^T Y

R_C = Y^T X_{-C}(X_{-C}^T X_{-C})^{-1}X_{-C}^T Y

R_{AB} = Y^T X_{-AB}(X_{-AB}^T X_{-AB})^{-1}X_{-AB}^T Y

R_{AC} = Y^T X_{-AC}(X_{-AC}^T X_{-AC})^{-1}X_{-AC}^T Y

R_{BC} = Y^T X_{-BC}(X_{-BC}^T X_{-BC})^{-1}X_{-BC}^T Y

R_{ABC} = Y^T X_{-ABC}(X_{-ABC}^T X_{-ABC})^{-1}X_{-ABC}^T Y
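Each R quantity is the squared length of the projection of Y onto the column space of the corresponding matrix. A small sanity check (hypothetical data) for the simplest case: projecting onto the all-ones column X_\mu reduces R_0 to N\bar{y}^2, so R_\mu - R_0 recovers the familiar total sum of squares:

```python
import numpy as np

Y = np.array([3.1, 2.8, 3.5, 3.0, 2.9, 3.3])  # hypothetical response vector
N = len(Y)
ones = np.ones((N, 1))                         # X_mu: column of all "1"s

# R_0 = Y^T X_mu (X_mu^T X_mu)^{-1} X_mu^T Y
R0 = float(Y @ ones @ np.linalg.inv(ones.T @ ones) @ ones.T @ Y)
R_mu = float(Y @ Y)                            # R_mu = Y^T Y

# For a column of ones, R_0 reduces to N * ybar^2,
# so SS_Total = R_mu - R_0 = sum((Y - ybar)^2)
print(np.isclose(R0, N * Y.mean() ** 2))
print(np.isclose(R_mu - R0, np.sum((Y - Y.mean()) ** 2)))
```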

Then the sums of squares are

SS_A = R_{Model}-R_A

SS_B = R_{Model}-R_B

SS_C = R_{Model}-R_C

SS_{AB} = R_{Model}-R_{AB}

SS_{AC} = R_{Model}-R_{AC}

SS_{BC} = R_{Model}-R_{BC}

SS_{ABC} = R_{Model}-R_{ABC}

SS_{Error} = R_{\mu}-R_{Model}

SS_{Total} = R_{\mu}-R_{0}
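The R and SS definitions above can be exercised end to end on simulated data. The sketch below (hypothetical balanced 2x2x2 design, effect-coded, with a made-up response) computes each R by least squares, forms X_{-term} by zeroing the corresponding sub-matrix exactly as defined, and checks that in a balanced design the term sums of squares plus SS_{Error} partition SS_{Total}:

```python
import numpy as np

rng = np.random.default_rng(1)
# Balanced 2x2x2 design, 2 replicates per cell (N = 16), effect (+1/-1) coding
a = np.repeat([1.0, -1.0], 8)
b = np.tile(np.repeat([1.0, -1.0], 4), 2)
c = np.tile(np.repeat([1.0, -1.0], 2), 4)
ones = np.ones(16)

blocks = {"mu": ones, "A": a, "B": b, "C": c,
          "AB": a * b, "AC": a * c, "BC": b * c, "ABC": a * b * c}
Y = 5 + 1.5 * a - 0.8 * b + 0.5 * a * b + rng.normal(0, 1, 16)  # simulated response

def R_of(cols):
    """R = Y^T X (X^T X)^{-1} X^T Y, computed via least squares so that
    zeroed-out sub-matrices (rank-deficient X) are handled gracefully."""
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return float(Y @ (X @ beta))

R_Model = R_of(list(blocks.values()))
R_0 = R_of([ones])
R_mu = float(Y @ Y)

SS = {}
for term in ["A", "B", "C", "AB", "AC", "BC", "ABC"]:
    # X_{-term}: the corresponding sub-design-matrix is replaced with zeros
    cols = [np.zeros(16) if name == term else col for name, col in blocks.items()]
    SS[term] = R_Model - R_of(cols)

SS_Error = R_mu - R_Model
SS_Total = R_mu - R_0
# In a balanced design the sums of squares partition SS_Total exactly
print(np.isclose(sum(SS.values()) + SS_Error, SS_Total))
```

In unbalanced designs the individual SS terms no longer add up to SS_{Total}; the R-difference definitions above still apply term by term.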


For the full model, the ANOVA table is summarized below:

Source of Variation | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Square (MS) | F Value | Prob > F
Factor A | I - 1 | SS_A | MS_A | MS_A / MS_{Error} | P\{F\geq F_{(I-1,df_e,\alpha )}\}
Factor B | J - 1 | SS_B | MS_B | MS_B / MS_{Error} | P\{F\geq F_{(J-1,df_e,\alpha )}\}
Factor C | K - 1 | SS_C | MS_C | MS_C / MS_{Error} | P\{F\geq F_{(K-1,df_e,\alpha )}\}
A*B | (I - 1)(J - 1) | SS_{AB} | MS_{AB} | MS_{AB} / MS_{Error} | P\{F\geq F_{((I-1)(J-1),df_e,\alpha )}\}
A*C | (I - 1)(K - 1) | SS_{AC} | MS_{AC} | MS_{AC} / MS_{Error} | P\{F\geq F_{((I-1)(K-1),df_e,\alpha )}\}
B*C | (J - 1)(K - 1) | SS_{BC} | MS_{BC} | MS_{BC} / MS_{Error} | P\{F\geq F_{((J-1)(K-1),df_e,\alpha )}\}
A*B*C | (I - 1)(J - 1)(K - 1) | SS_{ABC} | MS_{ABC} | MS_{ABC} / MS_{Error} | P\{F\geq F_{((I-1)(J-1)(K-1),df_e,\alpha )}\}
Error | df_e = N - IJK | SS_{Error} | MS_{Error} | |
Total | N - 1 | SS_{Total} | | |
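The table entries for any one source follow mechanically from its SS and DF. A short sketch with hypothetical numbers (a 2x3x2 design, made-up sums of squares) shows the MS, F, and Prob > F computation for factor A, using SciPy's F distribution for the upper-tail probability:

```python
from scipy.stats import f as f_dist

# Hypothetical values read from an ANOVA computation for a 2x3x2 design, N = 24
I, J, K, N = 2, 3, 2, 24
SS_A, SS_Error = 30.0, 24.0

df_A = I - 1
df_e = N - I * J * K              # error degrees of freedom
MS_A = SS_A / df_A                # MS = SS / DF
MS_Error = SS_Error / df_e
F_A = MS_A / MS_Error             # F value for factor A
p_A = f_dist.sf(F_A, df_A, df_e)  # Prob > F: upper-tail probability
print(F_A, p_A)
```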

Multiple Means Comparisons

There are various methods for multiple means comparisons in Origin; the ocstat_dlsm_mean_comparison() function is used to perform them.

There are two types of multiple means comparison methods:

Single-step methods. These create simultaneous confidence intervals to show how the means differ, and include the Tukey-Kramer, Bonferroni, Dunn-Sidak, Fisher's LSD, and Scheffé methods.

Stepwise methods. These perform the hypothesis tests sequentially, and include the Holm-Bonferroni and Holm-Sidak tests.
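As a simplified illustration of the single-step idea (not Origin's ocstat_dlsm_mean_comparison() implementation, which pools the ANOVA error term), the sketch below runs Bonferroni-corrected pairwise t-tests on hypothetical groups: the per-comparison significance level is alpha divided by the number of comparisons.

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Hypothetical groups of observations (e.g., the levels of one factor)
groups = {"A1": np.array([5.1, 4.9, 5.3, 5.0]),
          "A2": np.array([6.2, 6.0, 6.4, 6.1]),
          "A3": np.array([5.2, 5.1, 5.0, 5.3])}

pairs = list(combinations(groups, 2))
alpha_adj = 0.05 / len(pairs)   # Bonferroni: divide alpha by the number of comparisons

results = {}
for g1, g2 in pairs:
    t, p = stats.ttest_ind(groups[g1], groups[g2])
    results[(g1, g2)] = (p, bool(p < alpha_adj))
    print(f"{g1} vs {g2}: p = {p:.4g}, significant: {p < alpha_adj}")
```

Stepwise methods such as Holm-Bonferroni instead sort the p-values and compare the smallest to alpha/m, the next to alpha/(m-1), and so on, stopping at the first non-rejection.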

Power Analysis

The power analysis procedure calculates the actual power for the sample data, as well as the hypothetical power if additional sample sizes are specified.

The power of a three-way analysis of variance is a measure of its sensitivity. Power is the probability that the ANOVA will detect differences in the population means when real differences exist. In terms of the null and alternative hypotheses, power is the probability that the test statistic F will be extreme enough to reject the null hypothesis when it is in fact false.

The Origin Three-Way ANOVA dialog can compute power for the factor A, B, and C sources. If interaction terms are included in the specified model, Origin can also compute power for them.

Power is defined by the equation:

power=1-probf(f,df,dfe,nc)\,\!

where f is the deviate from the non-central F-distribution with df and dfe degrees of freedom and non-centrality parameter nc = SS/MSE. SS is the sum of squares of the source A, B, C, A*B, A*C, B*C, or A*B*C; MSE is the mean square of the error; df is the degrees of freedom of the numerator; and dfe is the degrees of freedom of the error. All values (SS, MSE, df, and dfe) are obtained from the ANOVA table. The value of probf( ) is obtained using the NAG function nag_prob_non_central_f_dist (g01gdc). See the NAG documentation for more detailed information.
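The same quantity can be sketched with SciPy's non-central F distribution in place of the NAG routine (hypothetical table values; f is taken as the critical F value at level alpha):

```python
from scipy.stats import f as f_dist, ncf

# Hypothetical values taken from an ANOVA table for factor A
alpha = 0.05
SS_A, MS_Error = 30.0, 2.0    # sum of squares of the source, mean square error
df_A, df_e = 1, 12            # numerator and error degrees of freedom

nc = SS_A / MS_Error                        # non-centrality parameter nc = SS/MSE
f_crit = f_dist.ppf(1 - alpha, df_A, df_e)  # deviate f: critical F at level alpha
power = ncf.sf(f_crit, df_A, df_e, nc)      # power = 1 - probf(f, df, dfe, nc)
print(power)
```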

The above is a brief outline of the three-way ANOVA algorithms; for the detailed mathematical derivations, please refer to the corresponding part of the user's manual.

Levene test for Homogeneity of Variances

The following statistic is used to perform the Levene test:

L = \frac{(N-k)\sum_{i=1}^{k}n_i(Z_i-Z)^2}{(k-1)\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Z_{ij}-Z_i)^2}

where

N is the number of observations, and k = IJK is the number of subgroups, the ith of which contains n_i (i=1,...,k) observations.

Z_{ij} = |Y_{ij}-T_i|

T_i = \frac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij}

Z_i = \frac{1}{n_i}\sum_{j=1}^{n_i}Z_{ij}

Z = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i}Z_{ij}

The p-value is then 1-F_{k-1,N-k}(L), where F_{k-1,N-k} is the cumulative distribution function of the F-distribution with k-1 and N-k degrees of freedom.
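The formula can be implemented directly and cross-checked against SciPy's Levene test, whose center='mean' option corresponds to this mean-centered variant (hypothetical subgroup data):

```python
import numpy as np
from scipy import stats

# Hypothetical subgroups (the k cells of the design)
groups = [np.array([4.9, 5.1, 5.0, 5.4]),
          np.array([6.0, 6.5, 5.8, 6.3]),
          np.array([5.2, 4.8, 5.5, 5.1])]

N = sum(len(g) for g in groups)
k = len(groups)
n = np.array([len(g) for g in groups])

Z = [np.abs(g - g.mean()) for g in groups]   # Z_ij = |Y_ij - T_i|
Z_i = np.array([z.mean() for z in Z])        # subgroup means of Z
Z_bar = np.concatenate(Z).mean()             # grand mean of all Z_ij

num = (N - k) * np.sum(n * (Z_i - Z_bar) ** 2)
den = (k - 1) * sum(np.sum((z - zi) ** 2) for z, zi in zip(Z, Z_i))
L = num / den
p = stats.f.sf(L, k - 1, N - k)              # p-value = 1 - F_{k-1,N-k}(L)

# Cross-check against SciPy's implementation of the same statistic
L_ref, p_ref = stats.levene(*groups, center='mean')
print(np.isclose(L, L_ref), np.isclose(p, p_ref))
```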