17.4.1.2 Algorithms (One-Way ANOVA)
Theory of One-Way ANOVA
Assume we have response data measured at k levels of the factor, where $y_{ij}$ represents the value of the ith observation ($i = 1, 2, \ldots, n_j$) at the jth factor level ($j = 1, 2, \ldots, k$). The one-way ANOVA model can then be written as:

$$y_{ij} = \mu_j + \varepsilon_{ij}, \qquad j = 1, 2, \ldots, k;\ i = 1, 2, \ldots, n_j$$

where $\mu_j$ is the mean of the jth level and $\varepsilon_{ij}$ is the random error term.
ANOVA tests whether the means of two or more populations (levels) are equal. The null hypothesis is that the means of the different populations are the same, and the alternative hypothesis is that at least one sample's mean is different from the others. Mathematically, this is expressed as:

$$H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$$

$$H_1\colon \mu_p \neq \mu_q \text{ for some } p \text{ and } q,\ p \neq q,\ 1 \leq p, q \leq k$$

where $\mu_j$ is the jth sample (level) mean. To test the hypothesis, the total sample variation is divided into the variation between groups and the variation within groups, and an F-test is then used to test whether these two variations differ.
Algebraically, the total sample variation can be split into these two parts, each measured by its sum of squares:

$$\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}\right)^2 = \sum_{j=1}^{k} n_j\left(\bar{y}_j-\bar{y}\right)^2 + \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}_j\right)^2$$

where $\bar{y}$ is the overall mean, $\bar{y}_j$ is the mean of the jth level, $n_j$ is the number of observations at the jth level, and $n = \sum_j n_j$ is the total number of observations. The term on the left is called the "total sum of squares", the second term is called the "sum of squares of treatments", which represents the variation between groups, and the third term is called the "sum of squares of error", which represents the variation within groups. The equation is then commonly abbreviated to

$$SS_{Total} = SS_{Treatment} + SS_{Error}$$
When $H_0$ is true, the sample data from the k levels are normally and independently distributed with common mean $\mu$ and variance $\sigma^2$. Thus the statistic

$$F = \frac{MS_{Treatment}}{MS_{Error}} = \frac{SS_{Treatment}/(k-1)}{SS_{Error}/(n-k)}$$

follows an F distribution with k-1 and n-k degrees of freedom, where $MS_{Treatment}$ is the mean square for treatments and $MS_{Error}$ is the mean square for error, each formed by dividing the corresponding sum of squares by its associated degrees of freedom. Given a significance level $\alpha$, if the F statistic exceeds the critical value $F_{\alpha,\,k-1,\,n-k}$ (the tabular value of the F distribution with k-1 and n-k degrees of freedom at level $\alpha$), or equivalently, if the corresponding P value is less than the significance level, the null hypothesis should be rejected.
The results of the analysis of variance are typically presented in an ANOVA table:

| Source of Variation | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Square (MS) | F Value | Prob > F |
| --- | --- | --- | --- | --- | --- |
| Model (Factor) | k-1 | $SS_{Treatment}$ | $MS_{Treatment} = SS_{Treatment}/(k-1)$ | $F = MS_{Treatment}/MS_{Error}$ | $P\left(F_{k-1,\,n-k} > F\right)$ |
| Error | n-k | $SS_{Error}$ | $MS_{Error} = SS_{Error}/(n-k)$ | | |
| Total | n-1 | $SS_{Total}$ | | | |
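To make the computation behind this table concrete, here is a minimal Python sketch (not Origin's implementation; it uses NumPy/SciPy and a small made-up data set purely for illustration) that builds the sums of squares, mean squares, F value, and Prob > F, and cross-checks the result against SciPy's built-in one-way ANOVA:

```python
import numpy as np
from scipy import stats

# Hypothetical response data measured at k = 3 factor levels
groups = [np.array([4.2, 4.8, 5.1, 4.5]),
          np.array([5.9, 6.3, 5.7, 6.1]),
          np.array([5.0, 4.9, 5.4, 5.2])]

n = sum(len(g) for g in groups)          # total number of observations
k = len(groups)                          # number of factor levels
grand_mean = np.concatenate(groups).mean()

# Partition of the total variation
ss_treatment = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ss_treatment + ss_error

# Mean squares and the F statistic
df_treatment, df_error = k - 1, n - k
ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error
f_value = ms_treatment / ms_error
p_value = stats.f.sf(f_value, df_treatment, df_error)   # Prob > F

print(f"F = {f_value:.4f}, p = {p_value:.4g}")

# Cross-check against SciPy's built-in one-way ANOVA
f_check, p_check = stats.f_oneway(*groups)
assert np.isclose(f_value, f_check) and np.isclose(p_value, p_check)
```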
Homogeneity of Variance
In the analysis of variance, it is assumed that the different samples have equal variances, which is commonly called homogeneity of variance. The Levene test and the Brown-Forsythe test can be used to verify this assumption. Suppose we have k samples of response data, where $y_{ij}$ represents the value of the ith observation ($i = 1, 2, \ldots, n_j$) at the jth factor level ($j = 1, 2, \ldots, k$). The hypotheses of both the Levene test and the Brown-Forsythe test can be expressed as:

$$H_0\colon \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$$

$$H_1\colon \sigma_p^2 \neq \sigma_q^2 \text{ for at least one pair } (p, q),\ p \neq q$$

Define $z_{ij}$ in one of the following three ways, according to the test used:
- Absolute Levene test: $z_{ij} = \left|y_{ij} - \bar{y}_j\right|$
- Squared Levene test: $z_{ij} = \left(y_{ij} - \bar{y}_j\right)^2$
- Brown-Forsythe test: $z_{ij} = \left|y_{ij} - \tilde{y}_j\right|$, where $\tilde{y}_j$ is the median of the jth group
When $H_0$ holds, the test statistic

$$W = \frac{(n-k)\sum_{j=1}^{k} n_j\left(\bar{z}_j - \bar{z}\right)^2}{(k-1)\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(z_{ij} - \bar{z}_j\right)^2}$$

will (approximately) follow an F distribution with k-1 and n-k degrees of freedom, where $\bar{z}_j$ and $\bar{z}$ are the group mean of the $z_{ij}$ in the jth group and the overall mean of the $z_{ij}$, respectively.
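As an illustrative sketch only (again using SciPy and made-up data, not Origin's internal routine), the statistic W can be computed for each of the three definitions of $z_{ij}$; SciPy's levene function reproduces the absolute Levene and Brown-Forsythe cases:

```python
import numpy as np
from scipy import stats

groups = [np.array([4.2, 4.8, 5.1, 4.5]),
          np.array([5.9, 6.3, 5.7, 6.1]),
          np.array([5.0, 4.9, 5.4, 5.2])]

def levene_like(groups, transform):
    """Compute the F-type statistic W and its p-value on transformed values z_ij."""
    z = [transform(g) for g in groups]
    n = sum(len(zj) for zj in z)
    k = len(z)
    zbar = np.concatenate(z).mean()
    num = (n - k) * sum(len(zj) * (zj.mean() - zbar) ** 2 for zj in z)
    den = (k - 1) * sum(((zj - zj.mean()) ** 2).sum() for zj in z)
    stat = num / den
    return stat, stats.f.sf(stat, k - 1, n - k)

# Absolute Levene:  z_ij = |y_ij - mean_j|
print(levene_like(groups, lambda g: np.abs(g - g.mean())))
# Squared Levene:   z_ij = (y_ij - mean_j)^2
print(levene_like(groups, lambda g: (g - g.mean()) ** 2))
# Brown-Forsythe:   z_ij = |y_ij - median_j|
print(levene_like(groups, lambda g: np.abs(g - np.median(g))))

# SciPy's built-in routine covers the absolute and Brown-Forsythe cases
print(stats.levene(*groups, center='mean'))
print(stats.levene(*groups, center='median'))
```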
Multiple Means Comparisons
Given that an ANOVA experiment has determined that at least one of the population means is significantly different, multiple means comparison subsequently compares all possible pairs of factor level means to determine which mean (or means) differ significantly. Origin provides various methods for means comparison and uses the NAG function nag_anova_confid_interval (g04dbc) to perform them.
Two types of multiple means comparison methods are included in Origin:
- Single-step methods. These create simultaneous confidence intervals to show how the means differ, and include Tukey-Kramer, Bonferroni, Dunn-Sidak, Fisher's LSD, and Scheffe.
- Stepwise methods. These sequentially perform the hypothesis tests, and include the Holm-Bonferroni and Holm-Sidak tests.
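Origin carries out these comparisons through the NAG routine nag_anova_confid_interval (g04dbc). As a rough illustration of the single-step idea only (using SciPy rather than the NAG library, with the same kind of made-up data), a Tukey-Kramer style comparison with simultaneous confidence intervals might look like this:

```python
import numpy as np
from scipy import stats

groups = [np.array([4.2, 4.8, 5.1, 4.5]),
          np.array([5.9, 6.3, 5.7, 6.1]),
          np.array([5.0, 4.9, 5.4, 5.2])]

# Pairwise comparison of all factor-level means (Tukey-Kramer style);
# requires SciPy >= 1.8
res = stats.tukey_hsd(*groups)
print(res)                                   # pairwise statistics and p-values

# Simultaneous 95% confidence intervals for each pairwise mean difference
ci = res.confidence_interval(confidence_level=0.95)
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"mean {i} - mean {j}: [{ci.low[i, j]:.3f}, {ci.high[i, j]:.3f}]")
```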
Power Analysis
The power analysis procedure calculates the actual power for the sample data, as well as the hypothetical power if additional sample sizes are specified.
The power of a one-way analysis of variance is a measurement of its sensitivity. Power is the probability that the one-way ANOVA will detect differences in the sample means when real differences exist. In terms of the null and alternative hypotheses, power is the probability that the test statistic F will be extreme enough to reject the null hypothesis when it actually should be rejected (i.e., when the null hypothesis is not true).
Power is defined by the equation:

$$\text{power} = 1 - \text{probf}(f,\ dfa,\ dfe,\ nc)$$

where f is the deviate from the non-central F-distribution with dfa and dfe, the model and error degrees of freedom, respectively, and nc = SST/MSE, where SST is the sum of squares of the Model and MSE is the mean square of the Errors. The value of probf( ) is obtained using the NAG function nag_prob_non_central_f_dist (g01gdc). Please see the NAG documentation for more detailed information.
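Since nag_prob_non_central_f_dist (g01gdc) belongs to the NAG library, the sketch below reproduces the same style of power calculation with SciPy's non-central F distribution instead. The degrees of freedom and sums of squares are made-up illustrative values, and f is assumed here to be the critical value of the F distribution at the chosen significance level:

```python
from scipy import stats

# Illustrative (made-up) quantities from a one-way ANOVA
dfa, dfe = 2, 9            # model and error degrees of freedom
ss_treatment = 4.56        # sum of squares of the Model
ms_error = 0.085           # mean square of the Errors
alpha = 0.05               # significance level

nc = ss_treatment / ms_error                 # non-centrality parameter
f_crit = stats.f.ppf(1 - alpha, dfa, dfe)    # critical value at level alpha

# power = 1 - probf(f_crit, dfa, dfe, nc): probability that the observed F
# exceeds the critical value under the non-central F distribution
power = stats.ncf.sf(f_crit, dfa, dfe, nc)
print(f"power = {power:.4f}")
```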
The above is a brief outline of the algorithms used in one-way analysis of variance. For more information about the detailed mathematical derivations, please refer to the corresponding part of the user's manual and the NAG documentation.