17.1.8.3 Choosing Normality Tests and Interpreting Results

Summary

Suppose you want to investigate the health status of some students. You select 40 students and record their names, genders, ages, heights, and weights. After collecting the data, you use the Normality Test procedure to examine whether the students' weights follow a normal distribution.

Example

  1. Import body.dat from the \Samples\Statistics folder.
  2. Highlight Column E.
  3. Select Statistics: Descriptive Statistics: Normality Test.
  4. Check all tests under the Quantities to Compute branch.
  5. Select Histograms and Box Chart under the Plots branch.
  6. Click OK.

Interpreting Results

Statistical models usually depend on underlying assumptions. One common assumption is that of a normally distributed population. Unfortunately, many analysts assume normality without any empirical evidence or testing. If the assumption of normality is violated, the inferences we draw may not be reliable.

It is difficult to define a standard for interpreting the results of normality tests because needs vary by discipline and by analyst. Some test methods may be satisfactory in one field but unsatisfactory in another.

There are two primary approaches to normality testing: graphical methods and numerical methods. Graphical methods tend to be intuitive and easy to interpret, while numerical methods are more precise and hence more objective.

Graphical Methods

Stem-and-leaf plots, (skeletal) box plots, dot plots, histograms, and P-P or Q-Q plots are useful for visualizing the difference between an empirical distribution and a theoretical normal distribution. Origin's Normality Test tool offers histograms and box charts; note that Origin also offers P-P and Q-Q plots from the Plot menu.
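As an aside for readers working outside Origin, a normal Q-Q plot can be sketched in a few lines of Python using SciPy and Matplotlib. This is only an illustrative sketch: the weights array below is a simulated stand-in for the Column E data in body.dat, not the actual file.

    import numpy as np
    import scipy.stats as stats
    import matplotlib.pyplot as plt

    # Simulated stand-in for the 40 student weights (Column E of body.dat)
    weights = np.random.default_rng(0).normal(loc=65, scale=10, size=40)

    # Normal Q-Q plot: points falling close to the reference line suggest normality
    stats.probplot(weights, dist="norm", plot=plt)
    plt.title("Normal Q-Q plot of student weights")
    plt.show()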

One very straightforward way to "test" for normality is to create a histogram. It is well known that the shape of a normal distribution is symmetrical and classically "bell-shaped." Looking at a histogram, we can obtain a rough idea of the nature of the population distribution. The box chart is something of a complement to the histogram. A box chart effectively summarizes the minimum, 25th percentile (1st quartile), 50th percentile (median), 75th percentile (3rd quartile), and maximum using a box and lines. If the 25th and 75th percentiles are symmetrical with respect to the median, and the median and mean values are located at roughly the same position near the center of the box, then we have reason to believe that the variable of interest may be normally distributed. In the body.dat example above, the shape of the histogram is not exactly symmetrical, but it is close to a "bell" shape. The box chart of the same data also indicates rough symmetry.
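The histogram and box chart checks described above can likewise be reproduced outside Origin; the following Python sketch again uses a simulated weights array in place of the body.dat data.

    import numpy as np
    import matplotlib.pyplot as plt

    # Simulated stand-in for the 40 student weights
    weights = np.random.default_rng(0).normal(loc=65, scale=10, size=40)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.hist(weights, bins=8)    # roughly symmetrical and bell-shaped if normal
    ax1.set_title("Histogram of weights")
    ax2.boxplot(weights)         # box and whiskers roughly symmetric about the median if normal
    ax2.set_title("Box chart of weights")
    plt.tight_layout()
    plt.show()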

Choosing Normality Test Methods

It is well known that measures of skewness and kurtosis can be applied to normality testing. Skewness, the third standardized moment, measures the degree of asymmetry. If skewness is greater than zero, the distribution is right-skewed and more observations fall on the left side of the distribution curve; conversely, when skewness is less than zero, observations are concentrated on the right side of the curve. Kurtosis, the fourth standardized moment, measures peakedness and tail heaviness. Note that a normal distribution has kurtosis = 0 under the "excess kurtosis" convention. So, if the calculated kurtosis is greater than 0, the distribution has heavier tails and a sharper peak than the standard normal distribution. Origin calculates both skewness and kurtosis. See Statistics on Columns for details.
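As a rough illustration of these two measures (not Origin's implementation), skewness and excess kurtosis can be computed with SciPy; note that scipy.stats.kurtosis returns excess kurtosis by default, matching the convention used above. The weights array is again a simulated placeholder.

    import numpy as np
    from scipy.stats import skew, kurtosis

    # Simulated stand-in for the student weights
    weights = np.random.default_rng(0).normal(loc=65, scale=10, size=40)

    print("skewness:", skew(weights))             # near 0 for a normal sample
    print("excess kurtosis:", kurtosis(weights))  # fisher=True by default, so near 0 for a normal sample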

The Kolmogorov-Smirnov, Kolmogorov-Smirnov-Lilliefors, Anderson-Darling and Cramer-von Mises tests are empirical distribution function (EDF) based methods, while the Jarque–Bera and Skewness-Kurtosis (aka D'Agostino K-Squared) tests are based on the Chi-squared distribution. The Chen-Shapiro test is a normalized-spacing-based method found to be both powerful and simple. The Shapiro-Wilk, Ryan-Joiner and Shapiro-Francia tests, like Chen-Shapiro, are regression- and correlation-based methods. Several of these tests can also be run outside Origin, as sketched after the list below.

  1. Kolmogorov-Smirnov: The K-S test, though known to be less powerful, is widely used. Generally, it requires large sample sizes.
  2. Kolmogorov-Smirnov-Lilliefors: An adaptation of the K-S test. It is more complicated than K-S, since it must be established whether the maximum discrepancy between the empirical distribution function and the theoretical cumulative distribution function is large enough to be statistically significant. K-S-L is generally recommended over K-S. Some analysts recommend using K-S-L when the sample size is larger than 2000.
  3. Anderson-Darling: One of the best EDF-based statistics for normality testing. A sample size of less than 26 is recommended, but industrial data sets of 200 or more observations may still pass the A-D test. The p-value of the A-D test depends on the simulation algorithm used. The A-D test can also be used to test for other distributions, given appropriately specified simulation plans. See D'Agostino and Stephens (1986) for details.
  4. D'Agostino K-Squared: Based on the skewness and kurtosis measures. See D'Agostino, Belanger, and D'Agostino, Jr. (1990) and Royston (1991) for details. It is worth mentioning that skewness and kurtosis are themselves affected by sample size.
  5. Shapiro-Wilk: The recommended sample size for this test ranges from 7 to 2000. Origin allows sample sizes from 3 to 5000. However, when the sample size is relatively large, D'Agostino K-Squared or Lilliefors are generally preferred over Shapiro-Wilk.
  6. Chen-Shapiro: The C-S test extends the S-W test without loss of power. The motivation for C-S is that the ratios of the sample spacings to their expected spacings converge to one, owing to the consistency of sample quantiles. In terms of power, C-S behaves more like the S-W test than the S-F test.
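Several of the tests above have counterparts in SciPy and statsmodels, and the sketch below shows how they might be run as a cross-check outside Origin. It assumes those libraries and a simulated weights array, and it does not reproduce Origin's exact algorithms; Chen-Shapiro, for instance, is not available in either library.

    import numpy as np
    from scipy import stats
    from statsmodels.stats.diagnostic import lilliefors

    # Simulated stand-in for the student weights
    weights = np.random.default_rng(0).normal(loc=65, scale=10, size=40)

    # Kolmogorov-Smirnov against a fully specified normal; note that plugging in
    # estimated parameters is exactly the situation the Lilliefors correction addresses
    ks_stat, ks_p = stats.kstest(weights, "norm", args=(weights.mean(), weights.std(ddof=1)))

    # Kolmogorov-Smirnov-Lilliefors (parameters estimated from the sample)
    lf_stat, lf_p = lilliefors(weights, dist="norm")

    # D'Agostino K-squared, combining skewness and kurtosis
    dk_stat, dk_p = stats.normaltest(weights)

    # Shapiro-Wilk
    sw_stat, sw_p = stats.shapiro(weights)

    for name, p in [("K-S", ks_p), ("Lilliefors", lf_p),
                    ("D'Agostino K-squared", dk_p), ("Shapiro-Wilk", sw_p)]:
        decision = "do not reject" if p >= 0.05 else "reject"
        print(f"{name}: p = {p:.3f} -> {decision} normality at alpha = 0.05")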

Origin provides users with six commonly used methods: the S-W, K-S, Lilliefors, A-D, D'Agostino K-Squared and C-S tests. You can use the following table to guide your choice of tests. Note that in Origin, we report critical values rather than p-values for the Chen-Shapiro test. Critical values can also be used for testing: if the computed statistic is less than or equal to the 5% critical value, then the p-value is greater than or equal to 0.05, and we would not reject the null hypothesis at alpha = 0.05.
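This critical-value style of decision can be illustrated with SciPy's Anderson-Darling test, which reports critical values instead of a p-value; the statistic is compared against the critical value at the chosen significance level (again, the weights array is a simulated placeholder, and this is only a sketch, not Origin's procedure).

    import numpy as np
    from scipy import stats

    # Simulated stand-in for the student weights
    weights = np.random.default_rng(0).normal(loc=65, scale=10, size=40)

    result = stats.anderson(weights, dist="norm")
    # result.significance_level is (15, 10, 5, 2.5, 1); pick the 5% critical value
    crit_5pct = result.critical_values[list(result.significance_level).index(5.0)]
    if result.statistic <= crit_5pct:
        print("A-D statistic <= 5% critical value: do not reject normality at alpha = 0.05")
    else:
        print("A-D statistic > 5% critical value: reject normality at alpha = 0.05")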

Summary of Normality Tests in Origin

Test Method            Statistic    N Range            Based on
Kolmogorov-Smirnov     D            3 <= N             EDF
Lilliefors             L            4 <= N             EDF
Anderson-Darling       A-square     8 <= N             EDF
D'Agostino K-Squared   Chi-square   4 <= N             Chi-square(2)
Shapiro-Wilk           W            3 <= N <= 5000     -
Chen-Shapiro           QH           10 <= N <= 2000    -

Note: Meeting the sample size requirements (N in the table above) does not guarantee that a test will be efficient or powerful. Almost all normality test methods perform poorly for small sample sizes (less than or equal to 30).