17.1.4.3 Algorithms (CrossTabs)
A cross tabulation (CrossTabs) is also called a contingency table. This tool examines whether an association exists between variables and how strong it is.
CrossTabs Method
- Frequency Counts
- Marginal and Cell
- Chi-Square Tests Table
- Fisher's Exact Test Table (2 x 2 only)
- Measures of Association
- Measures of Agreement
- Odds Ratio and Relative Risk (2 x 2 only)
- Cochran-Mantel-Haenszel
Frequency Counts
Define
- $X_i,\ i = 1, \ldots, R$ are the distinct values of the row variable in ascending order, i.e. $X_1 < X_2 < \cdots < X_R$
- $Y_j,\ j = 1, \ldots, C$ are the distinct values of the column variable in ascending order, i.e. $Y_1 < Y_2 < \cdots < Y_C$
- $f_{ij}$ is the frequency with respect to cell $(X_i, Y_j)$
- $r_i = \sum_{j=1}^{C} f_{ij}$ is the subtotal of the $i$th row
- $c_j = \sum_{i=1}^{R} f_{ij}$ is the subtotal of the $j$th column
- $N = \sum_{i=1}^{R} \sum_{j=1}^{C} f_{ij}$ is the total number.
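The definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `crosstab` and its return layout are assumptions made for this example, not part of the tool.

```python
from collections import Counter

def crosstab(rows, cols):
    """Build the frequency table for paired observations (hypothetical helper)."""
    xs = sorted(set(rows))                  # distinct row values, ascending
    ys = sorted(set(cols))                  # distinct column values, ascending
    counts = Counter(zip(rows, cols))       # missing cells count as 0
    f = [[counts[(x, y)] for y in ys] for x in xs]
    r = [sum(row) for row in f]             # row subtotals
    c = [sum(col) for col in zip(*f)]       # column subtotals
    n = sum(r)                              # total number
    return xs, ys, f, r, c, n

xs, ys, f, r, c, n = crosstab(
    ["a", "b", "a", "b", "a"],
    ["y", "y", "n", "n", "y"],
)
```

Every later statistic on this page depends only on `f`, `r`, `c` and `n`.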
Marginal and Cell
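| Statistics | Formula and Explanation |
|---|---|
| Count | $f_{ij}$ |
| Expected Count | $E_{ij} = \frac{r_i c_j}{N}$ |
| Row Percent | $100 \times \frac{f_{ij}}{r_i}$ |
| Column Percent | $100 \times \frac{f_{ij}}{c_j}$ |
| Total Percent | $100 \times \frac{f_{ij}}{N}$ |
| Residual | $R_{ij} = f_{ij} - E_{ij}$ |
| Std. Residual | $\frac{R_{ij}}{\sqrt{E_{ij}}}$ |
| Adj. Residual | $\frac{R_{ij}}{\sqrt{E_{ij}\left(1 - \frac{r_i}{N}\right)\left(1 - \frac{c_j}{N}\right)}}$ |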
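As an illustration, the cell statistics named above can be computed from a table of counts as follows. This sketch uses the usual textbook definitions of the expected count and the residuals; the function name `cell_statistics` and the dictionary keys are hypothetical, and every row and column subtotal is assumed to be positive.

```python
import math

def cell_statistics(f):
    """Per-cell statistics for a table of counts f (list of rows)."""
    r = [sum(row) for row in f]             # row subtotals
    c = [sum(col) for col in zip(*f)]       # column subtotals
    n = sum(r)
    table = []
    for i, row in enumerate(f):
        out_row = []
        for j, fij in enumerate(row):
            expected = r[i] * c[j] / n
            residual = fij - expected
            out_row.append({
                "count": fij,
                "expected": expected,
                "row_pct": 100 * fij / r[i],
                "col_pct": 100 * fij / c[j],
                "total_pct": 100 * fij / n,
                "residual": residual,
                "std_residual": residual / math.sqrt(expected),
                "adj_residual": residual / math.sqrt(
                    expected * (1 - r[i] / n) * (1 - c[j] / n)
                ),
            })
        table.append(out_row)
    return table

stats = cell_statistics([[10, 20], [30, 40]])
```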
Chi-Square Statistics
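| Statistics | Formula and Explanation | Degree of Freedom |
|---|---|---|
| Pearson Chi-Square | $\chi_P^2 = \sum_{i}\sum_{j} \frac{(f_{ij} - E_{ij})^2}{E_{ij}}$, where $E_{ij} = \frac{r_i c_j}{N}$ is the expected count | $(R-1)(C-1)$ |
| Likelihood Ratio | $\chi_{LR}^2 = 2\sum_{i}\sum_{j} f_{ij} \ln\frac{f_{ij}}{E_{ij}}$ | $(R-1)(C-1)$ |
| Linear Association | $\chi_{LA}^2 = (N-1)\,r^2$, where $r$ is the Pearson correlation coefficient. | $1$ |
| Continuity Correction | $\chi_{C}^2 = \frac{N\left(\lvert f_{11}f_{22} - f_{12}f_{21}\rvert - N/2\right)^2}{r_1 r_2 c_1 c_2}$, which is calculated only for a 2 x 2 table | $1$ |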
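A minimal sketch of the four chi-square statistics follows. Several points are assumptions of this example rather than statements about the tool: the category scores used for the correlation are taken to be the row and column indices, cells with zero count are skipped in the likelihood ratio, and the continuity-corrected numerator is clamped at zero. The function name is hypothetical.

```python
import math

def chi_square_tests(f):
    """Return {name: (statistic, degrees of freedom)} for a table of counts."""
    n_rows, n_cols = len(f), len(f[0])
    r = [sum(row) for row in f]
    c = [sum(col) for col in zip(*f)]
    n = sum(r)
    pearson = likelihood = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            e = r[i] * c[j] / n                      # expected count
            pearson += (f[i][j] - e) ** 2 / e
            if f[i][j] > 0:                          # 0 * ln(0) is taken as 0
                likelihood += 2 * f[i][j] * math.log(f[i][j] / e)
    # Pearson correlation with index scores 0..R-1 and 0..C-1 (assumption).
    sx = sum(r[i] * i for i in range(n_rows))
    sxx = sum(r[i] * i * i for i in range(n_rows))
    sy = sum(c[j] * j for j in range(n_cols))
    syy = sum(c[j] * j * j for j in range(n_cols))
    sxy = sum(f[i][j] * i * j for i in range(n_rows) for j in range(n_cols))
    cov = sxy - sx * sy / n
    corr_sq = cov ** 2 / ((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    df = (n_rows - 1) * (n_cols - 1)
    result = {
        "pearson": (pearson, df),
        "likelihood_ratio": (likelihood, df),
        "linear_association": ((n - 1) * corr_sq, 1),
    }
    if n_rows == 2 and n_cols == 2:
        det = abs(f[0][0] * f[1][1] - f[0][1] * f[1][0])
        corrected = n * max(det - n / 2, 0) ** 2 / (r[0] * r[1] * c[0] * c[1])
        result["continuity_correction"] = (corrected, 1)
    return result

tests = chi_square_tests([[10, 20], [30, 40]])
```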
Fisher's Exact Test
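This test is useful when some expected cell counts are low (less than 5). It is calculated only for a 2 x 2 table. Suppose we have the following table:

| | $Y_1$ | $Y_2$ | Subtotal/Total |
|---|---|---|---|
| $X_1$ | $f_{11}$ | $f_{12}$ | $r_1$ |
| $X_2$ | $f_{21}$ | $f_{22}$ | $r_2$ |
| Subtotal/Total | $c_1$ | $c_2$ | $N$ |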
Under the null hypothesis (independence), the count of the first cell follows a hypergeometric distribution. The probability that it equals $k$ is given by
$$p(k) = \frac{\binom{r_1}{k}\binom{r_2}{c_1 - k}}{\binom{N}{c_1}}, \quad \max(0,\ c_1 - r_2) \le k \le \min(r_1,\ c_1).$$
One-Sided Test
The one-sided test significance level is calculated by
- p(left-sided test) = $\sum_{k \le f_{11}} p(k)$
- p(right-sided test) = $\sum_{k \ge f_{11}} p(k)$
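Two-Sided Test
The two-tail significance is the total probability of all tables that are no more probable than the observed one:
$$p(\text{two-sided test}) = \sum_{k:\ p(k) \le p(f_{11})} p(k)$$
where $p(k)$ is the hypergeometric probability that the first cell count equals $k$, and the sum runs over both tails.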
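The test can be sketched with exact integer binomials from the standard library. The two-sided rule used here (sum the probabilities of every table that is no more likely than the observed one) is the common convention and is an assumption about what the tool computes; the function name is hypothetical.

```python
from math import comb

def fisher_exact_2x2(f11, f12, f21, f22):
    """Return (left, right, two_sided) p-values for a 2 x 2 table."""
    r1, r2 = f11 + f12, f21 + f22
    c1 = f11 + f21
    n = r1 + r2
    lo, hi = max(0, c1 - r2), min(r1, c1)    # support of the first cell
    # Hypergeometric probability of each possible first-cell count.
    probs = {k: comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)
             for k in range(lo, hi + 1)}
    p_obs = probs[f11]
    left = sum(p for k, p in probs.items() if k <= f11)
    right = sum(p for k, p in probs.items() if k >= f11)
    # Small relative tolerance guards against floating-point ties.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return left, right, two_sided

left, right, two_sided = fisher_exact_2x2(3, 1, 1, 3)
```

For the table `[[3, 1], [1, 3]]` the possible first-cell counts 0 to 4 have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so the three p-values are 69/70, 17/70 and 34/70.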
Measures of Association
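Define
- $C_{ij} = \sum_{h<i}\sum_{k<j} f_{hk} + \sum_{h>i}\sum_{k>j} f_{hk}$
- $D_{ij} = \sum_{h<i}\sum_{k>j} f_{hk} + \sum_{h>i}\sum_{k<j} f_{hk}$
- $P = \sum_{i}\sum_{j} f_{ij} C_{ij}$
- $Q = \sum_{i}\sum_{j} f_{ij} D_{ij}$
- $D_r = N^2 - \sum_{i=1}^{R} r_i^2$
- $D_c = N^2 - \sum_{j=1}^{C} c_j^2$
- $r_i$ is the subtotal of the $i$th row
- $c_j$ is the subtotal of the $j$th column
- $N$ is the total number.

| Statistics | | Formula and Explanation | Standard Error |
|---|---|---|---|
| Phi Coefficient | | $\phi = \sqrt{\chi_P^2 / N}$, where $\chi_P^2$ is the Pearson chi-square; this form is calculated when the table is not 2 x 2. For a 2 x 2 table, it is equal to $\phi = \frac{f_{11}f_{22} - f_{12}f_{21}}{\sqrt{r_1 r_2 c_1 c_2}}$. For a 2 x 2 table the value ranges from $-1$ to $1$; otherwise it ranges from $0$ to $\sqrt{q-1}$, where $q = \min(R, C)$. | |
| Cramér's V | | $V = \sqrt{\frac{\chi_P^2}{N(q-1)}}$, where $q = \min(R, C)$ | |
| Contingency Coefficient | | $CC = \sqrt{\frac{\chi_P^2}{\chi_P^2 + N}}$ | |
| Gamma | | $\gamma = \frac{P - Q}{P + Q}$ | |
| Kendall | Tau-b | $\tau_b = \frac{P - Q}{\sqrt{D_r D_c}}$ | |
| | Tau-c | $\tau_c = \frac{q(P - Q)}{N^2 (q - 1)}$, where $q = \min(R, C)$ | |
| Somers' D | $C \mid R$ | $d_{C \mid R} = \frac{P - Q}{D_r}$ | |
| | $R \mid C$ | $d_{R \mid C} = \frac{P - Q}{D_c}$ | |
| | Symmetric | $d = \frac{2(P - Q)}{D_r + D_c}$ | |
| Lambda | $C \mid R$ | $\lambda_{C \mid R} = \frac{\sum_{i} f_{im} - c_m}{N - c_m}$, where $f_{im}$ is the largest count in the $i$th row, and $c_m$ is the largest column subtotal. | $\sqrt{\frac{\left(N - \sum_i f_{im}\right)\left(\sum_i f_{im} + c_m - 2\sum_i (f_{im} \mid l_i = l)\right)}{(N - c_m)^3}}$, where $l_i$ is the column index of $f_{im}$, and $l$ is the index of the column subtotal for $c_m$. |
| | $R \mid C$ | $\lambda_{R \mid C} = \frac{\sum_{j} f_{mj} - r_m}{N - r_m}$, where $f_{mj}$ is the largest count in the $j$th column, and $r_m$ is the largest row subtotal. | $\sqrt{\frac{\left(N - \sum_j f_{mj}\right)\left(\sum_j f_{mj} + r_m - 2\sum_j (f_{mj} \mid k_j = k)\right)}{(N - r_m)^3}}$, where $k_j$ is the row index of $f_{mj}$, and $k$ is the index of the row subtotal for $r_m$. |
| | Symmetric | $\lambda = \frac{\sum_i f_{im} + \sum_j f_{mj} - c_m - r_m}{2N - r_m - c_m}$ | |
| Uncertainty | $C \mid R$ | $U_{C \mid R} = \frac{U(R) + U(C) - U(RC)}{U(C)}$, where $U(R) = -\sum_i \frac{r_i}{N}\ln\frac{r_i}{N}$, and $U(C) = -\sum_j \frac{c_j}{N}\ln\frac{c_j}{N}$, and $U(RC) = -\sum_i\sum_j \frac{f_{ij}}{N}\ln\frac{f_{ij}}{N}$ | |
| | $R \mid C$ | $U_{R \mid C} = \frac{U(R) + U(C) - U(RC)}{U(R)}$ | |
| | Symmetric | $U = \frac{2\left[U(R) + U(C) - U(RC)\right]}{U(R) + U(C)}$ | |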
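The ordinal measures (Gamma, Kendall's Tau-b and Tau-c, Somers' D) all derive from the concordant and discordant pair totals. The sketch below uses the standard textbook definitions, in which `P` and `Q` are twice the number of concordant and discordant pairs; the function name and dictionary keys are hypothetical, and standard errors are not computed.

```python
import math

def ordinal_association(f):
    """Gamma, Kendall's tau and Somers' D for a table of counts f."""
    n_rows, n_cols = len(f), len(f[0])
    r = [sum(row) for row in f]
    c = [sum(col) for col in zip(*f)]
    n = sum(r)
    p = q = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Cells concordant with (i, j): both indices lower or both higher.
            conc = sum(f[h][k] for h in range(n_rows) for k in range(n_cols)
                       if (h < i and k < j) or (h > i and k > j))
            # Cells discordant with (i, j): indices move in opposite directions.
            disc = sum(f[h][k] for h in range(n_rows) for k in range(n_cols)
                       if (h < i and k > j) or (h > i and k < j))
            p += f[i][j] * conc
            q += f[i][j] * disc
    d_r = n * n - sum(x * x for x in r)
    d_c = n * n - sum(x * x for x in c)
    m = min(n_rows, n_cols)
    return {
        "gamma": (p - q) / (p + q),
        "tau_b": (p - q) / math.sqrt(d_r * d_c),
        "tau_c": m * (p - q) / (n * n * (m - 1)),
        "somers_d_c_given_r": (p - q) / d_r,
        "somers_d_r_given_c": (p - q) / d_c,
        "somers_d_symmetric": 2 * (p - q) / (d_r + d_c),
    }

measures = ordinal_association([[10, 20], [30, 40]])
```

The nested scan is quadratic in the number of cells, which is fine for small tables; cumulative sums would be the usual optimization for large ones.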
Measures of Agreement
This table is calculated only when two conditions are satisfied: (1) the table is square, i.e. $R = C$, and (2) the row variable and the column variable have the same values.
The Kappa statistic is calculated by
$$\kappa = \frac{N\sum_{i=1}^{R} f_{ii} - \sum_{i=1}^{R} r_i c_i}{N^2 - \sum_{i=1}^{R} r_i c_i}$$
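The standard error is estimated by:
$$SE = \sqrt{\frac{A + B - D}{N\,(1 - P_e)^2}}$$
where $P_e = \frac{1}{N^2}\sum_{i=1}^{R} r_i c_i$, $A = \sum_{i=1}^{R} \frac{f_{ii}}{N}\left[1 - \frac{r_i + c_i}{N}(1 - \kappa)\right]^2$, $B = (1 - \kappa)^2 \sum_{i \ne j} \frac{f_{ij}}{N}\left(\frac{c_i + r_j}{N}\right)^2$,
and $D = \left[\kappa - P_e(1 - \kappa)\right]^2$.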
The corresponding asymptotic standard error under the null hypothesis is given by
$$SE_0 = \sqrt{\frac{1}{N\left(N^2 - \sum_{i=1}^{R}r_ic_i\right)^2} \left[N^2\sum_{i=1}^{R}r_ic_i + \left(\sum_{i=1}^{R}r_ic_i\right)^2 - N \sum_{i=1}^{R}r_ic_i(r_i+c_i)\right]}$$
Another related statistic is Bowker's test of symmetry, which tests the hypothesis that the cell probabilities satisfy $p_{ij} = p_{ji}$ for all pairs $(i, j)$. For a square table ($R = C$), the statistic is calculated as
$$B = \sum_{i<j} \frac{(f_{ij} - f_{ji})^2}{f_{ij} + f_{ji}}$$
For large samples, $B$ asymptotically follows a chi-square distribution with $R(R-1)/2$ degrees of freedom.
Note that for a 2 x 2 table, Bowker's test is equal to McNemar's test, so only Bowker's test is given.
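A short sketch of Kappa, its null-hypothesis standard error and Bowker's statistic for a square table follows. The function name is hypothetical, and skipping pairs whose off-diagonal counts are both zero is a convention assumed for this example.

```python
import math

def kappa_and_bowker(f):
    """Return (kappa, se0, bowker, df) for a square table of counts."""
    size = len(f)
    assert all(len(row) == size for row in f), "table must be square"
    r = [sum(row) for row in f]
    c = [sum(col) for col in zip(*f)]
    n = sum(r)
    diag = sum(f[i][i] for i in range(size))
    rc = sum(r[i] * c[i] for i in range(size))
    kappa = (n * diag - rc) / (n * n - rc)
    # Asymptotic standard error under the null hypothesis (kappa = 0).
    rc_sum = sum(r[i] * c[i] * (r[i] + c[i]) for i in range(size))
    se0 = math.sqrt((n * n * rc + rc ** 2 - n * rc_sum)
                    / (n * (n * n - rc) ** 2))
    # Bowker's test of symmetry over the pairs i < j.
    bowker = sum((f[i][j] - f[j][i]) ** 2 / (f[i][j] + f[j][i])
                 for i in range(size) for j in range(i + 1, size)
                 if f[i][j] + f[j][i] > 0)
    df = size * (size - 1) // 2
    return kappa, se0, bowker, df

kappa, se0, bowker, df = kappa_and_bowker([[20, 5], [10, 15]])
```

For this 2 x 2 input the observed agreement is 0.7 and the chance agreement is 0.5, so Kappa is 0.4, and the Bowker value reduces to McNemar's $(5 - 10)^2 / 15$.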
Odds Ratio and Relative Risk
These statistics are calculated only for a 2 x 2 table.
Odds Ratio
The Odds Ratio is calculated as
$$OR = \frac{f_{11} f_{22}}{f_{12} f_{21}}$$
Relative Risk
The Relative Risks are given by
- for column 1: $RR_1 = \frac{f_{11} / r_1}{f_{21} / r_2}$
- for column 2: $RR_2 = \frac{f_{12} / r_1}{f_{22} / r_2}$
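For a 2 x 2 table these quantities are simple ratios. The sketch below uses the standard definitions, comparing the row-1 risk with the row-2 risk for each column; the function name and result keys are hypothetical, and all four cells are assumed to be positive.

```python
def odds_ratio_and_relative_risk(f11, f12, f21, f22):
    """Odds ratio and column-wise relative risks for a 2 x 2 table."""
    r1, r2 = f11 + f12, f21 + f22            # row subtotals
    return {
        "odds_ratio": (f11 * f22) / (f12 * f21),
        # Risk of falling in column 1: row 1 relative to row 2.
        "relative_risk_col1": (f11 / r1) / (f21 / r2),
        # Risk of falling in column 2: row 1 relative to row 2.
        "relative_risk_col2": (f12 / r1) / (f22 / r2),
    }

result = odds_ratio_and_relative_risk(10, 20, 30, 40)
```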
Cochran-Mantel-Haenszel
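Define
- $K$ is the number of layers
- $f_{ijk}$ is the frequency in the $i$th row, $j$th column and $k$th layer
- $c_{jk}$ is the $j$th column, $k$th layer subtotal
- $r_{ik}$ is the $i$th row, $k$th layer subtotal
- $n_k$ is the $k$th layer subtotal
- $E_{ijk} = \frac{r_{ik} c_{jk}}{n_k}$ is the expected frequency of the $i$th row, $j$th column, $k$th layer cell
- $V(f_{11k}) = \frac{r_{1k}\, r_{2k}\, c_{1k}\, c_{2k}}{n_k^2 (n_k - 1)}$ is the variance of $f_{11k}$ in the $k$th 2 x 2 layer
Mantel-Haenszel statistic
The Mantel-Haenszel statistic is given by
$$\chi_{MH}^2 = \frac{\left[\sum_{k=1}^{K}\left(f_{11k} - E_{11k}\right) - 0.5\,\operatorname{sgn}\left(\sum_{k=1}^{K}\left(f_{11k} - E_{11k}\right)\right)\right]^2}{\sum_{k=1}^{K} V(f_{11k})}$$
where sgn is the sign function: $\operatorname{sgn}(x) = 1$ if $x > 0$, $0$ if $x = 0$, and $-1$ if $x < 0$.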
Breslow-Day statistic
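The Breslow-Day statistic is
$$BD = \sum_{k=1}^{K} V_k \left[f_{11k} - \hat{f}_{11k}\right]^2$$
where $\hat{f}_{ijk}$ is the expected frequency of the cell under the common odds ratio with the layer subtotals held fixed, i.e. $\hat{f}_{11k}$ solves $\frac{\hat{f}_{11k}\,\hat{f}_{22k}}{\hat{f}_{12k}\,\hat{f}_{21k}} = \widehat{OR}_{MH}$ with $\hat{f}_{12k} = r_{1k} - \hat{f}_{11k}$, $\hat{f}_{21k} = c_{1k} - \hat{f}_{11k}$ and $\hat{f}_{22k} = n_k - r_{1k} - c_{1k} + \hat{f}_{11k}$, and $V_k = \frac{1}{\hat{f}_{11k}} + \frac{1}{\hat{f}_{12k}} + \frac{1}{\hat{f}_{21k}} + \frac{1}{\hat{f}_{22k}}$.
Tarone’s Statistic
Tarone’s statistic is
$$T = \sum_{k=1}^{K} V_k \left[f_{11k}-\hat{f}_{11k}\right]^2 - \frac{\left[\sum_{k=1}^{K}\left(f_{11k}-\hat{f}_{11k}\right)\right]^2}{\sum_{k=1}^{K}\frac{1}{V_k}}$$
where $V_k = \frac{1}{\hat{f}_{11k}} + \frac{1}{\hat{f}_{12k}} + \frac{1}{\hat{f}_{21k}} + \frac{1}{\hat{f}_{22k}}$.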
Common Odds Ratio
For a 2×2×K table, the odds ratio at the $k$th layer is $OR_k = \frac{f_{11k} f_{22k}}{f_{12k} f_{21k}}$.
Assuming that a true common odds ratio exists, that is, $OR_1 = OR_2 = \cdots = OR_K$, Mantel-Haenszel's estimator of the common odds ratio is
$$\widehat{OR}_{MH} = \frac{\sum_{k=1}^{K} \frac{f_{11k} f_{22k}}{n_k}}{\sum_{k=1}^{K} \frac{f_{12k} f_{21k}}{n_k}}$$
The asymptotic variance for $\ln(\widehat{OR}_{MH})$ is:
$$\widehat{Var}[\ln(\widehat{OR}_{MH})]=\frac{\sum_{k=1}^{K}\frac{(f_{11k}+f_{22k})f_{11k} f_{22k}}{n_{k}^2}}{2\left(\sum_{k=1}^{K}\frac{f_{11k} f_{22k}}{n_{k}}\right)^2}+\frac{\sum_{k=1}^{K}\frac{(f_{11k}+f_{22k})f_{12k} f_{21k}+(f_{12k}+f_{21k})f_{11k} f_{22k}}{n_{k}^2}}{2\sum_{k=1}^{K}\frac{f_{11k} f_{22k}}{n_{k}}\sum_{k=1}^{K}\frac{f_{12k} f_{21k}}{n_{k}}}+\frac{\sum_{k=1}^{K}\frac{(f_{12k}+f_{21k})f_{12k} f_{21k}}{n_{k}^2}}{2\left(\sum_{k=1}^{K}\frac{f_{12k} f_{21k}}{n_{k}}\right)^2}$$
The lower confidence limit (LCL) and upper confidence limit (UCL) for $\ln(\widehat{OR}_{MH})$ are:
$$\ln(\widehat{OR}_{MH}) - z(\alpha/2)\sqrt{\widehat{Var}[\ln(\widehat{OR}_{MH})]}$$
and
$$\ln(\widehat{OR}_{MH}) + z(\alpha/2)\sqrt{\widehat{Var}[\ln(\widehat{OR}_{MH})]}$$
where $z(\alpha/2)$ is the upper $\alpha/2$ critical value of the standard normal distribution.
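The common odds ratio estimate and its confidence limits can be sketched as follows. The variance is the three-term estimator in its usual form, where the first and third denominators are the squared sums. The function name, the tuple layout of `layers` and the `alpha` parameter are assumptions of this example.

```python
import math
from statistics import NormalDist

def mantel_haenszel_odds_ratio(layers, alpha=0.05):
    """layers: iterable of (f11, f12, f21, f22), one tuple per layer.

    Returns (or_mh, var_log_or, (lcl, ucl)); the limits are on the log scale.
    """
    sum_r = sum_s = 0.0
    sum_pr = sum_ps_qr = sum_qs = 0.0
    for f11, f12, f21, f22 in layers:
        n = f11 + f12 + f21 + f22
        r_k = f11 * f22 / n                  # numerator term of OR_MH
        s_k = f12 * f21 / n                  # denominator term of OR_MH
        p_k = (f11 + f22) / n
        q_k = (f12 + f21) / n
        sum_r += r_k
        sum_s += s_k
        sum_pr += p_k * r_k
        sum_ps_qr += p_k * s_k + q_k * r_k
        sum_qs += q_k * s_k
    or_mh = sum_r / sum_s
    var_log_or = (sum_pr / (2 * sum_r ** 2)
                  + sum_ps_qr / (2 * sum_r * sum_s)
                  + sum_qs / (2 * sum_s ** 2))
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    half_width = z * math.sqrt(var_log_or)
    log_or = math.log(or_mh)
    return or_mh, var_log_or, (log_or - half_width, log_or + half_width)

or_mh, var_log_or, (lcl, ucl) = mantel_haenszel_odds_ratio([(10, 20, 30, 40)])
```

With a single layer the variance reduces to the familiar $1/f_{11} + 1/f_{12} + 1/f_{21} + 1/f_{22}$, which is a convenient sanity check. Exponentiating the two limits gives the interval for the odds ratio itself.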