2.2.1 Algorithm for Attribute Agreement Analysis

Assessment Agreement

Percent Agreement

Percent = 100*m/N

where

m: the number of matches, as defined by the matched events below.
N: the total number of samples.
  • Within Appraisers

This requires each appraiser to rate each sample at least twice. A sample counts as matched for an appraiser if all of that appraiser's trials on the sample receive the same rating; otherwise the sample counts as not matched for that appraiser (a code sketch follows this list).

  • Each Appraiser VS Standard

The standard/attribute for each sample must be known. A sample counts as matched for an appraiser if all of that appraiser's trials on the sample agree with the sample's known standard; otherwise it is not matched.

  • Between Appraisers

A sample counts as matched if, on that sample, all trials from all appraisers receive the same rating.

  • All Appraisers VS Standard

The standard/attribute for each sample must be known. A sample counts as matched if, on that sample, all trials from all appraisers agree with the sample's known standard.
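
As a concrete illustration of the within-appraiser case above, here is a minimal sketch in Python, assuming one appraiser's ratings are stored as a samples-by-trials array; the function and variable names are illustrative only.

```python
# A minimal sketch of within-appraiser percent agreement, assuming the ratings
# of one appraiser are stored as a 2-D array of shape (n_samples, n_trials).
import numpy as np

def percent_agreement_within(ratings):
    """Percent = 100*m/N, where m counts samples whose trials all agree."""
    ratings = np.asarray(ratings)
    n_samples = ratings.shape[0]
    # a sample is "matched" when every trial gives the same rating
    matched = np.all(ratings == ratings[:, [0]], axis=1)
    m = int(matched.sum())
    return 100.0 * m / n_samples, m, n_samples

# Example: 5 samples rated twice by one appraiser (1 = pass, 0 = fail)
ratings = [[1, 1], [0, 0], [1, 0], [1, 1], [0, 0]]
print(percent_agreement_within(ratings))  # (80.0, 4, 5)
```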

Confidence Intervals for Percent Agreement

Given \alpha, the confidence interval (lower bound and upper bound) for the percent agreement is computed as follows (for \alpha=0.05, these are the 95% lower and upper bounds).

Lower Bound

LB=\frac{\nu_1F_{\nu_1,\nu_2,\alpha/2}}{\nu_2+\nu_1F_{\nu_1,\nu_2,\alpha/2}}

where

\nu_1=2m
\nu_2=2(N-m+1)
m: the number of matches.
N: the number of samples.
F_{\nu_1,\nu_2,\alpha/2}: the (100*\alpha/2)^{th} percentile of the F distribution with \nu_1 and \nu_2 degrees of freedom.

If there is no agreement (Percent=0, i.e. m=0), the lower bound is 0. If there is perfect agreement (Percent=100, i.e. m=N), \alpha is used in place of \alpha/2 in the formula.

Upper Bound

UB=\frac{\nu_1F_{\nu_1,\nu_2,1-\alpha/2}}{\nu_2+\nu_1F_{\nu_1,\nu_2,1-\alpha/2}}

where

\nu_1=2(m+1)
\nu_2=2(N-m)
m: the number of matches.
N: the number of samples.
F_{\nu_1,\nu_2,1-\alpha/2}: the (100*(1-\alpha/2))^{th} percentile of the F distribution with \nu_1 and \nu_2 degrees of freedom.

If there is no agreement (Percent=0, i.e. m=0), \alpha is used in place of \alpha/2 in the formula. If there is perfect agreement (Percent=100, i.e. m=N), the upper bound is 1.
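
The bounds above can be evaluated with the F-distribution percentile function. Below is a sketch using scipy.stats.f.ppf; the function name agreement_bounds is illustrative, and the handling of the m=0 and m=N special cases follows the rules stated above.

```python
# A sketch of the confidence bounds for percent agreement via the F distribution.
from scipy.stats import f

def agreement_bounds(m, N, alpha=0.05):
    """Return (lower, upper) bounds for the agreement proportion m/N."""
    # lower bound: 0 if m == 0; one-sided level alpha if m == N
    if m == 0:
        lower = 0.0
    else:
        a = alpha if m == N else alpha / 2
        v1, v2 = 2 * m, 2 * (N - m + 1)
        q = f.ppf(a, v1, v2)          # (100*a)-th percentile of F(v1, v2)
        lower = v1 * q / (v2 + v1 * q)
    # upper bound: 1 if m == N; one-sided level alpha if m == 0
    if m == N:
        upper = 1.0
    else:
        a = alpha if m == 0 else alpha / 2
        v1, v2 = 2 * (m + 1), 2 * (N - m)
        q = f.ppf(1 - a, v1, v2)      # (100*(1-a))-th percentile of F(v1, v2)
        upper = v1 * q / (v2 + v1 * q)
    return lower, upper

print(agreement_bounds(4, 5))  # roughly (0.28, 0.99) for 4 matches out of 5
```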

Assessment Disagreement

Assessment disagreement measures how the ratings differ from the known standard ratings, so the standard/attribute for each sample must be known.

Percent Disagreement

Percent=100*c/N

where

c: the number of assessments different from the known standard rating.
N: the total number of trials.

The percent disagreement is the percentage of ratings that do not match the standard. For each appraiser and each sample, every trial whose rating differs from the sample's known standard counts as one non-match, and the non-match count c increases by 1.
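
A minimal sketch of this count for a single appraiser, assuming a samples-by-trials array of ratings and a vector of known standards; the data layout and names are illustrative.

```python
# A sketch of percent disagreement against a known standard.
import numpy as np

def percent_disagreement(ratings, standard):
    ratings = np.asarray(ratings)
    standard = np.asarray(standard).reshape(-1, 1)
    c = int(np.sum(ratings != standard))  # trials that differ from the standard
    n_trials_total = ratings.size         # total number of trials N
    return 100.0 * c / n_trials_total

# Example: 3 samples, 2 trials each, standard known for every sample
print(percent_disagreement([[1, 1], [0, 1], [1, 0]], [1, 0, 1]))  # 33.33...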

Fleiss' Kappa Statistics

Unknown Standard

There are two cases for computing Fleiss' kappa statistics with an unknown standard: agreement within each appraiser and agreement between all appraisers.

Agreement within each appraiser examines the agreement among the trials of each appraiser, so the number of trials for each appraiser must be greater than 1.

Agreement between all appraisers examines the agreement across all the appraisers, so the number of appraisers is assumed to be greater than 1, while the number of trials for each appraiser can be 1 or more.

  • Overall Kappa
    The overall kappa coefficient is defined by:
    K = \frac{P_o-P_e}{1-P_e}
    where
    P_o=\frac{\sum_{i=1}^n\sum_{j=1}^kx_{ij}^2-nm}{nm(m-1)}: the observed proportion of the pairwise agreement among the trials.
    P_e=\sum_{j=1}^kp_j^2: the expected proportion of agreement.
    p_j=\frac{1}{nm}\sum_{i=1}^nx_{ij}: the overall proportion of ratings in category j.
    k: the total number of categories.
    m: the number of trials. For agreement within each appraiser, it is the number of trials for each appraiser; for agreement between all appraisers, it is the total number of trials across all appraisers.
    n: the number of samples.
    x_{ij}: the number of ratings on sample i into category j.
  • Kappa for Single Category
    The formula for the kappa coefficient for the j^{th} category is defined by:
    K_j=1-\frac{\sum_{i=1}^nx_{ij}(m-x_{ij})}{nm(m-1)p_j(1-p_j)}
    Each parameter has the same meaning as described above for Overall Kappa.
  • Testing Significance
    The following Z statistic is used to test if K > 0:
    Z = \frac{K}{\sqrt{Var(K)}}
    where
    K: the overall kappa coefficient.
    Var(K)=\frac{2}{nm(m-1)\left(\sum_{j=1}^kp_j(1-p_j)\right)^2}\left(\left(\sum_{j=1}^kp_j(1-p_j)\right)^2-\sum_{j=1}^kp_j(1-p_j)(1-2p_j)\right)
    Other parameters have the same meanings as described above for Overall Kappa.
    For the j^{th} category, the following Z_j statistic is used for testing if K_j>0:
    Z_j=\frac{K_j}{\sqrt{Var(K_j)}}
    where
    K_j: the kappa coefficient for the j^{th} category.
    Var(K_j) = \frac{2}{nm(m-1)}
    Other parameters have the same meanings as described above for Overall Kappa.
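
The formulas above can be assembled as in the following sketch, assuming x is the n-by-k count matrix x_{ij} with a constant number of ratings m per sample; it returns the overall and single-category kappas together with their Z statistics, and all names are illustrative.

```python
# A sketch of Fleiss' kappa (overall and per category) with the Z tests above.
import numpy as np

def fleiss_kappa(x):
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    m = x[0].sum()                                   # ratings per sample
    p = x.sum(axis=0) / (n * m)                      # p_j
    P_o = (np.sum(x ** 2) - n * m) / (n * m * (m - 1))
    P_e = np.sum(p ** 2)
    K = (P_o - P_e) / (1 - P_e)                      # overall kappa
    K_j = 1 - np.sum(x * (m - x), axis=0) / (n * m * (m - 1) * p * (1 - p))
    # variances used in the Z statistics
    s = np.sum(p * (1 - p))
    var_K = 2 * (s ** 2 - np.sum(p * (1 - p) * (1 - 2 * p))) / (n * m * (m - 1) * s ** 2)
    var_Kj = 2 / (n * m * (m - 1))
    return K, K / np.sqrt(var_K), K_j, K_j / np.sqrt(var_Kj)

# Example: 4 samples, 3 ratings each, 2 categories
x = [[3, 0], [2, 1], [0, 3], [1, 2]]
print(fleiss_kappa(x))
```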

Known Standard

  • Kappa Statistics
    If the standard is known, the following steps are used for computing kappa coefficients, including overall and single category.
    1. Treat the standard as one trial. For each actual trial, pair it with the standard to form a two-trial set of ratings, and use the formulas in Unknown Standard to estimate the kappa coefficients for this pair.
    2. Repeat this for all trials (assume there are m trials) to obtain m sets of kappa coefficients (both overall and single category).
    3. Average the m sets of estimated kappa coefficients; the results are the overall kappa coefficient and the single-category kappa coefficients, respectively.
  • Testing Significance
    1. Follow the same steps as in the kappa calculation above to obtain the m variances of the kappa statistics (Var(K) and Var(K_j)).
    2. The variance of the overall kappa with known standard is then the sum of the m variances Var(K) divided by m^2.
    3. Similarly, the variance of the kappa for a specific category with known standard is the sum of the m variances Var(K_j) divided by m^2.
    4. Finally, use the same formulas as in Unknown Standard to calculate the Z statistics from the overall and single-category variances obtained in the previous steps (see the sketch following this list).
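
A small sketch of steps 2-4, assuming the m per-trial kappas and their variances (from the unknown-standard formulas applied to each trial/standard pair) have already been computed; the function name and the numbers in the example are purely illustrative.

```python
# A sketch of the known-standard averaging: mean of the m kappas, and
# variance = (sum of the m variances) / m^2, then Z = K / sqrt(Var(K)).
import numpy as np

def pool_known_standard(kappas, variances):
    kappas = np.asarray(kappas, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(kappas)
    K = kappas.mean()                        # step 3: average of the m kappas
    var_K = variances.sum() / m ** 2         # step 2: sum of variances / m^2
    Z = K / np.sqrt(var_K)                   # step 4: Z statistic
    return K, var_K, Z

# Example with three trials (illustrative numbers)
print(pool_known_standard([0.62, 0.55, 0.70], [0.010, 0.012, 0.009]))
```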

Cohen's Kappa Statistics

Unknown Standard

There are two cases for calculating Cohen's kappa statistic with an unknown standard, and each case must meet its own condition.

For within appraiser, each appraiser must have exactly two trials for each sample.

For between appraisers, there must be exactly two appraisers, each with only one trial.

Assuming there are k categories, the ratings from the two trials (within appraiser) or the two appraisers (one trial each) can be summarized in the following table for the calculation.

                           Trial 2 (or Appraiser 2)
Trial 1 (or Appraiser 1)   1        2        ...   k        Total
1                          p_{11}   p_{12}   ...   p_{1k}   p_{1+}
2                          p_{21}   p_{22}   ...   p_{2k}   p_{2+}
...                        ...      ...      ...   ...      ...
k                          p_{k1}   p_{k2}   ...   p_{kk}   p_{k+}
Total                      p_{+1}   p_{+2}   ...   p_{+k}   1

where

p_{ij}=\frac{n_{ij}}{N}
n_{ij}: the number of samples for which the first trial (appraiser) gives category i and the second trial (appraiser) gives category j.
N: the total number of samples.
p_{+i}=\sum_{j=1}^kp_{ji}
p_{i+}=\sum_{j=1}^kp_{ij}
  • Overall Kappa
    The overall kappa coefficient is defined by:
    K = \frac{P_o-P_e}{1-P_e}
    where
    P_o=\sum_{i=1}^kp_{ii}: the observed proportion of agreement.
    P_e=\sum_{i=1}^kp_{i+}p_{+i}: the expected proportion of agreement.
  • Kappa for Single Category
    The formula for the kappa coefficient for the j^{th} category is calculated by:
    K_j=\frac{p_{jj}-p_{+j}p_{j+}}{(p_{+j}+p_{j+})/2-p_{+j}p_{j+}}
  • Testing Significance
    The following Z statistic is used to test if K > 0:
    Z = \frac{K}{SE}
    where
    K: the overall kappa coefficient.
    SE=\frac{\sqrt{P_e+P_e^2-\sum_{i=1}^kp_{i+}p_{+i}(p_{i+}+p_{+i})}}{(1-P_e)\sqrt{N}}: the standard error of kappa coefficient.
    For the j^{th} category, the following Z_j statistic is used for testing if K_j>0:
    Z_j=\frac{K_j}{SE_j}
    where
    K_j: the kappa coefficient for the j^{th} category.
    SE_j = \frac{\sqrt{p_{+j}p_{j+}+p_{+j}^2p_{j+}^2-p_{+j}p_{j+}(p_{+j}+p_{j+})}}{((p_{+j}+p_{j+})/2-p_{+j}p_{j+})\sqrt{N}}: the standard error of kappa coefficient of the j^{th} category.
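
Putting the table and formulas above together, a sketch of the overall kappa, the single-category kappas, and their Z statistics might look as follows, assuming counts is the k-by-k table of n_{ij}; all names are illustrative.

```python
# A sketch of Cohen's kappa (overall and per category) with the Z tests above.
import numpy as np

def cohen_kappa(counts):
    p = np.asarray(counts, dtype=float)
    N = p.sum()
    p = p / N                                   # p_{ij}
    p_row = p.sum(axis=1)                       # p_{i+}
    p_col = p.sum(axis=0)                       # p_{+i}
    P_o = np.trace(p)                           # observed agreement
    P_e = np.sum(p_row * p_col)                 # expected agreement
    K = (P_o - P_e) / (1 - P_e)
    SE = np.sqrt(P_e + P_e ** 2 - np.sum(p_row * p_col * (p_row + p_col))) / ((1 - P_e) * np.sqrt(N))
    # single-category kappas and standard errors
    d = np.diag(p)                              # p_{jj}
    denom = (p_col + p_row) / 2 - p_col * p_row
    K_j = (d - p_col * p_row) / denom
    SE_j = np.sqrt(p_col * p_row + (p_col * p_row) ** 2
                   - p_col * p_row * (p_col + p_row)) / (denom * np.sqrt(N))
    return K, K / SE, K_j, K_j / SE_j

# Example: two trials of one appraiser on 50 samples, 2 categories
counts = [[20, 5],
          [4, 21]]
print(cohen_kappa(counts))
```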

Known Standard

To calculate Cohen's kappa statistic with a known standard, a procedure similar to Unknown Standard is used.

Assuming there are k categories for the standard, the ratings from each trial can be summarized in the following table for a similar calculation.

         Standard
Trial    1        2        ...   k        Total
1        p_{11}   p_{12}   ...   p_{1k}   p_{1+}
2        p_{21}   p_{22}   ...   p_{2k}   p_{2+}
...      ...      ...      ...   ...      ...
k        p_{k1}   p_{k2}   ...   p_{kk}   p_{k+}
Total    p_{+1}   p_{+2}   ...   p_{+k}   1

where

p_{ij}=\frac{n_{ij}}{N}
n_{ij}: the number of samples for which the trial gives category i and the standard is category j.
N: the total number of samples.
p_{+i}=\sum_{j=1}^kp_{ji}
p_{i+}=\sum_{j=1}^kp_{ij}
  • Kappa
    • Each Appraiser VS Standard
      1. For the i^{th} trial, calculate K^{(i)} and K_j^{(i)} using the same formulas as Unknown Standard.
      2. Sum up all K^{(i)} and K_j^{(i)} over all trials, respectively, and then divide by the number of trials m, that is:
        K=\frac{\sum_{i=1}^mK^{(i)}}{m}
        K_j=\frac{\sum_{i=1}^mK_j^{(i)}}{m}
    • All Appraisers VS Standard
      1. For the i^{th} trial from the l^{th} appraiser, calculate K^{(il)} and K_j^{(il)} using the same formulas as Unknown Standard.
      2. Sum up all K^{(il)} and K_j^{(il)} over all trials and all appraisers, respectively, and then divide by the product of the number of trials m and the number of appraisers L, that is:
        K=\frac{\sum_{i=1}^m\sum_{l=1}^LK^{(il)}}{mL}
        K_j=\frac{\sum_{i=1}^m\sum_{l=1}^LK_j^{(il)}}{mL}
  • Testing Significance
    • Each Appraiser VS Standard
      1. For the i^{th} trial, calculate SE^{(i)} and SE_j^{(i)} using the same formulas as Unknown Standard.
      2. Sum up all SE^{(i)}*SE^{(i)} and SE_j^{(i)}*SE_j^{(i)} over all trials, respectively, to obtain the sums of variances.
      3. The final calculation of SE and SE_j is:
        SE=\frac{\sqrt{\sum_{i=1}^m\frac{SE^{(i)}*SE^{(i)}}{m}}}{\sqrt{m}}
        SE_j=\frac{\sqrt{\sum_{i=1}^m\frac{SE_j^{(i)}*SE_j^{(i)}}{m}}}{\sqrt{m}}
        where m is the number of trials.
      4. Then Z and Z_j are calculated by:
        Z=\frac{K}{SE} and Z_j=\frac{K_j}{SE_j}
    • All Appraisers VS Standard
      1. For the i^{th} trial from the l^{th} appraiser, calculate SE^{(il)} and SE_j^{(il)} using the same formulas as Unknown Standard.
      2. Sum up all SE^{(il)}*SE^{(il)} and SE_j^{(il)}*SE_j^{(il)} over all trials and all appraisers, respectively, to obtain the sums of variances.
      3. The final calculation of SE and SE_j is:
        SE=\frac{\sqrt{\sum_{i=1}^m\sum_{l=1}^L\frac{SE^{(il)}*SE^{(il)}}{mL}}}{\sqrt{mL}}
        SE_j=\frac{\sqrt{\sum_{i=1}^m\sum_{l=1}^L\frac{SE_j^{(il)}*SE_j^{(il)}}{mL}}}{\sqrt{mL}}
        where m is the number of trials, and L is the number of appraisers.
      4. Then Z and Z_j are calculated by:
        Z=\frac{K}{SE} and Z_j=\frac{K_j}{SE_j}
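
A sketch of the Each Appraiser VS Standard pooling above, assuming the per-trial kappas K^{(i)} and standard errors SE^{(i)} have already been obtained from the unknown-standard formulas (one trial-vs-standard table per trial); for All Appraisers VS Standard the same pooling applies with m replaced by mL. Names and numbers are illustrative.

```python
# A sketch of averaging per-trial kappas and pooling their standard errors.
import numpy as np

def pool_vs_standard(kappas, standard_errors):
    kappas = np.asarray(kappas, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    m = len(kappas)
    K = kappas.mean()                               # average of K^{(i)}
    SE = np.sqrt(np.sum(se ** 2 / m)) / np.sqrt(m)  # pooled standard error
    return K, SE, K / SE                            # kappa, SE, Z

# Example with two trials of one appraiser (illustrative numbers)
print(pool_vs_standard([0.64, 0.58], [0.14, 0.15]))
```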

Kendall's Statistics

Kendall's statistics assume that the ratings and the standard are ordinal data with at least 3 levels.

If the standard is unknown, Kendall's coefficient of concordance is computed for within appraiser and between appraisers. For within appraiser, there must be at least 2 trials for each appraiser; for between appraisers, there must be at least 2 appraisers.

If the standard is known, Kendall's correlation coefficient is computed for each appraiser vs standard and all appraisers vs standard. For all appraisers vs standard, there must be at least 2 appraisers.

Kendall's Coefficient of Concordance

Kendall's coefficient of concordance is estimated by:

W=\frac{12\sum_{i=1}^NR_i^2-3K^2N(N+1)^2}{K^2N(N^2-1)-K\sum_{j=1}^KT_j}

where

N: the number of samples.
K: the number of trials for within appraiser. For between appraisers, it is K=m*L where m is the number of trials and L is the number of appraisers.
R_i=\sum_{k=1}^KR_i^{(k)}: the sum of ranks for the i^{th} sample, where R_i^{(k)} is the rank of the i^{th} sample in the k^{th} trial.
T_j=\sum_{i=1}^{g_j}(t_i^3-t_i): the penalty from the j^{th} trial.
t_i: the number of tied ranks in the i^{th} tie (level).
g_j: the number of ties (levels) in the j^{th} trial.

Testing Significance of Kendall's Coefficient of Concordance

The following formula is used to test the significance of Kendall's coefficient of concordance:

\chi^2=K(N-1)W

where

\chi^2: the test statistic, which follows a chi-square distribution with N-1 degrees of freedom.
N: the number of samples.
K: the number of trials for within appraiser. For between appraisers, it is K=m*L where m is the number of trials and L is the number of appraisers.
W: the calculated Kendall's coefficient of concordance.
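
A sketch of W with the tie correction and the chi-square test above, assuming ratings is an N-by-K array whose columns are the K trials (for between appraisers, the m trials of all L appraisers, so K=m*L); scipy is assumed to be available and all names are illustrative.

```python
# A sketch of Kendall's coefficient of concordance and its chi-square test.
import numpy as np
from scipy.stats import rankdata, chi2

def kendalls_w(ratings):
    r = np.asarray(ratings, dtype=float)
    N, K = r.shape
    ranks = np.apply_along_axis(rankdata, 0, r)   # rank samples within each trial
    R = ranks.sum(axis=1)                         # R_i, sum of ranks per sample
    # tie penalty T_j for each trial
    T = 0.0
    for j in range(K):
        _, counts = np.unique(r[:, j], return_counts=True)
        T += np.sum(counts ** 3 - counts)
    W = (12 * np.sum(R ** 2) - 3 * K ** 2 * N * (N + 1) ** 2) / (K ** 2 * N * (N ** 2 - 1) - K * T)
    chi_sq = K * (N - 1) * W                      # follows chi-square with N-1 df
    return W, chi_sq, chi2.sf(chi_sq, N - 1)

# Example: 6 samples rated in 3 trials on a 1-5 ordinal scale
ratings = [[5, 4, 5], [3, 3, 4], [1, 2, 1], [2, 2, 2], [4, 5, 4], [1, 1, 2]]
print(kendalls_w(ratings))
```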

Kendall's Correlation Coefficient

To calculate Kendall's correlation coefficient between each trial and the standard, the table below is used (assuming there are k levels).

         Standard
Trial    1        2        ...   k        Total
1        n_{11}   n_{12}   ...   n_{1k}   n_{1+}
2        n_{21}   n_{22}   ...   n_{2k}   n_{2+}
...      ...      ...      ...   ...      ...
k        n_{k1}   n_{k2}   ...   n_{kk}   n_{k+}
Total    n_{+1}   n_{+2}   ...   n_{+k}   N

where

n_{ij}: the number of samples for which the trial gives category (level) i and the standard is category (level) j.
N: the total number of samples.
n_{+i}=\sum_{j=1}^kn_{ji}
n_{i+}=\sum_{j=1}^kn_{ij}

Then Kendall's correlation coefficient between each trial and the standard is computed by:

\tau_c^{(i)}=\frac{C-D}{\sqrt{N(N-1)/2-T_r}\sqrt{N(N-1)/2-T_c}}

where

(i): the superscript denotes the i^{th} trial from each appraiser.
T_r=\sum_{i=1}^kn_{i+}(n_{i+}-1)/2: the number of pairs tied on row.
T_c=\sum_{i=1}^kn_{+i}(n_{+i}-1)/2: the number of pairs tied on column.
C=\sum_{a=1}^{k-1}\sum_{b=1}^{k-1}\left(n_{ab}\sum_{c=a+1}^{k}\sum_{d=b+1}^{k}n_{cd}\right): the number of concordant pairs.
D=\sum_{a=2}^{k}\sum_{b=1}^{k-1}\left(n_{ab}\sum_{c=1}^{a-1}\sum_{d=b+1}^{k}n_{cd}\right): the number of discordant pairs.

The final Kendall's correlation coefficient is the average over all trials:

\tau_c=\frac{\sum_{i=1}^K\tau_c^{(i)}}{K}

where

K: the number of trials for each appraiser vs standard; for all appraisers vs standard, K=mL, where m is the number of trials and L is the number of appraisers.
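
A sketch of the concordant/discordant counting and the per-trial coefficient above, assuming counts is the k-by-k table of n_{ij} for one trial against the standard; averaging the returned value over the K trials gives \tau_c. All names are illustrative.

```python
# A sketch of the trial-vs-standard correlation coefficient from the k-by-k table.
import numpy as np

def kendall_tau_vs_standard(counts):
    n = np.asarray(counts, dtype=float)
    k = n.shape[0]
    N = n.sum()
    C = D = 0.0
    for a in range(k):
        for b in range(k):
            C += n[a, b] * n[a + 1:, b + 1:].sum()  # concordant pairs with cell (a, b)
            D += n[a, b] * n[:a, b + 1:].sum()      # discordant pairs with cell (a, b)
    n_row = n.sum(axis=1)                           # n_{i+}
    n_col = n.sum(axis=0)                           # n_{+i}
    T_r = np.sum(n_row * (n_row - 1)) / 2           # pairs tied on rows
    T_c = np.sum(n_col * (n_col - 1)) / 2           # pairs tied on columns
    return (C - D) / np.sqrt((N * (N - 1) / 2 - T_r) * (N * (N - 1) / 2 - T_c))

# Example: one trial against the standard, 3 ordinal levels, 40 samples
counts = [[10, 2, 0],
          [1, 12, 3],
          [0, 2, 10]]
print(kendall_tau_vs_standard(counts))
```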

Testing Significance of Kendall's Correlation Coefficient

When the standard is known, use the following formula for testing the significance of Kendall's correlation coefficient.

Z=\frac{3\left(\tau_c-\frac{2}{KN(N-1)}\right)\sqrt{KN(N-1)}}{\sqrt{2(2N+5)}},\tau_c > 0
Z=\frac{3\left(\tau_c+\frac{2}{KN(N-1)}\right)\sqrt{KN(N-1)}}{\sqrt{2(2N+5)}},\tau_c\leq 0

where

K: the number of trials for each appraiser vs standard; for all appraisers vs standard, K=mL, where m is the number of trials and L is the number of appraisers.
N: the total number of samples.
\tau_c: the calculated Kendall's correlation coefficient.
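
A sketch of the Z statistic above, taking \tau_c, K, and N as inputs; the function name is illustrative, and the resulting Z would be compared against the standard normal distribution.

```python
# A sketch of the Z statistic for Kendall's correlation coefficient.
import numpy as np

def kendall_tau_z(tau_c, K, N):
    # continuity-correction term 2 / (K*N*(N-1)) from the formulas above
    correction = 2.0 / (K * N * (N - 1))
    shifted = tau_c - correction if tau_c > 0 else tau_c + correction
    return 3 * shifted * np.sqrt(K * N * (N - 1)) / np.sqrt(2 * (2 * N + 5))

# Example: tau_c = 0.85 from K = 2 trials on N = 20 samples (illustrative numbers)
print(kendall_tau_z(0.85, 2, 20))
```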