17.10.2 Algorithms (ROC Curve)

The following notation is used in this section.

x_i\,\! : Test result score for case i

n_{TP}\,\! : Number of true positive decisions

n_{FN}\,\! : Number of false negative decisions

n_{TN}\,\! : Number of true negative decisions

n_{FP}\,\! : Number of false positive decisions

n_{-}\,\!: Number of cases with negative actual state

n_{+}\,\!: Number of cases with positive actual state

n_{-=j}\,\!: Number of true negative cases with test results equal to j

n_{+>j}\,\!: Number of true positive cases with test results greater than j

n_{+=j}\,\!: Number of true positive cases with test results equal to j

n_{-<j}\,\!: Number of true negative cases with test results less than j


ROC Values

1 - Specificity (X): 1-\frac{n_{TN}}{n_{TN}+n_{FP}}\,\!

Sensitivity (Y): \frac{n_{TP}}{n_{TP}+n_{FN}}\,\!
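As a minimal sketch of the two definitions above, the following Python function computes one ROC point from the four decision counts. The function name `roc_point` and the example counts are illustrative assumptions and do not come from the software.

```python
def roc_point(n_tp, n_fn, n_tn, n_fp):
    """Return (X, Y) = (1 - specificity, sensitivity) for one cut-off."""
    specificity = n_tn / (n_tn + n_fp)
    sensitivity = n_tp / (n_tp + n_fn)
    return 1.0 - specificity, sensitivity


# Hypothetical counts, chosen only for illustration.
x, y = roc_point(n_tp=40, n_fn=10, n_tn=30, n_fp=20)
print(x, y)
```

Repeating this calculation for every candidate cut-off and plotting Y against X traces the ROC curve.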

The area under the ROC curve

Let x\,\! be the scale of the test result variable. Let x_{-}\,\! denote the x\,\! values for cases with negative actual state and x_{+}\,\! the values for cases with positive actual state. Then the nonparametric approximation of the "true" area under the ROC curve, \theta \,\!, is

 A_Z=\frac 1{n_{+}n_{-}}\sum_{j=1}^{n_{-}}\sum _{i=1}^{n_{+}}\Psi (x_{+},x_{-})

where n_{+}\,\! is the sample size of D_{+}\,\! (cases with positive actual state), n_{-}\,\! is the sample size of D_{-}\,\! (cases with negative actual state), and

\Psi (x_{+},x_{-})=\,\! 
\begin{cases} 
  1,  & \mbox{if }x_{+}>x_{-} \\
  0.5, & \mbox{if }x_{+}=x_{-} \\
  0, & \mbox{if }x_{+}<x_{-} 
\end{cases}

Note that A_Z\,\! is the observed area under the ROC curve when successive points are connected by straight lines, i.e., the area given by the trapezoidal rule.
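The double sum can be sketched directly in Python by scoring every (positive, negative) pair with \Psi. This is an illustrative implementation rather than the software's own code; the names `psi` and `auc_pairwise` and the sample scores are assumptions.

```python
def psi(x_pos, x_neg):
    """Score one (positive, negative) pair: 1, 0.5 or 0."""
    if x_pos > x_neg:
        return 1.0
    if x_pos == x_neg:
        return 0.5
    return 0.0


def auc_pairwise(pos_scores, neg_scores):
    """Average psi over all n_plus * n_minus pairs."""
    total = sum(psi(p, n) for n in neg_scores for p in pos_scores)
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical test scores for illustration only.
pos = [3, 4, 4, 5]   # cases with positive actual state
neg = [1, 2, 3, 4]   # cases with negative actual state
area = auc_pairwise(pos, neg)
print(area)
```

The pairwise form costs n_{+}n_{-}\,\! comparisons, which is fine for small samples but slow for large ones.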

An alternative way to compute A_Z\,\! is as follows:

A_Z=\frac 1{n_{+}n_{-}}\sum_{j} \left\{ n_{-=j}n_{+>j}+\frac{n_{-=j}n_{+=j}}2\right\}

where the sum runs over the distinct test result values j.
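The count-based form can be sketched as follows. The sketch assumes the sum runs over the distinct test result values j and that the total is divided by the product n_{+}n_{-}\,\!, which is what makes it agree with the pairwise definition. The function name `auc_by_counts` and the sample scores are hypothetical.

```python
from collections import Counter


def auc_by_counts(pos_scores, neg_scores):
    """Area under the ROC curve from per-value tie counts."""
    pos_counts = Counter(pos_scores)
    neg_counts = Counter(neg_scores)
    total = 0.0
    # Only values j that occur among the negatives contribute.
    for j, n_neg_eq in neg_counts.items():
        n_pos_gt = sum(c for v, c in pos_counts.items() if v > j)
        n_pos_eq = pos_counts.get(j, 0)
        total += n_neg_eq * n_pos_gt + n_neg_eq * n_pos_eq / 2
    # Assumption: normalize by the product n_plus * n_minus.
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical test scores for illustration only.
area_counts = auc_by_counts([3, 4, 4, 5], [1, 2, 3, 4])
print(area_counts)
```

Grouping tied values this way avoids visiting every pair individually when the score scale has few distinct values.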

The SE of the area under the ROC curve statistic

The standard error of A_Z\,\! is estimated by:

SE(A_Z)=\sqrt{\frac{A_Z(1-A_Z)+(n_{+}-1)(Q_1-A_Z^2)+(n_{-}-1)(Q_2-A_Z^2)}{n_{+}n_{-}}} \,\!

where

Q_1=\frac 1{n_{-}n_{+}^2}\sum n_{-=j}\left[ n_{+>j}^2+n_{+>j}n_{+=j}+\frac{n_{+=j}^2}3\right] \,\!

and

Q_2=\frac 1{n_{-}^2n_{+}}\sum n_{+=j}\left[ n_{-<j}^2+n_{-<j}n_{-=j}+\frac{n_{-=j}^2}3\right] \,\!
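A minimal sketch of the standard error calculation is given below. It assumes the Hanley-McNeil form of the two auxiliary quantities, in which the last term of Q_1\,\! is n_{+=j}^2/3\,\! and Q_2\,\! uses the counts n_{-<j}\,\! from the notation list. The function name `auc_and_se` and the sample scores are hypothetical.

```python
import math
from collections import Counter


def auc_and_se(pos_scores, neg_scores):
    """Return (A_Z, SE(A_Z)) from tie counts per distinct score j."""
    pos_c, neg_c = Counter(pos_scores), Counter(neg_scores)
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    area = q1 = q2 = 0.0
    for j in set(pos_c) | set(neg_c):
        pos_eq, neg_eq = pos_c.get(j, 0), neg_c.get(j, 0)
        pos_gt = sum(c for v, c in pos_c.items() if v > j)
        neg_lt = sum(c for v, c in neg_c.items() if v < j)
        area += neg_eq * (pos_gt + pos_eq / 2)
        # Assumed forms of the Q1 and Q2 summands (see lead-in).
        q1 += neg_eq * (pos_gt**2 + pos_gt * pos_eq + pos_eq**2 / 3)
        q2 += pos_eq * (neg_lt**2 + neg_lt * neg_eq + neg_eq**2 / 3)
    area /= n_pos * n_neg
    q1 /= n_neg * n_pos**2
    q2 /= n_neg**2 * n_pos
    variance = (area * (1 - area)
                + (n_pos - 1) * (q1 - area**2)
                + (n_neg - 1) * (q2 - area**2)) / (n_pos * n_neg)
    # Guard against tiny negative values caused by rounding.
    return area, math.sqrt(max(variance, 0.0))


# Hypothetical test scores for illustration only.
area_hat, se_hat = auc_and_se([3, 4, 4, 5], [1, 2, 3, 4])
print(area_hat, se_hat)
```

Computing the area, Q_1\,\! and Q_2\,\! in the same pass keeps the three quantities consistent with one another.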

The asymptotic confidence interval of the area under the ROC curve

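A two-sided asymptotic c\%=(100-\alpha )\%\,\! confidence interval for the true area under the ROC curve is

A_Z\pm z_{1-\alpha /200}\,SE(A_Z)\,\!

where z_{p}\,\! is the p\,\!-quantile of the standard normal distribution (for example, z_{0.975}\approx 1.96\,\! for a 95% interval).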
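The interval can be sketched in Python as below, assuming the usual normal-approximation form in which the standard error is multiplied by the standard normal critical value for the chosen confidence level. The function name `auc_confidence_interval`, the `confidence` parameter (a fraction, not a percentage) and the example inputs are assumptions.

```python
from statistics import NormalDist


def auc_confidence_interval(area, se, confidence=0.95):
    """Two-sided asymptotic interval: area +/- z * se."""
    # Critical value, e.g. about 1.96 for confidence = 0.95.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return area - z * se, area + z * se


# Hypothetical area and standard error for illustration only.
lower, upper = auc_confidence_interval(0.8, 0.05)
print(lower, upper)
```

The sketch does not clip the limits, so for areas near 0 or 1 the interval can extend outside [0, 1].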

The asymptotic P-value under the null hypothesis that  \theta=0.5\ \,\! vs. the alternative hypothesis that  \theta \neq 0.5\ \,\!

Since A_Z\,\! is asymptotically normal under the null hypothesis that  \theta=0.5\ \,\! , the asymptotic two-sided P-value for testing  \theta=0.5\ \,\! against  \theta \neq 0.5\ \,\! is:

P\left( \left| Z\right| >\left| \frac{A_Z-0.5}{SD(A_Z)|_{\theta =0.5}}\right| \right) =2P\left( Z>\left| \frac{A_Z-0.5}{SD(A_Z)\mid _{\theta =0.5}}\right| \right)

In the nonparametric case,

SD(A_Z)|_{\theta =0.5}=\sqrt{\frac{\theta (1-\theta )+(n_{+}-1)(Q_1-\theta ^2)+(n_{-}-1)(Q_2-\theta ^2)}{n_{+}n_{-}}}|_{\theta =0.5}\,\!

=\sqrt{\frac{0.5(1-0.5)+(n_{+}-1)(\frac 13-0.5^2)+(n_{-}-1)(\frac 13-0.5^2)}{n_{+}n_{-}}}
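As a hedged illustration of the test above: with Q_1=Q_2=\frac 13\,\! under the null hypothesis, the numerator of the last expression simplifies to (n_{+}+n_{-}+1)/12\,\!, so the null standard deviation is \sqrt{(n_{+}+n_{-}+1)/(12n_{+}n_{-})}\,\!. The sketch below uses that simplification and the complementary error function for the normal tail. The function name `auc_null_p_value` and the example inputs are assumptions.

```python
import math


def auc_null_p_value(area, n_pos, n_neg):
    """Two-sided asymptotic P-value for H0: theta = 0.5."""
    # Null standard deviation after setting theta = 0.5, Q1 = Q2 = 1/3.
    sd0 = math.sqrt((n_pos + n_neg + 1) / (12 * n_pos * n_neg))
    z = abs(area - 0.5) / sd0
    # 2 * P(Z > z) for standard normal Z equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# Hypothetical observed area and group sizes for illustration only.
p_value = auc_null_p_value(0.84375, n_pos=4, n_neg=4)
print(p_value)
```

An observed area of exactly 0.5 gives a P-value of 1, and the P-value shrinks as the area moves away from 0.5 in either direction.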

Optimal Cut-Point Value

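The optimal cut-point is the test result value at which sensitivity and specificity are closest to equal (the SpEqualSe criterion). With x = 1 - specificity and y = sensitivity, the difference between specificity and sensitivity is 1 - x - y, so the cut-point is the point on the ROC curve that minimizes abs(1-x-y).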
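The selection rule can be sketched as a search over the ROC points for the smallest abs(1-x-y). The function name `optimal_cut_point`, the list-based inputs and the example values are assumptions. When several points tie, this sketch returns the first one, which may differ from how the software breaks ties.

```python
def optimal_cut_point(cut_points, x_values, y_values):
    """Return the cut-point whose ROC point minimizes |1 - x - y|.

    x_values hold 1 - specificity and y_values hold sensitivity,
    one entry per candidate cut-point.
    """
    best = min(range(len(cut_points)),
               key=lambda k: abs(1 - x_values[k] - y_values[k]))
    return cut_points[best]


# Hypothetical ROC points for illustration only.
cuts = [1, 2, 3, 4]
xs = [0.90, 0.50, 0.20, 0.05]   # 1 - specificity
ys = [1.00, 0.80, 0.60, 0.20]   # sensitivity
best_cut = optimal_cut_point(cuts, xs, ys)
print(best_cut)
```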