17.3.8.2 Algorithms (Two sample proportion test)


Let n_{1}\! be the size of sample 1 and x_{1}\! be the number of events (successes); then the sample proportion \tilde{p_{1}}\! can be expressed as \tilde{p_{1}}=\frac{x_{1}}{n_{1}}\!.

Similarly, for the second sample, let n_{2}\! be the sample size and x_{2}\! the number of events; then the sample proportion is \tilde{p_{2}}=\frac{x_{2}}{n_{2}}\!.

Hypotheses

Let p_{1}\! and p_{2}\! be the true population proportions for samples 1 and 2, and let d_{0}\! be the hypothesized difference between the population proportions.

H_0:p_{1}-p_{2}=d_{0}\! for the two-tailed test

H_0:p_{1}-p_{2}\ge d_{0}\! for a one-tailed test

H_0:p_{1}-p_{2}\le d_{0}\! for a one-tailed test

Normal Approximation

P Value

We can perform the normal approximation test under the assumptions x_{1}\ge10\! and n_{1}-x_{1}\ge10\!, x_{2}\ge10\! and n_{2}-x_{2}\ge10\!.

To perform the test, calculate the z\! statistic and the p_{value}\!:

z=\frac{\tilde{p_{1}}-\tilde{p_{2}}-d_{0}}{\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+\frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}}\!

A special case arises when d_{0}\! is zero: Origin can use a pooled estimate of p\! for the test if you check the "Pooled" box:

z=\frac{\tilde{p_{1}}-\tilde{p_{2}}}{\sqrt{\tilde{p_{0}}(1-\tilde{p_{0}})\left(\frac{1}{n_{1}}+\frac{1}{n_{2}}\right)}}\! , where \tilde{p_{0}}=\frac{x_{1}+x_{2}}{n_{1}+n_{2}}\!

The p-values for each hypothesis are given by:

H_0:p_{1}-p_{2}=d_{0}\!, p_{value}=2P(Z_{1}\ge|z|)\!, for the two-tailed test

H_0:p_{1}-p_{2}\ge d_{0}\!, p_{value}=P(Z_{1}\le z)\!, for the lower-tailed test

H_0:p_{1}-p_{2}\le d_{0}\!, p_{value}=P(Z_{1}\ge z)\!, for the upper-tailed test
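As a sketch, the normal approximation test above can be implemented in a few lines of Python. The function name, argument layout, and tail labels are illustrative assumptions, not Origin's API:

```python
import math

def two_sample_prop_z(x1, n1, x2, n2, d0=0.0, tail="two", pooled=False):
    """Normal-approximation z test for the difference of two proportions.

    tail: "two"   for H0: p1 - p2 = d0,
          "lower" for H0: p1 - p2 >= d0 (p-value = P(Z <= z)),
          "upper" for H0: p1 - p2 <= d0 (p-value = P(Z >= z)).
    """
    p1, p2 = x1 / n1, x2 / n2
    if pooled and d0 == 0.0:
        # Pooled estimate p0 = (x1 + x2) / (n1 + n2), valid only when d0 = 0
        p0 = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2 - d0) / se

    # Standard normal CDF via the error function
    phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    if tail == "two":
        p = 2 * (1 - phi(abs(z)))
    elif tail == "lower":
        p = phi(z)
    else:
        p = 1 - phi(z)
    return z, p
```

For example, with x1 = 30, n1 = 100, x2 = 20, n2 = 100 the unpooled statistic is z ≈ 1.64 with a two-tailed p-value of about 0.10, so H_0 would not be rejected at the 0.05 level.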

Confidence Interval

For a given confidence level 1-\alpha\!, the confidence interval for the difference between the two population proportions is given by:

For H_0:p_{1}-p_{2}=d_{0}\!:

\left[(\tilde{p_{1}}-\tilde{p_{2}})- Z_{\frac{\alpha}{2}}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}, (\tilde{p_{1}}-\tilde{p_{2}})+ Z_{\frac{\alpha}{2}}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}\right]

For H_0:p_{1}-p_{2}\ge d_{0}\!:

\left[(\tilde{p_{1}}-\tilde{p_{2}})- Z_{\frac{\alpha}{2}}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}, 1\right]

For H_0:p_{1}-p_{2}\le d_{0}\!:

\left[-1, (\tilde{p_{1}}-\tilde{p_{2}})+ Z_{\frac{\alpha}{2}}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}\right]
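The two-sided interval above can be sketched in Python as follows. This is a minimal illustration with assumed helper names; the bisection-based normal quantile avoids any dependency beyond the standard library:

```python
import math

def prop_diff_ci(x1, n1, x2, n2, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for p1 - p2,
    using the unpooled normal-approximation standard error."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    def z_crit(q):
        # Upper q-quantile of N(0,1) by bisection on the erf-based CDF
        lo, hi = 0.0, 10.0
        for _ in range(80):
            mid = (lo + hi) / 2
            if 0.5 * (1 - math.erf(mid / math.sqrt(2))) > q:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    z = z_crit(alpha / 2)  # Z_{alpha/2}, e.g. about 1.96 for alpha = 0.05
    return diff - z * se, diff + z * se
```

With x1 = 30, n1 = 100, x2 = 20, n2 = 100 and alpha = 0.05, this gives an interval of roughly (-0.019, 0.219), which contains zero, consistent with the non-significant two-tailed z test.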

Fisher's Exact Test

Exact P Value

Fisher's exact test can be used for all sample sizes when d_{0}\! is zero. Let p(x)\! denote the probability of the hypergeometric distribution at X=x\!:

P(X=x)=\frac{\binom{x_{1}+x_{2}}{x}\binom{n_{1}+n_{2}-x_{1}-x_{2}}{n_{1}-x}}{\binom{n_{1}+n_{2}}{n_{1}}}\!

Let M\! denote the mode of the hypergeometric distribution: M=\left \lfloor \frac{(n_1+1)(x_1+x_2+1)}{n_1+n_2+2}\right \rfloor\!

The p-values for each hypothesis are given below:

H_0:p_{1}\ge p_{2}\!, p_{value}=P(X\le x_{1})\!

H_0:p_{1}\le p_{2}\!, p_{value}=P(X\ge x_{1})\!

When H_0:p_{1}= p_{2}\!:

a: x_{1} < M\!: p_{value} = P(X\le x_{1}) + P(X\ge y)\!

where y is the smallest integer \ge M such that p(y) \le p(x_1)\!.

b: x_{1} = M\!: p_{value} = 1.0\!

c: x_{1} > M\!: p_{value} = P(X\ge x_{1}) + P(X\le y)\!

where y is the largest integer \le M such that p(y) \le p(x_1)\!.
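The exact-test rules above, the one-sided tail sums plus the mode-based two-sided rule, can be sketched in Python as follows. Function and variable names are illustrative assumptions:

```python
from math import comb

def fisher_exact_2prop(x1, n1, x2, n2, tail="two"):
    """Fisher's exact test for H0: p1 = p2 (d0 = 0).

    tail: "lower" for H0: p1 >= p2, "upper" for H0: p1 <= p2,
          "two" for the mode-based two-sided rule.
    """
    N, K = n1 + n2, x1 + x2          # total size, total successes
    denom = comb(N, n1)

    def pmf(x):
        # Hypergeometric P(X = x); zero outside the support
        if x < 0 or x > K or x > n1 or n1 - x > N - K:
            return 0.0
        return comb(K, x) * comb(N - K, n1 - x) / denom

    lo_x, hi_x = max(0, K - n2), min(K, n1)   # support of X
    cdf_le = lambda x: sum(pmf(t) for t in range(lo_x, x + 1))
    cdf_ge = lambda x: sum(pmf(t) for t in range(x, hi_x + 1))

    if tail == "lower":              # H0: p1 >= p2
        return cdf_le(x1)
    if tail == "upper":              # H0: p1 <= p2
        return cdf_ge(x1)

    # Two-sided: compare x1 against the mode M
    M = (n1 + 1) * (K + 1) // (N + 2)
    if x1 == M:
        return 1.0
    px1 = pmf(x1)
    if x1 < M:
        # Smallest y >= M with p(y) <= p(x1)
        y = M
        while y <= hi_x and pmf(y) > px1:
            y += 1
        return cdf_le(x1) + (cdf_ge(y) if y <= hi_x else 0.0)
    else:
        # Largest y <= M with p(y) <= p(x1)
        y = M
        while y >= lo_x and pmf(y) > px1:
            y -= 1
        return cdf_ge(x1) + (cdf_le(y) if y >= lo_x else 0.0)
```

For instance, with x1 = 1, n1 = 5, x2 = 4, n2 = 5 the mode is M = 3, so x1 < M and the two-sided p-value sums P(X ≤ 1) = 26/252 with P(X ≥ 4) = 26/252, giving 52/252 ≈ 0.206.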