17.3.8.2 Algorithms (Two sample proportion test)


Let n_{1}\! be the size of sample 1 and x_{1}\! be the number of events (successes); then the sample proportion \tilde{p_{1}}\! can be expressed as \tilde{p_{1}}=\frac{x_{1}}{n_{1}}\!.

Similarly, for the second sample, let n_{2}\! be the sample size and x_{2}\! the number of events; then the sample proportion is \tilde{p_{2}}=\frac{x_{2}}{n_{2}}\!.

Hypotheses

Let p_{1}\! and p_{2}\! be the true population proportions for samples 1 and 2, and let d_{0}\! be the hypothesized difference between the population proportions.

H_0:p_{1}-p_{2}=d_{0}\! for the two-tailed test

H_0:p_{1}-p_{2}\ge d_{0}\! for a one-tailed test

H_0:p_{1}-p_{2}\le d_{0}\! for a one-tailed test

Normal Approximation

P Value

We can perform the normal approximation test under the assumptions x_{1}\ge10\! and n_{1}-x_{1}\ge10\!, x_{2}\ge10\! and n_{2}-x_{2}\ge10\!.

To perform the test, calculate the z\! statistic and the p_{value}\!:

z=\frac{\tilde{p_{1}}-\tilde{p_{2}}-d_{0}}{\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+\frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}}\!

A special case arises when d_{0}\! is zero: Origin can use a pooled estimate of p\! for the test if you check the "Pooled" box:

z=\frac{\tilde{p_{1}}-\tilde{p_{2}}}{\sqrt{\tilde{p_{0}}(1-\tilde{p_{0}})\left(\frac{1}{n_{1}}+\frac{1}{n_{2}}\right)}}\! , where \tilde{p_{0}}=\frac{x_{1}+x_{2}}{n_{1}+n_{2}}\!

The p-values for each hypothesis are given by:

H_0:p_{1}-p_{2}=d_{0}\!, p_{value}=2P(Z_{1}\ge|z|)\!, for the two-tailed test

H_0:p_{1}-p_{2}\ge d_{0}\!, p_{value}=P(Z_{1}\le z)\!, for the lower-tailed test

H_0:p_{1}-p_{2}\le d_{0}\!, p_{value}=P(Z_{1}\ge z)\!, for the upper-tailed test
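As a sketch, the normal approximation test above can be implemented in a few lines of Python. The function name, argument layout, and tail labels are illustrative assumptions, not Origin's API:

```python
import math

def two_sample_prop_z(x1, n1, x2, n2, d0=0.0, tail="two", pooled=False):
    """Normal-approximation z test for the difference of two proportions.

    tail: "two"   for H0: p1 - p2 = d0,
          "lower" for H0: p1 - p2 >= d0 (p-value = P(Z <= z)),
          "upper" for H0: p1 - p2 <= d0 (p-value = P(Z >= z)).
    """
    p1, p2 = x1 / n1, x2 / n2
    if pooled and d0 == 0.0:
        # Pooled estimate p0 = (x1 + x2) / (n1 + n2), valid only when d0 = 0
        p0 = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2 - d0) / se

    # Standard normal CDF via the error function
    phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    if tail == "two":
        p = 2 * (1 - phi(abs(z)))
    elif tail == "lower":
        p = phi(z)
    else:
        p = 1 - phi(z)
    return z, p
```

For example, with x1 = 30, n1 = 100, x2 = 20, n2 = 100 the unpooled statistic is z ≈ 1.64 with a two-tailed p-value of about 0.10, so H_0 would not be rejected at the 0.05 level.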

Confidence Interval

For a given confidence level 1-\alpha\!, the confidence interval for the difference between the two population proportions is given by:

For H_0:p_{1}-p_{2}=d_{0}\!:

\left[(\tilde{p_{1}}-\tilde{p_{2}})- Z_{\frac{\alpha}{2}}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}, (\tilde{p_{1}}-\tilde{p_{2}})+ Z_{\frac{\alpha}{2}}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}\right]

For H_0:p_{1}-p_{2}\ge d_{0}\!:

\left[(\tilde{p_{1}}-\tilde{p_{2}})- Z_{\frac{\alpha}{2}}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}, 1\right]

For H_0:p_{1}-p_{2}\le d_{0}\!:

\left[-1, (\tilde{p_{1}}-\tilde{p_{2}})+ Z_{\frac{\alpha}{2}}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}\right]
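The two-sided interval above can be sketched in Python as follows. This is a minimal illustration with assumed helper names; the bisection-based normal quantile avoids any dependency beyond the standard library:

```python
import math

def prop_diff_ci(x1, n1, x2, n2, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for p1 - p2,
    using the unpooled normal-approximation standard error."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    def z_crit(q):
        # Upper q-quantile of N(0,1) by bisection on the erf-based CDF
        lo, hi = 0.0, 10.0
        for _ in range(80):
            mid = (lo + hi) / 2
            if 0.5 * (1 - math.erf(mid / math.sqrt(2))) > q:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    z = z_crit(alpha / 2)  # Z_{alpha/2}, e.g. about 1.96 for alpha = 0.05
    return diff - z * se, diff + z * se
```

With x1 = 30, n1 = 100, x2 = 20, n2 = 100 and alpha = 0.05, this gives an interval of roughly (-0.019, 0.219), which contains zero, consistent with the non-significant two-tailed z test.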

Fisher's Exact Test

Exact P Value

Fisher's exact test can be used for all sample sizes when d_{0}\! is zero. Let p(x)\! denote the probability of the hypergeometric distribution at X=x\!:

P(X=x)=\frac{\binom{x_{1}+x_{2}}{x}\binom{n_{1}+n_{2}-x_{1}-x_{2}}{n_{1}-x}}{\binom{n_{1}+n_{2}}{n_{1}}}\!

Let M\! denote the mode of the hypergeometric distribution: M=\left \lfloor \frac{(n_1+1)(x_1+x_2+1)}{n_1+n_2+2}\right \rfloor\!

The p-values for each hypothesis are given below:

H_0:p_{1}\ge p_{2}\!, p_{value}=P(X\le x_{1})\!

H_0:p_{1}\le p_{2}\!, p_{value}=P(X\ge x_{1})\!

When H_0:p_{1}= p_{2}\!:

a: x_{1} < M\!: p_{value} = P(X\le x_{1}) + P(X\ge y)\!

where y is the smallest integer \ge M such that p(y) \le p(x_1)\!.

b: x_{1} = M\!: p_{value} = 1.0\!

c: x_{1} > M\!: p_{value} = P(X\ge x_{1}) + P(X\le y)\!

where y is the largest integer \le M such that p(y) \le p(x_1)\!.
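The exact-test rules above, the one-sided tail sums plus the mode-based two-sided rule, can be sketched in Python as follows. Function and variable names are illustrative assumptions:

```python
from math import comb

def fisher_exact_2prop(x1, n1, x2, n2, tail="two"):
    """Fisher's exact test for H0: p1 = p2 (d0 = 0).

    tail: "lower" for H0: p1 >= p2, "upper" for H0: p1 <= p2,
          "two" for the mode-based two-sided rule.
    """
    N, K = n1 + n2, x1 + x2          # total size, total successes
    denom = comb(N, n1)

    def pmf(x):
        # Hypergeometric P(X = x); zero outside the support
        if x < 0 or x > K or x > n1 or n1 - x > N - K:
            return 0.0
        return comb(K, x) * comb(N - K, n1 - x) / denom

    lo_x, hi_x = max(0, K - n2), min(K, n1)   # support of X
    cdf_le = lambda x: sum(pmf(t) for t in range(lo_x, x + 1))
    cdf_ge = lambda x: sum(pmf(t) for t in range(x, hi_x + 1))

    if tail == "lower":              # H0: p1 >= p2
        return cdf_le(x1)
    if tail == "upper":              # H0: p1 <= p2
        return cdf_ge(x1)

    # Two-sided: compare x1 against the mode M
    M = (n1 + 1) * (K + 1) // (N + 2)
    if x1 == M:
        return 1.0
    px1 = pmf(x1)
    if x1 < M:
        # Smallest y >= M with p(y) <= p(x1)
        y = M
        while y <= hi_x and pmf(y) > px1:
            y += 1
        return cdf_le(x1) + (cdf_ge(y) if y <= hi_x else 0.0)
    else:
        # Largest y <= M with p(y) <= p(x1)
        y = M
        while y >= lo_x and pmf(y) > px1:
            y -= 1
        return cdf_ge(x1) + (cdf_le(y) if y >= lo_x else 0.0)
```

For instance, with x1 = 1, n1 = 5, x2 = 4, n2 = 5 the mode is M = 3, so x1 < M and the two-sided p-value sums P(X ≤ 1) = 26/252 with P(X ≥ 4) = 26/252, giving 52/252 ≈ 0.206.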