17.3.2.2 Algorithms (Two-Sample T-Test)

The two-sample t-test calculates a Student's t statistic and the associated probability to test whether or not the difference of the two sample means equals \mu_d\,\! (to test whether or not the two means are equal, simply test whether or not their difference is zero, i.e. \mu_1-\mu_2=\mu_d=0\,\!). The hypotheses take the form:

H_0:\mu_1-\mu_2=\mu_d\,\! vs H_1:\mu_1-\mu_2 \ne \mu_d Two Tailed

H_0:\mu_1-\mu_2 \le \mu_d vs H_1:\mu_1-\mu_2 > \mu_d Upper Tailed

H_0:\mu_1-\mu_2 \ge \mu_d vs H_1:\mu_1-\mu_2 < \mu_d Lower Tailed

Test Statistics

Consider two independent samples x_1\,\! and x_2\,\!, of sizes n_1\,\! and n_2\,\!, drawn from two normal populations with means \mu_1\,\! and \mu_2\,\! and variances \sigma_1^2\,\! and \sigma_2^2\,\!, respectively. We have:

\bar{x}_1=\frac{1}{n_1}\sum_{j=1}^{n_1}x_{1j}, \bar{x}_2=\frac{1}{n_2}\sum_{j=1}^{n_2}x_{2j}, s_1^2=\frac{1}{n_1-1}\sum_{j=1}^{n_1}{(x_{1j}-\bar{x}_1)^2}, s_2^2=\frac{1}{n_2-1}\sum_{j=1}^{n_2}{(x_{2j}-\bar{x}_2)^2}

where \bar{x}_1\,\! and \bar{x}_2\,\! are the sample means and s_1^2\,\! and s_2^2\,\! are the sample variances. The t test statistic is then computed as follows.

When equal variance is assumed, that is, \sigma_1^2=\sigma_2^2\,\!:

In this case the test statistic

t=\frac{(\bar{x}_1-\bar{x}_2)-\mu_d}{s_p\sqrt{(1/n_1+1/n_2)}}

has a t-distribution with v = n_1+n_2-2 degrees of freedom, and

s_p=\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}

is the pooled standard deviation (the square root of the pooled variance) of the two samples.
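
As an illustration of the pooled case, the following minimal Python sketch (not the NAG routine that the software actually calls; the function name pooled_t_statistic and the argument mu_d are assumed for illustration) computes s_p, the t statistic, and the degrees of freedom from two numeric samples:

import numpy as np

# Illustrative sketch only; the software itself calls the NAG routine g07cac.
def pooled_t_statistic(x1, x2, mu_d=0.0):
    # Sample sizes, means, and unbiased sample variances
    n1, n2 = len(x1), len(x2)
    xbar1, xbar2 = np.mean(x1), np.mean(x2)
    s1_sq, s2_sq = np.var(x1, ddof=1), np.var(x2, ddof=1)
    # Pooled standard deviation s_p
    sp = np.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    # Test statistic with v = n1 + n2 - 2 degrees of freedom
    t = ((xbar1 - xbar2) - mu_d) / (sp * np.sqrt(1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2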

When equal variance is not assumed:

In this case the usual two-sample t statistic no longer has a t-distribution, and an approximate test statistic, t', is used:

t'=\frac{(\bar{x}_1-\bar{x}_2)-\mu_d}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}

A t-distribution with v degrees of freedom is used to approximate the distribution of t', where

v=\frac{(s_1^2/n_1+s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}
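
A similar Python sketch for the unequal-variance case (again illustrative; welch_t_statistic is an assumed name) computes t' and the approximate degrees of freedom v:

import numpy as np

# Illustrative sketch only; the software itself calls the NAG routine g07cac.
def welch_t_statistic(x1, x2, mu_d=0.0):
    n1, n2 = len(x1), len(x2)
    xbar1, xbar2 = np.mean(x1), np.mean(x2)
    # Per-sample variance contributions s_i^2 / n_i
    v1 = np.var(x1, ddof=1) / n1
    v2 = np.var(x2, ddof=1) / n2
    # Approximate test statistic t'
    t_prime = ((xbar1 - xbar2) - mu_d) / np.sqrt(v1 + v2)
    # Satterthwaite approximation for the degrees of freedom v
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_prime, df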

The t value is then compared with the critical value, and H_0\,\! is rejected if:

Two tailed test: |t| > t_{\alpha/2}\,\!;

Upper tailed test: t > t_{\alpha}\,\!;

Lower tailed test: t < -t_{\alpha}\,\!.

The p-value is also compared with a user-defined significance level, \alpha\,\!, for which 0.05 is commonly used; the null hypothesis H_0\,\! is rejected if p < \alpha\,\!.
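
As a sketch of this decision step, assuming SciPy's t-distribution functions, the following illustrative helper (t_test_decision is an assumed name) converts the t value into a p-value for each alternative and compares it with \alpha:

from scipy import stats

# Illustrative sketch only; names and defaults are assumptions, not the software's API.
def t_test_decision(t, df, alpha=0.05, tail="two"):
    # p-value under the t-distribution with df degrees of freedom
    if tail == "two":
        p = 2 * stats.t.sf(abs(t), df)   # reject H_0 if |t| > t_{alpha/2}
    elif tail == "upper":
        p = stats.t.sf(t, df)            # reject H_0 if t > t_{alpha}
    else:                                # "lower"
        p = stats.t.cdf(t, df)           # reject H_0 if t < -t_{alpha}
    return p, p < alpha                  # second value True means reject H_0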

Confidence Intervals

The upper and lower (1-\alpha)\times 100\% confidence limits for the mean difference (\mu_1 - \mu_2)\,\! are calculated as follows.

When equal variance is assumed:

For H_0:\mu_1-\mu_2=\mu_d\,\!, the confidence interval is \left[(\bar{x}_1-\bar{x}_2)- t_{\alpha/2}s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}, (\bar{x}_1-\bar{x}_2)+ t_{\alpha/2}s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\right]
For H_0:\mu_1-\mu_2 \le \mu_d, the confidence interval is \left[(\bar{x}_1-\bar{x}_2)- t_{\alpha}s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}, \infty\right]
For H_0:\mu_1-\mu_2 \ge \mu_d, the confidence interval is \left[-\infty, (\bar{x}_1-\bar{x}_2)+ t_{\alpha}s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\right]

When equal variance is not assumed:

For H_0:\mu_1-\mu_2=\mu_d\,\!, the confidence interval is \left[(\bar{x}_1-\bar{x}_2)- t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}, (\bar{x}_1-\bar{x}_2)+ t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\right]
For H_0:\mu_1-\mu_2 \le \mu_d, the confidence interval is \left[(\bar{x}_1-\bar{x}_2)- t_{\alpha}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}, \infty\right]
For H_0:\mu_1-\mu_2 \ge \mu_d, the confidence interval is \left[-\infty, (\bar{x}_1-\bar{x}_2)+ t_{\alpha}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\right]


where t_{\alpha/2}\,\! and t_{\alpha}\,\! are the critical values of the t-distribution with v degrees of freedom.
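
The confidence limits listed above can be sketched with an illustrative Python helper (mean_difference_ci and its defaults are assumed names, requiring NumPy and SciPy; this is not the software's implementation):

import numpy as np
from scipy import stats

# Illustrative sketch only; names and defaults are assumptions, not the software's API.
def mean_difference_ci(x1, x2, alpha=0.05, equal_var=True, tail="two"):
    n1, n2 = len(x1), len(x2)
    diff = np.mean(x1) - np.mean(x2)
    s1_sq, s2_sq = np.var(x1, ddof=1), np.var(x2, ddof=1)
    if equal_var:
        # Pooled standard error with v = n1 + n2 - 2 degrees of freedom
        sp = np.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
        se, df = sp * np.sqrt(1.0 / n1 + 1.0 / n2), n1 + n2 - 2
    else:
        # Welch standard error with Satterthwaite degrees of freedom
        v1, v2 = s1_sq / n1, s2_sq / n2
        se = np.sqrt(v1 + v2)
        df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    if tail == "two":
        tc = stats.t.ppf(1 - alpha / 2, df)
        return diff - tc * se, diff + tc * se
    tc = stats.t.ppf(1 - alpha, df)
    if tail == "upper":                   # H_0: mu_1 - mu_2 <= mu_d
        return diff - tc * se, np.inf
    return -np.inf, diff + tc * se        # H_0: mu_1 - mu_2 >= mu_d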

Power Analysis

The power of a two-sample t-test is a measure of its sensitivity. For the detailed algorithm used to calculate power, please refer to the help for Power and Sample Size.

Reference

The two-sample t-test is implemented with the NAG function nag_2_sample_t_test (g07cac). Please refer to the corresponding NAG documentation for more details on the algorithm.