17.1.10.3 Algorithms (Distribution Fit)

1 Distributions and Maximum Likelihood Estimation (MLE)
2 Goodness of Fit
3 Mean Test
- 3.1 Z-Test
- 3.2 T-Test

Use Distribution Fit to fit a distribution to a variable.

Seven distributions can be used to fit a given variable. Their parameters are estimated by maximum likelihood estimation (MLE). For some continuous distributions, confidence limits and a goodness-of-fit test are also provided.

Distributions and Maximum Likelihood Estimation (MLE)

Normal Distribution

PDF

$\frac{1}{\sqrt{2\pi \sigma^2}}\exp [-\frac{(x-\mu)^2}{2\sigma^2}]$

where $-\infty < x, \mu < \infty$ and $0 < \sigma$, with $E(X)=\mu$ and $Var(X)=\sigma^2$.

Maximum Likelihood Estimation (MLE)

Parameters

$\hat{\mu} = \bar{X}_n$
$\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2}$ .

Confidence Intervals

The confidence intervals for $\mu$ and $\sigma$ are:

$\left[ \hat{\mu} - z \hat{\mu}_{se}, \hat{\mu} + z\hat{\mu}_{se} \right]$

$\left[ \frac{\hat{\sigma}}{\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right]},\hat{\sigma}\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right] \right]$

where $z$ is the $0.975$ quantile of the standard normal distribution, corresponding to a $95\%$ confidence level, and $\hat{\mu}_{se}$ and $\hat{\sigma}_{se}$ are the standard errors of $\hat{\mu}$ and $\hat{\sigma}$.
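A minimal Python sketch of the normal-distribution estimates and intervals above. The text does not state how the standard errors are computed, so this sketch assumes the asymptotic (Fisher-information) values $\hat{\sigma}/\sqrt{n}$ for $\hat{\mu}$ and $\hat{\sigma}/\sqrt{2n}$ for $\hat{\sigma}$. The function name `normal_mle_ci` is hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def normal_mle_ci(data, conf=0.95):
    """Hypothetical helper: normal MLEs with Wald-type confidence intervals."""
    n = len(data)
    mu_hat = fmean(data)
    sigma_hat = pstdev(data, mu_hat)  # divisor n, i.e. the MLE, not the sample SD
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for conf = 0.95

    # Assumed asymptotic standard errors from the Fisher information.
    mu_se = sigma_hat / math.sqrt(n)
    sigma_se = sigma_hat / math.sqrt(2 * n)

    ci_mu = (mu_hat - z * mu_se, mu_hat + z * mu_se)
    factor = math.exp(z * sigma_se / sigma_hat)  # the sigma interval is built on the log scale
    ci_sigma = (sigma_hat / factor, sigma_hat * factor)
    return mu_hat, sigma_hat, ci_mu, ci_sigma


mu, sigma, ci_mu, ci_sigma = normal_mle_ci([2, 4, 4, 4, 5, 5, 7, 9])
print(mu, sigma)  # 5.0 2.0
```

The interval for $\sigma$ is symmetric on the log scale, so its endpoints multiply to $\hat{\sigma}^2$ and its lower end always stays positive.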

LogNormal Distribution

PDF

$\frac{1}{x\sqrt{2\pi \sigma^2}} \exp\left[ -\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right]$ ,

where $0 < x$, $-\infty < \mu < \infty$ and $0 < \sigma$, with $E(X)=\exp(\mu + \sigma^2/2)$ and $Var(X)=\exp(2(\mu + \sigma^2)) -\exp(2\mu + \sigma^2 )$.

Maximum Likelihood Estimation (MLE)

Parameters

$\hat{\mu} = \frac{1}{n}\sum_{i=1}^n \ln X_i$
$\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^n (\ln X_i - \hat{\mu})^2}$ .

Confidence Interval

The confidence intervals for $\mu$ and $\sigma$ are:

$\left[ \hat{\mu} - z \hat{\mu}_{se}, \hat{\mu} + z \hat{\mu}_{se} \right]$

$\left[ \frac{\hat{\sigma}}{\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right]},\hat{\sigma}\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right] \right]$

Weibull Distribution

PDF

$\frac{\beta}{\alpha^\beta}x^{\beta -1} \exp\left[ -\left(\frac{x}{\alpha}\right)^\beta\right],$

where $0 \leq x$ and $\alpha , \beta > 0$, with $E(X)=\alpha \Gamma \left(1+ \frac{1}{\beta}\right)$ and $Var(X)=\alpha ^2 \{ \Gamma \left(1+\frac{2}{\beta}\right) -\Gamma ^2 \left(1+\frac{1}{\beta} \right) \}$.

Maximum Likelihood Estimation (MLE)

Origin calls the NAG function nag_estim_weibull (g07bec) to compute the MLEs of the Weibull parameters. Refer to the NAG documentation for details of the algorithm.
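The NAG routine is not reproduced here. As a hedged illustration only, the Python sketch below solves the standard Weibull likelihood equations with a different, simpler technique. It applies bisection to the profile equation for the shape, $\frac{\sum x_i^\beta \ln x_i}{\sum x_i^\beta} - \frac{1}{\beta} - \frac{1}{n}\sum \ln x_i = 0$, and then uses the closed-form scale $\hat{\alpha} = \left(\frac{1}{n}\sum x_i^{\hat{\beta}}\right)^{1/\hat{\beta}}$. The function names are hypothetical, and the results are not guaranteed to match g07bec digit for digit.

```python
import math


def weibull_shape_equation(beta, logs):
    """Profile likelihood equation for the shape; its root is the MLE of beta."""
    m = max(logs)
    # Weights proportional to x_i**beta, rescaled by exp(-beta*m) to avoid overflow.
    w = [math.exp(beta * (l - m)) for l in logs]
    weighted_mean_log = sum(wi * li for wi, li in zip(w, logs)) / sum(w)
    return weighted_mean_log - 1.0 / beta - sum(logs) / len(logs)


def weibull_mle(data, tol=1e-12, max_iter=500):
    """Hypothetical helper returning (alpha_hat, beta_hat) for positive data."""
    if len(data) < 2 or min(data) <= 0:
        raise ValueError("need at least two positive observations")
    logs = [math.log(x) for x in data]
    if max(logs) == min(logs):
        raise ValueError("all observations are equal; the MLE does not exist")

    # The equation increases in beta: negative near 0, positive for large beta.
    lo, hi = 1e-8, 1.0
    while weibull_shape_equation(hi, logs) < 0:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if weibull_shape_equation(mid, logs) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    beta = 0.5 * (lo + hi)

    # Closed-form scale given the shape, computed with the same rescaling.
    m = max(logs)
    mean_pow = sum(math.exp(beta * (l - m)) for l in logs) / len(logs)
    alpha = math.exp(m) * mean_pow ** (1.0 / beta)
    return alpha, beta


print(weibull_mle([1.2, 0.8, 2.5, 1.7, 3.1, 0.4, 1.9]))
```

Bisection is used because the profile equation is strictly increasing in $\beta$, so it has a single root and the bracket search cannot fail for data that are not all equal.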

Exponential Distribution

PDF

$\frac{1}{\sigma} \exp\left[ -\frac{x}{\sigma}\right]$ ,

where $0 \leq x$ and $0 < \sigma$, with $E(X)=\sigma$ and $Var(X)=\sigma^2$.

Maximum Likelihood Estimation (MLE)

Parameters

$\hat{\sigma} = \bar{X}_n$

Confidence Interval

The confidence interval for $\sigma$ is:

$\left[ \frac{\hat{\sigma}}{\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right]},\hat{\sigma}\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right] \right]$

where $z$ is the $0.975$ quantile of the standard normal distribution, corresponding to a $95\%$ confidence level, and $\hat{\sigma}_{se}$ is the standard error of $\hat{\sigma}$.
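A minimal sketch of the exponential estimate and interval. The text does not state which standard error Origin uses, so this sketch assumes the asymptotic value $\hat{\sigma}_{se} = \hat{\sigma}/\sqrt{n}$. Under that assumption the interval factor reduces to $\exp(z/\sqrt{n})$. `exponential_mle_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist, fmean


def exponential_mle_ci(data, conf=0.95):
    """Hypothetical helper: exponential MLE of sigma with a log-scale interval."""
    n = len(data)
    sigma_hat = fmean(data)  # MLE of the scale (mean) parameter
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    sigma_se = sigma_hat / math.sqrt(n)  # assumed asymptotic standard error
    factor = math.exp(z * sigma_se / sigma_hat)  # equals exp(z / sqrt(n))
    return sigma_hat, (sigma_hat / factor, sigma_hat * factor)


sigma, ci = exponential_mle_ci([1.0, 2.0, 3.0])
print(sigma, ci)
```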

Gamma Distribution

PDF

$\frac{1}{\Gamma(\alpha)\sigma^\alpha}x^{\alpha -1} \exp(-x/\sigma),$

where $0 < x$ and $\alpha , \sigma > 0$, with $E(X)=\alpha \sigma$ and $Var(X)=\alpha \sigma ^2$.

Maximum Likelihood Estimation (MLE)

Parameters

The MLEs of $\alpha$ and $\sigma$ have no closed form, so they are computed iteratively with the Newton-Raphson method. For the iteration to converge to the correct root of the likelihood equation, it needs a good initial estimate of $\alpha$, given by $\alpha_0 = \frac{3-s+\sqrt{(s-3)^2+24s}}{12s}$, where $s = \ln \left(\frac{1}{n}\sum_{i=1}^{n}x_i \right) - \frac{1}{n}\sum_{i=1}^{n}\ln (x_i).$ Once $\hat{\alpha}$ is found, the likelihood equation for the scale gives $\hat{\sigma} = \bar{X}_n/\hat{\alpha}$.
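The Newton-Raphson iteration described above can be sketched in Python as follows. It solves $\ln\alpha - \psi(\alpha) = s$, where $\psi$ is the digamma function, starting from $\alpha_0$, and then sets $\hat{\sigma} = \bar{X}_n/\hat{\alpha}$. The standard library has no digamma or trigamma function, so both are approximated with the usual recurrence plus asymptotic series. These helpers and `gamma_mle` are hypothetical names, and the exact iteration Origin uses may differ.

```python
import math


def digamma(x):
    """psi(x) for x > 0 via recurrence and asymptotic series (error roughly 1e-9)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) for x > 0 via recurrence and asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def gamma_mle(data, tol=1e-12, max_iter=100):
    """Hypothetical helper returning (alpha_hat, sigma_hat) for positive data."""
    n = len(data)
    mean = sum(data) / n
    s = math.log(mean) - sum(math.log(x) for x in data) / n
    if s <= 0:
        raise ValueError("need positive observations that are not all equal")

    # Initial estimate alpha_0 from the text.
    alpha = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    for _ in range(max_iter):
        f = math.log(alpha) - digamma(alpha) - s  # likelihood equation for alpha
        fprime = 1.0 / alpha - trigamma(alpha)    # always negative
        new_alpha = alpha - f / fprime
        if new_alpha <= 0:                        # keep the iterate in the domain
            new_alpha = alpha / 2
        if abs(new_alpha - alpha) < tol * alpha:
            alpha = new_alpha
            break
        alpha = new_alpha
    return alpha, mean / alpha  # sigma_hat = mean / alpha_hat


alpha_hat, sigma_hat = gamma_mle([1.0, 2.0, 3.0, 4.0, 5.0])
print(alpha_hat, sigma_hat)
```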

Confidence Interval

The confidence intervals for $\alpha$ and $\sigma$ are:

$\left[ \hat{\alpha} - z \hat{\alpha}_{se}, \hat{\alpha} + z\hat{\alpha}_{se} \right]$

$\left[ \frac{\hat{\sigma}}{\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right]},\hat{\sigma}\exp \left[ (z \hat{\sigma}_{se})/\hat{\sigma} \right] \right]$

where $z$ is the $0.975$ quantile of the standard normal distribution, corresponding to a $95\%$ confidence level, and $\hat{\alpha}_{se}$ and $\hat{\sigma}_{se}$ are the standard errors of $\hat{\alpha}$ and $\hat{\sigma}$.

Binomial Distribution

PMF

$\left( \begin{matrix} n \\ x \end{matrix}\right) p^x (1-p)^{n-x},$

where $0 \leq p \leq 1$ and $x=0,1,2,...,n$, with $E(X)=np$ and $Var(X)=np(1-p)$. Here $x$ is the number of successes in a sample of size $n$.

Maximum Likelihood Estimation (MLE)

Parameters

$\hat{p} = x/n$

Confidence Interval

$\left[\frac{1}{1+z^2/n}\left(\hat{p}+\frac{z^2}{2n} - z \sqrt{\frac{1}{n}\hat{p}(1-\hat{p})+\frac{z^2}{4n^2}}\right),\frac{1}{1+z^2/n}\left(\hat{p}+\frac{z^2}{2n} + z \sqrt{\frac{1}{n}\hat{p}(1-\hat{p})+\frac{z^2}{4n^2}}\right)\right]$

where $z$ is the $0.975$ quantile of the standard normal distribution, corresponding to a $95\%$ confidence level.
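The interval above is the Wilson score interval. A minimal Python sketch follows, with `wilson_interval` as a hypothetical name.

```python
import math
from statistics import NormalDist


def wilson_interval(x, n, conf=0.95):
    """Hypothetical helper: Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n  # MLE of p
    denom = 1 + z * z / n
    center = p_hat + z * z / (2 * n)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (center - half_width) / denom, (center + half_width) / denom


print(wilson_interval(5, 10))
```

Unlike the simple $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$ interval, the Wilson bounds stay inside $[0, 1]$ and do not collapse to a single point when $x = 0$ or $x = n$.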

Poisson Distribution

PMF

$e^{-\lambda}\frac{{\lambda}^x}{x!},$

where $x=0,1,2,\ldots$ and $0 < \lambda$, with $E(X)=Var(X)=\lambda$.

Maximum Likelihood Estimation (MLE)

Parameters

$\hat{\lambda} = \frac{1}{n}\sum_{k=1}^{n}x_k$ .

Confidence Interval

The confidence interval for $\lambda$ is:

$\left[ \hat{\lambda} - z \sqrt{\hat{\lambda}}, \hat{\lambda} + z \sqrt{\hat{\lambda}} \right]$

where $z$ is the $0.975$ quantile of the standard normal distribution, corresponding to a $95\%$ confidence level.

Goodness of Fit

Kolmogorov-Smirnov

Origin calls the NAG function nag_1_sample_ks_test (g08cbc) to compute the statistic. Refer to the NAG documentation for details of the algorithm.
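For reference, the one-sample Kolmogorov-Smirnov statistic $D_n = \max(D_n^+, D_n^-)$ can be computed directly from the ordered data and the fitted CDF $F$. This plain-Python sketch (hypothetical name `ks_statistic`) is not the NAG implementation and computes only the statistic, not its p-value.

```python
def ks_statistic(data, cdf):
    """D_n = sup |F_n(x) - F(x)| for a continuous CDF, from the ordered sample."""
    y = sorted(data)
    n = len(y)
    d_plus = max((i + 1) / n - cdf(v) for i, v in enumerate(y))  # ECDF above F
    d_minus = max(cdf(v) - i / n for i, v in enumerate(y))       # F above ECDF
    return max(d_plus, d_minus)


def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)


print(ks_statistic([0.25, 0.75], uniform_cdf))  # 0.25
```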

Kolmogorov-Smirnov (Modified)

Modified Kolmogorov-Smirnov Statistic

The modified Kolmogorov-Smirnov statistic is the Kolmogorov-Smirnov statistic multiplied by, or otherwise adjusted with, a sample-size correction that depends on the fitted distribution.

P-value

The p-value for the modified Kolmogorov-Smirnov statistic is computed from the critical-value tables below, provided by D’Agostino and Stephens (1986). If the value of $D$ falls between two tabulated values, linear interpolation is used to estimate the p-value.

Here $D_n$ is the Kolmogorov-Smirnov statistic and $N$ is the sample size.

Normal/Lognormal Distribution

Modified Kolmogorov-Smirnov Statistic:

$D=D_n\left(\sqrt{N}-0.01+\frac{0.85}{\sqrt{N}}\right)$

Critical Values Table

D	<0.775	0.775	0.819	0.895	0.995	1.035	>1.035
P-Value	>=0.15	0.15	0.10	0.05	0.025	0.01	<=0.01
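The linear interpolation step can be sketched as below, using the normal/lognormal table as data. The text does not say how Origin reports statistics outside the tabulated range, so this sketch simply returns the boundary p-value, to be read as ">=0.15" or "<=0.01". `interpolate_p_value` is a hypothetical name.

```python
from bisect import bisect_right

# Normal/lognormal critical values of the modified D and their p-values (table above).
NORMAL_KS_D = [0.775, 0.819, 0.895, 0.995, 1.035]
NORMAL_KS_P = [0.15, 0.10, 0.05, 0.025, 0.01]


def interpolate_p_value(stat, crit, pvals):
    """Linearly interpolate a p-value from an increasing table of critical values."""
    if stat <= crit[0]:
        return pvals[0]   # read as "p >= pvals[0]"
    if stat >= crit[-1]:
        return pvals[-1]  # read as "p <= pvals[-1]"
    j = bisect_right(crit, stat)  # crit[j - 1] <= stat < crit[j]
    t = (stat - crit[j - 1]) / (crit[j] - crit[j - 1])
    return pvals[j - 1] + t * (pvals[j] - pvals[j - 1])


print(interpolate_p_value(0.857, NORMAL_KS_D, NORMAL_KS_P))  # about 0.075
```

The same function applies to the other critical-value tables in this section by passing their rows as `crit` and `pvals`.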

Weibull distribution

Modified Kolmogorov-Smirnov Statistic:

$D=D_n\sqrt{N}$

Critical Values Table

D	<1.372	1.372	1.477	1.577	1.671	>1.671
P-Value	>=0.1	0.1	0.05	0.025	0.01	<=0.01

Exponential Distribution

Modified Kolmogorov-Smirnov Statistic:

$D=\left(D_n-\frac{0.2}{N}\right)\left(\sqrt{N}+0.26+\frac{0.5}{\sqrt{N}}\right)$

Critical Values Table

D	<0.926	0.926	0.995	1.094	1.184	1.298	>1.298
P-Value	>=0.15	0.15	0.10	0.05	0.025	0.01	<=0.01

Gamma Distribution

Modified Kolmogorov-Smirnov Statistic:

$D=D_n\left(\sqrt{N}+\frac{0.3}{\sqrt{N}}\right)$

Critical Values Table

D	<0.74	0.74	0.780	0.800	0.858	0.928	0.990	1.069	1.13	>1.13
P-Value	>=0.25	0.25	0.20	0.15	0.10	0.05	0.025	0.01	0.005	<=0.005
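The four modifications above differ only in the sample-size correction applied to $D_n$. The following compact sketch uses a hypothetical function name and hypothetical distribution-name strings.

```python
import math


def modified_ks(d_n, n, dist):
    """Modified KS statistic D from the raw statistic D_n and sample size N."""
    r = math.sqrt(n)
    if dist in ("normal", "lognormal"):
        return d_n * (r - 0.01 + 0.85 / r)
    if dist == "weibull":
        return d_n * r
    if dist == "exponential":
        return (d_n - 0.2 / n) * (r + 0.26 + 0.5 / r)
    if dist == "gamma":
        return d_n * (r + 0.3 / r)
    raise ValueError(f"no modification defined for {dist!r}")


print(modified_ks(0.1, 100, "normal"))  # about 1.0075
```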

Anderson-Darling

Anderson-Darling Statistics

$z=-N-\sum_{i=1}^{N}\frac{2i-1}{N}\left[\ln F(Y_i)+\ln\left(1-F(Y_{N+1-i})\right)\right]$

where

$F$ is the cumulative distribution function of the specified distribution
$Y_i$ are ordered data points: $Y_{1} \leq Y_2 \leq ... \leq Y_{N-1} \leq Y_N$
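Here is a direct transcription of the statistic $z$ into Python. The fitted CDF is passed in as a function `cdf` (a hypothetical interface). The sketch assumes $0 < F(Y_i) < 1$ for every point, because otherwise the logarithms are undefined.

```python
import math


def anderson_darling(data, cdf):
    """Anderson-Darling statistic z for a fully specified (fitted) CDF."""
    y = sorted(data)
    n = len(y)
    total = 0.0
    for i in range(1, n + 1):
        # Y_i is y[i - 1] and Y_{N+1-i} is y[n - i] in 0-based indexing.
        total += (2 * i - 1) / n * (math.log(cdf(y[i - 1])) + math.log(1.0 - cdf(y[n - i])))
    return -n - total


print(anderson_darling([0.5], lambda x: x))  # -1 + 2 ln 2, about 0.386
```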

P-value

For the Weibull and gamma distributions, the p-value is computed from the critical-value tables below, provided by D’Agostino and Stephens (1986). If the statistic falls between two tabulated values, linear interpolation is used to estimate the p-value. For the normal/lognormal and exponential distributions, the p-value is computed from the piecewise formulas given below.

Normal/Lognormal Distribution

Adjusted Anderson-Darling Statistics

$z^*=z\left(1 + \frac{0.75}{N}+\frac{2.25}{N^2}\right)$

P-value

$p=\begin{cases} 1-e^{-13.436+101.14z^{*}-223.73z^{*2}}, z^{*} \leq 0.2\\ 1-e^{-8.318+42.796z^{*}-59.938z^{*2}}, 0.2 < z^{*} \leq 0.34\\ e^{0.9177-4.279z^{*}-1.38z^{*2}}, 0.34 < z^{*} \leq 0.6\\ e^{1.2937-5.709z^{*}+0.0186z^{*2}}, 0.6 < z^{*} \leq 153.467\\ 0, z^{*} > 153.467 \end{cases}$
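The following is a sketch of the piecewise p-value for the normal/lognormal case. The branch boundaries 0.2, 0.34 and 0.6 are taken from the formula. The upper cut-off 153.467 is the vertex of the last quadratic exponent. Beyond it, this sketch returns 0, by analogy with the exponential case (an assumption). `normal_ad_p_value` and `adjusted_normal_ad` are hypothetical names.

```python
import math


def adjusted_normal_ad(z, n):
    """z* = z (1 + 0.75/N + 2.25/N^2)."""
    return z * (1 + 0.75 / n + 2.25 / n ** 2)


def normal_ad_p_value(z_adj):
    """p-value for the adjusted Anderson-Darling statistic z* (normal/lognormal fit)."""
    if z_adj <= 0.2:
        return 1 - math.exp(-13.436 + 101.14 * z_adj - 223.73 * z_adj ** 2)
    if z_adj <= 0.34:
        return 1 - math.exp(-8.318 + 42.796 * z_adj - 59.938 * z_adj ** 2)
    if z_adj <= 0.6:
        return math.exp(0.9177 - 4.279 * z_adj - 1.38 * z_adj ** 2)
    if z_adj <= 153.467:
        return math.exp(1.2937 - 5.709 * z_adj + 0.0186 * z_adj ** 2)
    return 0.0  # assumed: beyond the vertex the exponent would start to increase


print(normal_ad_p_value(adjusted_normal_ad(0.5, 50)))
```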

Weibull distribution

Adjusted Anderson-Darling Statistics
$z^{*}=z\left(1+\frac{0.2}{N}\right)$

Critical Values Table

$z^{*}$	<0.474	0.474	0.637	0.757	0.877	1.038	>1.038
P-Value	>=0.25	0.25	0.10	0.05	0.025	0.01	<=0.01

Exponential Distribution

Adjusted Anderson-Darling Statistics
$z^{*}=z\left(1+\frac{0.6}{N}\right)$

P-value

$p=\begin{cases} 1-e^{-12.2204+67.459z^{*}-110.3z^{*2}}, z^{*} \leq 0.26\\ 1-e^{-6.1327+20.218z^{*}-18.663z^{*2}}, 0.26 < z^{*} \leq 0.51\\ e^{0.9209-3.353z^{*}+0.3z^{*2}}, 0.51 < z^{*} \leq 0.95\\ e^{0.731-3.009z^{*}+0.15z^{*2}}, 0.95 < z^{*} \leq 10.03\\ 0, z^{*} > 10.03 \end{cases}$
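The exponential-case p-value can be sketched the same way. This sketch uses $+0.3z^{*2}$ in the third branch, the sign that makes the piecewise curve continuous at $z^{*}=0.51$ and $z^{*}=0.95$. `exponential_ad_p_value` and `adjusted_exponential_ad` are hypothetical names.

```python
import math


def adjusted_exponential_ad(z, n):
    """z* = z (1 + 0.6/N)."""
    return z * (1 + 0.6 / n)


def exponential_ad_p_value(z_adj):
    """p-value for the adjusted Anderson-Darling statistic z* (exponential fit)."""
    if z_adj <= 0.26:
        return 1 - math.exp(-12.2204 + 67.459 * z_adj - 110.3 * z_adj ** 2)
    if z_adj <= 0.51:
        return 1 - math.exp(-6.1327 + 20.218 * z_adj - 18.663 * z_adj ** 2)
    if z_adj <= 0.95:
        return math.exp(0.9209 - 3.353 * z_adj + 0.3 * z_adj ** 2)
    if z_adj <= 10.03:
        return math.exp(0.731 - 3.009 * z_adj + 0.15 * z_adj ** 2)
    return 0.0


print(exponential_ad_p_value(adjusted_exponential_ad(0.8, 30)))
```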

Gamma Distribution

Critical Values Table

**$0 < \alpha \leq 1$**
$z$	<0.486	0.486	0.657	0.786	0.917	1.092	1.227	>1.227
P-Value	>=0.25	0.25	0.10	0.05	0.025	0.01	0.005	<=0.005

**$1 < \alpha \leq 8$**
$z$	<0.473	0.473	0.637	0.759	0.883	1.048	1.173	>1.173
P-Value	>=0.25	0.25	0.10	0.05	0.025	0.01	0.005	<=0.005

**$\alpha > 8$**
$z$	<0.470	0.470	0.631	0.752	0.873	1.035	1.159	>1.159
P-Value	>=0.25	0.25	0.10	0.05	0.025	0.01	0.005	<=0.005

Mean Test

Z-Test

Test Statistics

$z=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$

where

$\bar{x}$ : The sample mean, $\frac{1}{n}\sum_{i=1}^n x_i$
$\mu_0$ : The specified test mean
$\sigma$ : The specified standard deviation

P-Value

The p-value $P$ is computed by referring the test statistic $z$ to the standard normal distribution.

Confidence Intervals

For a specified significance level $\alpha$, the confidence interval for the population mean is:

Null Hypothesis	Confidence Interval
$H_0:\mu=\mu_0$	$\left[\bar{x}-Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\bar{x}+Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$
$H_0:\mu \le \mu_0$	$\left[\bar{x}-Z_{\alpha}\frac{\sigma}{\sqrt{n}}, \infty\right)$
$H_0:\mu \ge \mu_0$	$\left(-\infty, \bar{x}+Z_{\alpha}\frac{\sigma}{\sqrt{n}}\right]$

where $Z_{\alpha}$ denotes the upper $\alpha$ critical value of the standard normal distribution.
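The following is a minimal z-test sketch matching the statistic, p-value and interval table above. It uses `statistics.NormalDist` from the Python standard library. The function name and the `tail` values are hypothetical: `"two-sided"` for $H_0:\mu=\mu_0$, `"upper"` for $H_0:\mu \le \mu_0$, and `"lower"` for $H_0:\mu \ge \mu_0$. One-sided bounds use the $(1-\alpha)$ quantile.

```python
import math
from statistics import NormalDist, fmean


def z_test(data, mu0, sigma, alpha=0.05, tail="two-sided"):
    """Hypothetical helper returning (z, p_value, confidence_interval)."""
    n = len(data)
    xbar = fmean(data)
    se = sigma / math.sqrt(n)  # sigma is the specified (known) standard deviation
    z = (xbar - mu0) / se
    nd = NormalDist()
    if tail == "two-sided":
        p = 2 * (1 - nd.cdf(abs(z)))
        q = nd.inv_cdf(1 - alpha / 2)
        ci = (xbar - q * se, xbar + q * se)
    elif tail == "upper":  # H0: mu <= mu0, large z is evidence against H0
        p = 1 - nd.cdf(z)
        q = nd.inv_cdf(1 - alpha)
        ci = (xbar - q * se, math.inf)
    elif tail == "lower":  # H0: mu >= mu0, small z is evidence against H0
        p = nd.cdf(z)
        q = nd.inv_cdf(1 - alpha)
        ci = (-math.inf, xbar + q * se)
    else:
        raise ValueError("tail must be 'two-sided', 'upper' or 'lower'")
    return z, p, ci


print(z_test([1, 2, 3, 4, 5], 3.0, 1.0))
```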