2.5.1.2 Algorithm: Parametric Distribution Analysis (Right Censoring)

Data are input in time and censoring indicator pairs $(t_i, c_i)$ .

The MLE (maximum likelihood estimation) and LS (least squares) methods are used to estimate parameters for each distribution.

Under the MLE method:

First calculate the likelihood function $\mathcal{L}( \theta )$ for the distribution, which is the product of Probability Density Function for failure time data and the Survival Density Function for censored time data.
Then maximize the logarithm of the likelihood function $\ell ( \theta )$ by setting its partial derivatives equal to zero with respect to parameters.
Use Newton-Raphson method to solve these parameters in step 2.

Under the LS method:

First plot the failure times (or log failure times) against the transformed cumulative probabilities on a probability plot.
Then fit a straight line by minimizing the sum of squared deviations
Use the slope and scale from the fitted line to calculate parameter estimates

1 Distributions
2 Hazard Function
3 Estimate Parameter Standard Errors
4 Quantities of Distribution
5 Percentiles
6 Survival Probabilities
7 Probability Plot
8 Goodness of Fit
- 8.1 Anderson-Darling Test
- 8.2 Pearson Correlation Coefficient
9 Reference

Distributions

Supported distributions include:

Distribution	Parameters	Probability density function (PDF)	Cumulative distribution function (CDF)	Inverse cumulative distribution function
Normal	location: $\mu$ scale: $\sigma$	$f(x)=\frac{1}{ \sqrt{2 \pi} \sigma } e^{ -\frac{( x - \mu )^2}{2 \sigma^2} }, \; \sigma > 0$	$P(x) = \Phi \left( \frac{x - \mu}{\sigma} \right)$	$x = \mu + \sigma \Phi^{-1}(p)$
Lognormal	location: $\mu$ scale: $\sigma$	$f(x)=\frac{1}{ \sqrt{2 \pi} \sigma x } e^{ -\frac{( \text{ln}(x) - \mu )^2}{2 \sigma^2} },$ $x>0 , \; \sigma > 0$	$P(x) = \Phi \left( \frac{\text{ln}(x) - \mu}{\sigma} \right)$	$\text{ln}(x) = \mu + \sigma \Phi^{-1}(p)$
Exponential	scale: $\theta$	$f(x) = \frac{ 1 }{ \theta } e^{-\frac{x}{\theta}}, \; x > 0, \; \theta > 0$	$P(x) = 1- e^{ -\frac{x}{\theta} }$	$x = \theta ( - \text{ln}(1-p) )$
Smallest Extreme Value	location: $\alpha$ scale: $\beta$	$f(x) = \frac{1}{\beta } e^{ \frac{x- \alpha}{\beta}} e^{ - e^{ \frac{x- \alpha}{\beta} } },$ $\beta > 0$	$P(x) = 1- e^{-e^{\frac{x-\alpha}{\beta}}}$	$x = \alpha + \beta \text{ln}( - \text{ln}(1-p) )$
Weibull	scale: $a$ shape: $b$	$f(x) = \frac{b}{a} \left(\frac{ x }{ a }\right)^{b - 1} e^{ -(\frac{ x }{ a })^b },$ $x > 0, \; a > 0, \; b > 0$	$P(x) = 1 - e^{ -(\frac{ x }{ a })^b }$	$\text{ln}(x) = \text{ln}(a) + \frac{1}{b} \text{ln}( - \text{ln}(1-p) )$
Logistic	location: $\mu$ scale: $\sigma$	$f(x) = \frac{e^{\frac{x- \mu}{\sigma}}}{\sigma(1+e^{\frac{x- \mu}{\sigma}})^2 },\sigma > 0$	$P(x) = \frac{1}{1+e^{-\frac{x- \mu}{\sigma}}}$	$x =\mu+ \sigma\text{ln}(\frac{p}{1-p})$
Loglogistic	location: $\mu$ scale: $\sigma$	$\begin{array}{l}f(x) = \frac{e^{\frac{\text{ln}(x)- \mu}{\sigma}}}{x\sigma(1+e^{\frac{\text{ln}(x)- \mu}{\sigma}})^2 } , \\ x>0, \sigma > 0\end{array}$	$P(x) = \frac{1}{1+e^{-\frac{\text{ln}(x)- \mu}{\sigma}}}$	$x =e^{\mu+ \sigma\text{ln}(\frac{p}{1-p})}$

where Φ is the CDF function for the standard normal distribution.

Hazard Function

The hazard function is calculated as follows:

$h(x) = \frac{f(x)}{1 - P(x)}$

where $f(x)$ is the PDF function and $P(x)$ is the CDF function.

Estimate Parameter Standard Errors

Once parameters are estimated by MLE method, the Fisher information matrix (FIM) can be calculated by Hessian matrix:

$[\mathcal{I}(\theta)]_{i,j} = - E\left[ \frac{\partial^2}{\partial \theta_i \, \partial \theta_j} \text{ln}\left( f(Y; \theta) \right) \, \Big| \, \theta \right]$

Covariance matrix of parameters can be expressed as:

$C = (n \mathcal{I} )^{-1}$

where n is the number of points.

Parameter's standard error is the square root of the diagonal elements in the covariance matrix C.

Distribution	Parameters	Lower Confidence Limit	Upper Confidence Limit
Normal, Lognormal, Smallest Extreme Value, Logistic, Loglogistic	location: $\mu$	$\hat{\mu} - Z_{\alpha}{SE_{\hat{\mu}}}$	$\hat{\mu} + Z_{\alpha}{SE_{\hat{\mu}}}$
	scale: $\sigma$	$\frac{\hat{\sigma}}{\exp\!\left(\dfrac{Z_{\alpha}\,SE_{\hat{\sigma}}}{\hat{\sigma}}\right)}$	$\hat{\sigma}{\exp\!\left( \, \frac{Z_{\alpha}SE_{\hat{\sigma}}}{\hat{\sigma}} \right)}$
Exponential	scale: $\theta$	$\frac{ \hat{\theta} }{ \exp\!\left( \, \frac{Z_{\alpha}SE_{\hat{\theta}}}{\hat{\theta}} \right) }$	$\hat{\theta}{\exp\!\left( \, \frac{Z_{\alpha}SE_{\hat{\theta}}}{\hat{\theta}} \right)}$
Weibull	scale: $a$	$\frac{\hat{a}}{\exp\!\left(\dfrac{Z_{\alpha}\,SE_{\hat{a}}}{\hat{a}}\right)}$	$\hat{a}{\exp\!\left( \, \frac{Z_{\alpha}SE_{\hat{a}}}{\hat{a}} \right)}$
Weibull	shape: $b$	$\frac{\hat{b}}{\exp\!\left(\dfrac{Z_{\alpha}\,SE_{\hat{b}}}{\hat{b}}\right)}$	$\hat{b}{\exp\!\left( \, \frac{Z_{\alpha}SE_{\hat{b}}}{\hat{b}} \right)}$

where $Z_\alpha = Z_{1-\alpha/2} = \Phi^{-1}(1 - \tfrac{\alpha}{2})$

Quantities of Distribution

Once parameters are estimated using either the MLE or LS method, quantities of interest can be calculated

Mean
Standard Deviation
Q1, Median, Q3
IQR = Q3 - Q1

Formulas to calculate the distribution Mean

Distribution	Mean	Lower Confidence Level	Upper Confidence Level
Normal	$\hat{\mu}$	$\hat{\mu} - Z_{\alpha}SE_{\hat{\mu}}$	$\hat{\mu} + Z_{\alpha}SE_{\hat{\mu}}$
Lognormal	$\exp(\mu + \frac{1}{2}\sigma^2)$	$\hat{(Mean)} / {\exp}(Z_{\alpha}{SE_{\hat{(Mean)}}})$	$\hat{(Mean)}{\exp}(Z_{\alpha}{SE_{\hat{(Mean)}}})$
Exponential	$\hat{\theta}$	$\hat{\theta} / \exp({\frac{Z_{\alpha} SE_{\hat{\theta}}}{\hat{\theta}}})$	$\hat{\theta} \exp({\frac{Z_{\alpha} SE_{\hat{\theta}}}{\hat{\theta}}})$
Smallest Extreme Value	$\hat{\mu} - EC\,\hat{\sigma}$	$\hat{(Mean)} - Z_{\alpha}SE_{\hat{(Mean)}}$	$\hat{(Mean)} + Z_{\alpha}SE_{\hat{(Mean)}}$
Weibull	$\hat{a}\,\Gamma\!\left(1+\tfrac{1}{\hat{b}}\right)$	$\exp\!\left( \log(\hat{\text{Mean}}) - Z_{\alpha}\, SE_{\log(\hat{\text{Mean}})} \right)$	$\exp\!\left( \log(\hat{\text{Mean}}) + Z_{\alpha}\, SE_{\log(\hat{\text{Mean}})} \right)$
Logistic	$\hat{\mu}$	$\hat{\mu} - Z_{\alpha}SE_{\hat{\mu}}$	$\hat{\mu} + Z_{\alpha}SE_{\hat{\mu}}$
Loglogistic	$\exp(\hat{\mu})\Gamma(1+\hat{\sigma})\Gamma(1-\hat{\sigma})$	$\hat{(Mean)} / {\exp}(Z_{\alpha}{SE_{\hat{(Mean)}}})$	$\hat{(Mean)}{\exp}(Z_{\alpha}{SE_{\hat{(Mean)}}})$

where $EC$ is Euler's constant and $Z_\alpha = Z_{1-\alpha/2} = \Phi^{-1}(1 - \tfrac{\alpha}{2})$

Formulas to calculate the distribution Standard Deviation

Distribution	Standard Deviation	Lower Confidence Level	Upper Confidence Level
Normal	$\hat{\sigma}$	$\hat{\sigma}/\exp\!\left( \,\dfrac{Z_{\alpha}}{\hat{\sigma}SE_{\hat{\sigma}}} \right)$	$\hat{\sigma}\exp\!\left( \,\dfrac{Z_{\alpha}}{\hat{\sigma}SE_{\hat{\sigma}}} \right)$
Lognormal	$\sqrt{ \exp\left( \hat{\mu} + \tfrac{1}{2}\hat{\sigma}^{2} \right) \left[ \exp\left( \hat{\sigma}^{2} \right) - 1 \right] }$	$\hat{(SD)} / {\exp}(Z_{\alpha}{SE_{\hat{(SD)}}})$	$\hat{(SD)}{\exp}(Z_{\alpha}{SE_{\hat{(SD)}}})$
Exponential	$\hat{\theta}$	$\hat{\theta} / \exp({\frac{Z_{\alpha} SE_{\hat{\theta}}}{\hat{\theta}}})$	$\hat{\theta} \exp({\frac{Z_{\alpha} SE_{\hat{\theta}}}{\hat{\theta}}})$
Smallest Extreme Value	$\hat{\sigma}\sqrt{\tfrac{\pi^{2}}{6}}$	$\hat{(SD)}/\exp\!\left( \,\dfrac{Z_{\alpha}}{\hat{(SD)}SE_{\hat{(SD)}}} \right)$	$\hat{(SD)}\exp\!\left( \,\dfrac{Z_{\alpha}}{\hat{(SD)}SE_{\hat{(SD)}}} \right)$
Weibull	$\hat{a}\sqrt{\!\left[\Gamma\!\left(1+\tfrac{2}{\hat{b}}\right)-\Gamma^{2}\!\left(1+\tfrac{1}{\hat{b}}\right)\right]}$	$\exp\!\Big(\log \hat{\sigma} - Z_{\alpha}\,SE_{\log \hat{\sigma}}\Big)$	$\exp\!\Big(\log \hat{\sigma} + Z_{\alpha}\,SE_{\log \hat{\sigma}}\Big)$
Logistic	$\hat{\sigma}\sqrt{\tfrac{\pi^{2}}{3}}$	$\hat{(SD)}/\exp\!\left( \,\dfrac{Z_{\alpha}}{\hat{(SD)}SE_{\hat{(SD)}}} \right)$	$\hat{(SD)}\exp\!\left( \,\dfrac{Z_{\alpha}}{\hat{(SD)}SE_{\hat{(SD)}}} \right)$
Loglogistic	$\sqrt{\,\exp(2\hat{\mu})\Big[\Gamma(1+2\hat{\sigma})\,\Gamma(1-2\hat{\sigma}) - \Gamma^2(1+\hat{\sigma})\,\Gamma^2(1-\hat{\sigma})\Big]\,}$	$\hat{(SD)} / {\exp}(Z_{\alpha}{SE_{\hat{(SD)}}})$	$\hat{(SD)}{\exp}(Z_{\alpha}{SE_{\hat{(SD)}}})$

where $Z_\alpha = Z_{1-\alpha/2} = \Phi^{-1}(1 - \tfrac{\alpha}{2})$

Percentiles

Calculate the time at which a prespecified percent of the population fails.

Distribution	Percentile ( $\hat{x_p}$ )	Variance $\left( \text{Var} \left( \hat{x_p} \right ) \right )$
Normal	$\hat{\mu} + Z_p\hat{\sigma}$	$\text{Var}(\hat{\mu}) + 2Z_{p} \text{Cov}(\hat{\mu}, \hat{\sigma}) + Z_{p}^{2} \text{Var}(\hat{\sigma})$
Lognormal	$\exp(\hat{\mu} + Z_p\hat{\sigma})$	${\hat{x_p}}^2(\text{Var}(\hat{\mu}) + 2Z_{p} \text{Cov}(\hat{\mu}, \hat{\sigma})) + Z_{p}^{2} \text{Var}(\hat{\sigma}))$
Exponential	$-\ln(1-p)\hat{\theta}$	$(-\ln(1-p))^2 \text{Var}(\hat{\theta})$
Smallest Extreme Value	$\hat{\mu} + Z_p\hat{\sigma}$	$\text{Var}(\hat{\mu}) + 2z_p\, \text{Cov}(\hat{\mu},\hat{\sigma}) + z_p^2\, \text{Var}(\hat{\sigma})$
Weibull	$\hat{a} \,\big[-\ln(1-p)\big]^{1/\hat{b}}$	$\left(\frac{\hat{x}_p^{2}}{\hat{a}^{2}}\right)\, \text{Var}(\hat{a}) \;-\; 2\left(\frac{Z_{p}\,\hat{x}_p^{2}}{\hat{a}\,\hat{b}^2}\right)\, \text{Cov}(\hat{a},\hat{b}) \;+\; \left(\frac{\hat{x}_p^{2}}{\hat{b}^{4}}\right) Z_{p}^{2}\, \text{Var}(\hat{b})$
Logistic	$\hat{\mu} + z_p\hat{\sigma}\$	$\text{Var} (\hat{\mu}) + 2z_p\, \text{Cov}(\hat{\mu},\hat{\sigma}) + z_p^2\, \text{Var}(\hat{\sigma})$
Loglogistic	$\exp\!\big(\hat{\mu} + Z_p\hat{\sigma}\big)$	$\hat{x}_p^{\,2}\!\left[ \text{Var}(\hat{\mu}) + 2z_p\, \text{Cov}(\hat{\mu},\hat{\sigma}) + z_p^2\, \text{Var}(\hat{\sigma}) \right]$

For Normal, Smallest Extreme Value, and Logistic distributions:

Lower confidence limit: $\hat{x}_p - Z_{\alpha}\sqrt{ \text{Var}(\hat{x}_p)}$

Upper confidence limit: $\hat{x}_p + Z_{\alpha}\sqrt{ \text{Var}(\hat{x}_p)}$

For Lognormal, Exponential, Weibull, Loglogistic distributions:

Lower confidence limit: $\exp\!\left( \ln(\hat{x}_p) \;-\; Z_{\alpha}\,\frac{\sqrt{\, \text{Var}(\hat{x}_p)\,}}{\hat{x}_p} \right)$

Upper confidence limit: $\exp\!\left( \ln(\hat{x}_p) \;+\; Z_{\alpha}\,\frac{\sqrt{\, \text{Var}(\hat{x}_p)\,}}{\hat{x}_p} \right)$

Survival Probabilities

Calculate the percent of the population that has not yet failed at a pre-specified time point.

Distribution	Survival Probability ( $\hat{z}$ )	Variance of Survival Probability ( $Var(\hat{z})$ )
Normal	$\frac{x - \hat{\mu}}{\hat{\sigma}}$	$\frac{Var(\hat{\mu}) + 2\hat{z} Cov(\hat{\mu}, \hat{\sigma}) + \hat{z}^{2} Var(\hat{\sigma})}{\sigma^2}$
Lognormal	$\frac{ln(x) - \hat{\mu}}{\hat{\sigma}}$	$\frac{Var(\hat{\mu}) + 2\hat{z} Cov(\hat{\mu}, \hat{\sigma}) + \hat{z}^{2} Var(\hat{\sigma})}{\sigma^2}$
Exponential	$ln(x) - ln(\hat{\theta})$	$\frac{Var(\hat{\theta})}{\hat{\theta}^2}$
Smallest Extreme Value	$\frac{x - \hat{\mu}}{\hat{\sigma}}$	$\frac{Var(\hat{\mu}) + 2\hat{z} Cov(\hat{\mu}, \hat{\sigma}) + \hat{z}^{2} Var(\hat{\sigma})}{\sigma^2}$
Weibull	$\hat{b}\,(\ln x - \ln a)$	$\frac{\hat{z}^{2}\,Var(\hat{b})}{\hat{b}^{2}} \;-\; 2\,\frac{\hat{z}\,Cov(\hat{a},\hat{b})}{\hat{a}} \;+\; \frac{\hat{b}^{2}\,Var(\hat{a})}{\hat{a}^{2}}$
Logistic	$\frac{x - \hat{\mu}}{\hat{\sigma}}$	$\frac{Var(\hat{\mu}) + 2\hat{z} Cov(\hat{\mu}, \hat{\sigma}) + \hat{z}^{2} Var(\hat{\sigma})}{\sigma^2}$
Loglogistic	$\frac{ln(x) - \hat{\mu}}{\hat{\sigma}}$	$\frac{Var(\hat{\mu}) + 2\hat{z} Cov(\hat{\mu}, \hat{\sigma}) + \hat{z}^{2} Var(\hat{\sigma})}{\sigma^2}$

Lower bound = $\hat{z} - Z_{\alpha}\sqrt{Var(\hat{z})}$

Upper bound = $\hat{z} + Z_{\alpha}\sqrt{Var(\hat{z})}$

For Normal and Lognormal distributions:

Lower confidence limit: $1 - \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{UB}{\sqrt{2}}\right)\right)$

Upper confidence limit: $1 - \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{LB}{\sqrt{2}}\right)\right)$

For Exponential, Smallest Extreme Value, and Weibull distributions:

Lower confidence limit: $\exp(-\exp(UB))$

Upper confidence limit: $\exp(-\exp(LB))$

For Logistic and Loglogistic distributions:

Lower confidence limit: $\frac{1}{1 + \exp(UB)}$

Upper confidence limit: $\frac{1}{1 + \exp(LB)}$

Probability Plot

Points in the probability plot are sorted in ascending order, for the ith point, its probability is calculated by the median rank (Benard's method):

$P = \frac{i - 0.3}{n + 0.4}$

The middle line is the expected percentiles for given probabilities in terms of the inverse cumulative distribution function using parameters from the MLE or LS method.

To estimate confidence limits of percentiles, variance of percentiles is calculated by propagation of error:

$\sigma^2_{x_p} = \left( \frac{\partial x_p}{\partial \theta} \right)^T C \left( \frac{\partial x_p}{\partial \theta} \right)$

where x_p is the percentile for a given probability p, and (.)^T denotes the transpose of a vector.

Confidence limits of percentiles can be calculated as below:

$x_{pL} = x_p - Z_{\alpha} \sigma_{x_p}$

$x_{pU} = x_p + Z_{\alpha} \sigma_{x_p}$

where $Z_{\alpha} = \Phi^{-1}(1- \alpha / 2)$ .

For some positive random variables, e.g. in Lognormal, Exponential, Weibull, and Loglogistic distributions, confidence limits are:

$x_{pL} = e^{ \text{ln}(x_p) - Z_{\alpha} \frac{\sigma_{x_p}}{x_p} }$

$x_{pU} = e^{ \text{ln}(x_p) + Z_{\alpha} \frac{\sigma_{x_p}}{x_p} }$

The fitted line is constructed as follows for each distribution:

Distribution	x-axis	y-axis
Normal	$t_p$	$\phi^{-1}(p)$
Lognormal	$ln(t_p)$	$\phi^{-1}(p)$
Exponential	$ln(t_p)$	$ln(-ln(1-p))$
Smallest Extreme Value	$t_p$	$ln(-ln(1-p))$
Weibull	$ln(t_p)$	$ln(-ln(1-p))$
Logistic	$t_p$	$ln(\frac{p}{1-p})$
Loglogistic	$ln(t_p)$	$ln(\frac{p}{1-p})$

where Φ is the CDF function for the standard normal distribution.

The probability plot is another tool that can be used to assess fit. The probability plot shows an approximately straight line if the assumed distribution is a good fit.

Goodness of Fit

Anderson-Darling Test

The hypotheses for the Anderson-Darling test are:

H₀: Input data follows the distribution.

H₁: Input data doesn't follow the distribution.

Anderson-Darling Statistic

Input data are sorted first. The CDF function is used to calculate the probability Z_i for each point, and $F_n(Z_i)$ is the probability for the ith point calculated by Benard method or Kaplan-Meier method. The Anderson-Darling statistic A² is calculated using two methods:

Benard method

$A^2 = n \sum_{i=1}^{n+1} \Bigl[ \text{PartA}_i + \text{PartB}_i + \text{PartC}_i \Bigr]$ ,

where $\text{PartA}_i = (Z_{i-1} - Z_i) \; + \; \ln\! \left(\frac{1 - Z_{i-1}}{1 - Z_i} \right)$ , $\text{PartB}_i = 2 F_n(Z_{i-1}) \, \ln \! \left(\dfrac{1 - Z_i}{1 - Z_{i-1}} \right)$ ,

$\text{PartC}_i = F_n(Z_{i-1})^2 \, \ln \! \left(\dfrac{Z_i(1 - Z_{i-1})}{Z_{i-1}(1 - Z_i)}\right)$ , $Z_0 = 0, \, F_n(Z_0) = 0, \, \ln(Z_0) = 0, Z_{n+1} = 1 - 10^{-12}$ .
Kaplan-Meier method

$A^2 = n \Bigl [ -1 - \ln(Z_n) -\ln(1 - Z_n) + \sum_{i=2}^n \bigl( \text{PartB}_i + \text{PartC}_i \bigr) \; \Bigr ]$ ,

When there is no right censoring in maximum likelihood estimation method,

$A^2 = -n -(1/n) \sum_{i=1}^n \Bigl[ (2i-1) \text{ln} (Z_i) + ( 2n+1-i ) \text{ln} (1 - Z_i) \Bigr]$

A² value can be used to assess the fit. The smaller the value is, the better the model will be.

P-value can be calculated for the Anderson-Darling Test using the Kaplan-Meier method

P-value is calculated using a bootstrap resampling approach. And the value can be compared with other software. Events are drawn from a distribution using the estimated parameters. Censoring indicators from the input data are retained. If P-value is less than the critical value, e.g. 0.05, H₀ will be rejected.

The P-value is the proportion of bootstrap samples whose A² statistic values are greater than or equal to input data's [4]. i.e.

$\text{P-value} = \frac{b + 1}{N + 1}$ ,

where b is the number of test statistic values that are greater than or equal to input data's test statistic, and N is the number of bootstrap samples.

When there is no right censoring, we use another method [2] to calculate the P-value like Statistical Process Control app.

Pearson Correlation Coefficient

The Pearson correlation coefficient is calculated under the Least Squares estimation method. It measures how well the straight line fits the input data. A high value indicates the least squares regression lines explains most of the variation in Y through X.

Reference

R.A. Lockhart and M.A. Stephens (1994). "Estimation and Tests of Fit for the Three-parameter Weibull Distribution". Journal of the Royal Statistical Society, Vol. 56, No. 3, pp. 491-500.
Ralph B. D'Agostino, Michael A. Stephens (Eds.) (1986). Goodness-of-Fit Techniques. New York: Marcel Dekker
Hartigan J A (1975). Clustering Algorithms. Wiley
W. Stute, W. G. Manteiga, and M. P. Quindimil (1993). “Bootstrap based goodness-of-fit-tests”. Metrika, 40.1: pp. 243-256.
W. Q. Meeker, L. A. Escobar, and F. G. Pascual (2021). Statistical Methods for Reliability Data, 2nd Edition. New York: John Wiley & Sons.