2.5.1.1 Algorithm: Parametric Distribution Analysis (Right Censoring)
Data are input as pairs of a time and a censoring indicator.
The MLE (maximum likelihood estimation) and LS (least squares) methods are used to estimate parameters for each distribution.
Under the MLE method:
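- First, construct the likelihood function of the distribution. It is the product of the probability density function (PDF) over the failure times and the survival function S(t) = 1 − F(t) over the right-censored times:
  $L(\theta) = \prod_{i=1}^{n} f(t_i;\theta)^{\delta_i}\, S(t_i;\theta)^{1-\delta_i}$
  where θ is the parameter vector, $t_i$ is the ith time, and $\delta_i = 1$ for a failure or $0$ for a right-censored time.
- Then maximize the logarithm of the likelihood function, $\ln L(\theta)$, by setting its partial derivatives with respect to the parameters equal to zero.
- Solve the resulting equations for the parameters with the Newton-Raphson method.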
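The MLE steps above can be illustrated for the Weibull distribution with a minimal Python sketch. It is not necessarily how this analysis is implemented. Several details are assumptions made for illustration: the names `eta` (scale) and `beta` (shape), the indicator convention (`1` = failure, `0` = right-censored), and the shortcut of solving for `eta` in closed form so that Newton-Raphson only iterates on `beta`.

```python
import math


def weibull_mle(times, delta, beta0=1.0, tol=1e-10, max_iter=100):
    """Right-censored Weibull MLE for (eta, beta) via Newton-Raphson.

    delta[i] = 1 marks a failure and 0 a right-censored time (assumed convention).
    Setting d(lnL)/d(eta) = 0 gives eta**beta = sum(t**beta) / r, so only the
    profile score equation for beta needs Newton-Raphson.
    """
    r = sum(delta)
    if r == 0:
        raise ValueError("at least one failure is required")
    logs = [math.log(t) for t in times]
    c = sum(d * lt for d, lt in zip(delta, logs)) / r  # mean log failure time

    beta = beta0
    for _ in range(max_iter):
        w = [t ** beta for t in times]
        b = sum(w)
        a = sum(wi * lt for wi, lt in zip(w, logs))
        a2 = sum(wi * lt * lt for wi, lt in zip(w, logs))
        score = 1.0 / beta + c - a / b                        # profile score in beta
        slope = -1.0 / beta ** 2 - (a2 * b - a * a) / b ** 2  # its derivative (< 0)
        new_beta = beta - score / slope
        if new_beta <= 0:  # keep the shape parameter positive
            new_beta = beta / 2
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    eta = (sum(t ** beta for t in times) / r) ** (1.0 / beta)
    return eta, beta


def weibull_loglik(times, delta, eta, beta):
    """ln L: sum of ln f(t) over failures plus ln S(t) over censored times."""
    ll = 0.0
    for t, d in zip(times, delta):
        if d:
            ll += math.log(beta / eta) + (beta - 1) * math.log(t / eta)
        ll -= (t / eta) ** beta  # the -(t/eta)**beta term appears in both ln f and ln S
    return ll


# Hypothetical data: six failures and two units still running at t = 60
times = [12, 25, 31, 40, 47, 55, 60, 60]
delta = [1, 1, 1, 1, 1, 1, 0, 0]
print(weibull_mle(times, delta))
```

The profile score for `beta` is strictly decreasing, so it has at most one root.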
Under the LS method:
- First, plot the failure times (or log failure times) against the transformed cumulative probabilities on a probability plot.
- Then fit a straight line by minimizing the sum of squared deviations.
- Use the slope and intercept of the fitted line to calculate the parameter estimates.
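The LS steps can be sketched for the Weibull case. The sketch assumes complete (uncensored) data, Benard plotting positions, and an ordinary least-squares regression of the transformed probability on log time. The text does not say which axis is regressed on which, so that direction is an assumption, as are the function names.

```python
import math


def benard_ranks(n):
    """Median-rank plotting positions by Benard's approximation (i - 0.3) / (n + 0.4)."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def weibull_ls_fit(times):
    """Least-squares Weibull fit on the probability-plot scale (complete data only).

    Linearization: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta), so a line
    y = slope * x + intercept with x = ln(t) gives beta = slope and
    eta = exp(-intercept / slope).
    """
    t_sorted = sorted(times)
    x = [math.log(t) for t in t_sorted]
    y = [math.log(-math.log(1.0 - p)) for p in benard_ranks(len(t_sorted))]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                # ordinary least squares of y on x
    intercept = my - slope * mx
    return math.exp(-intercept / slope), slope  # (eta, beta)


print(weibull_ls_fit([12.0, 25.0, 31.0, 40.0, 47.0, 55.0]))
```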
Distributions
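Supported distributions include:
- Normal
- Lognormal
- Exponential
- Smallest Extreme Value
- Weibull
- Logistic
- Loglogistic

In the Normal and Lognormal distribution functions, Φ denotes the CDF of the standard normal distribution.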
Hazard Function
The hazard function is calculated as follows:
$h(t) = \frac{f(t)}{1 - F(t)}$
where $f(t)$ is the PDF and $F(t)$ is the CDF.
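The hazard definition translates directly into code. In this sketch the Weibull PDF and CDF use a hypothetical scale `ETA` and shape `BETA`. They only serve to show that the generic ratio reproduces the familiar Weibull closed form.

```python
import math


def hazard(pdf, cdf, t):
    """Hazard h(t) = f(t) / (1 - F(t)) for any distribution given its PDF and CDF."""
    return pdf(t) / (1.0 - cdf(t))


# Weibull with scale ETA and shape BETA (values chosen only for illustration)
ETA, BETA = 50.0, 1.5


def weibull_pdf(t):
    return (BETA / ETA) * (t / ETA) ** (BETA - 1) * math.exp(-((t / ETA) ** BETA))


def weibull_cdf(t):
    return 1.0 - math.exp(-((t / ETA) ** BETA))


# For the Weibull the ratio simplifies to (BETA / ETA) * (t / ETA) ** (BETA - 1)
print(hazard(weibull_pdf, weibull_cdf, 30.0))
```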
Estimate Parameter Standard Errors
Once the parameters are estimated by the MLE method, the Fisher information matrix (FIM) can be calculated from the Hessian matrix of the log-likelihood:
The covariance matrix C of the parameters can be expressed as:
where n is the number of data points.
The standard error of each parameter is the square root of the corresponding diagonal element of the covariance matrix C.
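The standard-error procedure can be sketched numerically. This minimal example assumes that C is the inverse of the observed information, that is, the negative Hessian of the total log-likelihood evaluated at the estimates. The document's own expression involves n, so its scaling may differ. The Hessian is approximated by central differences, which is a swapped-in numerical technique. The example uses a right-censored Normal log-likelihood with hypothetical data.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_loglik(theta, x, delta):
    """Right-censored Normal log-likelihood; delta=1 failure, 0 censored (assumed)."""
    mu, sigma = theta
    ll = 0.0
    for xi, d in zip(x, delta):
        z = (xi - mu) / sigma
        if d:
            ll += math.log(STD_NORMAL.pdf(z) / sigma)  # ln f(x)
        else:
            ll += math.log(1.0 - STD_NORMAL.cdf(z))    # ln S(x)
    return ll


def numeric_hessian(f, theta, h=1e-3):
    """Central-difference Hessian of a scalar function f at the point theta."""
    k = len(theta)

    def shifted(i, si, j, sj):
        pt = list(theta)
        pt[i] += si * h
        pt[j] += sj * h
        return f(pt)

    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
             for j in range(k)] for i in range(k)]


def covariance_2x2(hess):
    """C = inverse of the observed information matrix -H (2 x 2 case)."""
    a, b = -hess[0][0], -hess[0][1]
    c, d = -hess[1][0], -hess[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical uncensored data, so the MLEs have closed forms
x = [4.1, 5.3, 6.0, 6.8, 7.2, 8.5, 9.1, 10.4]
delta = [1] * len(x)
mu_hat = sum(x) / len(x)
sigma_hat = math.sqrt(sum((xi - mu_hat) ** 2 for xi in x) / len(x))
H = numeric_hessian(lambda th: normal_loglik(th, x, delta), [mu_hat, sigma_hat])
C = covariance_2x2(H)
se_mu, se_sigma = math.sqrt(C[0][0]), math.sqrt(C[1][1])
print(se_mu, se_sigma)
```

For uncensored Normal data, the result can be checked against the closed forms σ̂/√n for the location and σ̂/√(2n) for the scale.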
| Distribution | Parameters | Lower Confidence Limit | Upper Confidence Limit |
|---|---|---|---|
| Normal, Lognormal, Smallest Extreme Value, Logistic, Loglogistic | location | | |
| | scale | | |
| Exponential | scale | | |
| Weibull | scale | | |
| | shape | | |
where
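A common way to obtain parameter confidence limits is the Wald interval. A location parameter uses estimate ± z·SE, and a parameter that must stay positive uses an interval computed on the log scale. The sketch below assumes that approach, as described for example by Meeker, Escobar, and Pascual [5], with a two-sided confidence level. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(level=0.95):
    """Two-sided standard normal quantile z_{1 - alpha/2} for a confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def location_ci(est, se, level=0.95):
    """Wald limits for a location parameter: est -/+ z * se."""
    z = z_value(level)
    return est - z * se, est + z * se


def positive_ci(est, se, level=0.95):
    """Limits for a positive (scale or shape) parameter on the log scale:
    est * exp(-/+ z * se / est)."""
    w = math.exp(z_value(level) * se / est)
    return est / w, est * w


print(location_ci(3.2, 0.4))
print(positive_ci(1.8, 0.25))
```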
Quantities of Distribution
Once the parameters are estimated by either the MLE or the LS method, the following quantities of interest can be calculated:
- Mean
- Standard Deviation
- Q1, Median, Q3
- IQR = Q3 - Q1
Formulas to calculate the distribution mean:

| Distribution | Mean | Lower Confidence Limit | Upper Confidence Limit |
|---|---|---|---|
| Normal | | | |
| Lognormal | | | |
| Exponential | | | |
| Smallest Extreme Value | | | |
| Weibull | | | |
| Logistic | | | |
| Loglogistic | | | |
where γ ≈ 0.5772 is Euler's constant.
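The point values of the means can be sketched in Python under an assumed parameterization. Normal, Smallest Extreme Value, and Logistic use location `mu` and scale `sigma` on the time scale. Lognormal and Loglogistic use `mu` and `sigma` on the log-time scale. Exponential uses its mean `theta`, and Weibull uses scale `eta` and shape `beta`. These are standard textbook results. The sketch covers the point estimates only, not the confidence limits.

```python
import math

EULER_GAMMA = 0.5772156649015329


def loglogistic_mean(mu, sigma):
    """exp(mu) * Gamma(1 + sigma) * Gamma(1 - sigma); the mean is infinite for sigma >= 1."""
    if sigma >= 1:
        return math.inf
    return math.exp(mu) * math.gamma(1 + sigma) * math.gamma(1 - sigma)


MEAN = {
    "Normal": lambda mu, sigma: mu,
    "Lognormal": lambda mu, sigma: math.exp(mu + sigma ** 2 / 2),
    "Exponential": lambda theta: theta,
    "Smallest Extreme Value": lambda mu, sigma: mu - EULER_GAMMA * sigma,
    "Weibull": lambda eta, beta: eta * math.gamma(1 + 1 / beta),
    "Logistic": lambda mu, sigma: mu,
    "Loglogistic": loglogistic_mean,
}

print(MEAN["Weibull"](40.0, 2.5))
```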
Formulas to calculate the distribution standard deviation:

| Distribution | Standard Deviation | Lower Confidence Limit | Upper Confidence Limit |
|---|---|---|---|
| Normal | | | |
| Lognormal | | | |
| Exponential | | | |
| Smallest Extreme Value | | | |
| Weibull | | | |
| Logistic | | | |
| Loglogistic | | | |
where
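The standard deviations follow from the standard moment formulas. The sketch assumes location `mu` and scale `sigma` (on log time for Lognormal and Loglogistic), an Exponential mean `theta`, and a Weibull scale `eta` and shape `beta`. It computes point values only.

```python
import math


def loglogistic_sd(mu, sigma):
    """exp(mu) * sqrt(G(1+2s)G(1-2s) - (G(1+s)G(1-s))**2); infinite for sigma >= 1/2."""
    if sigma >= 0.5:
        return math.inf
    g = math.gamma
    m1 = g(1 + sigma) * g(1 - sigma)          # E[T] / exp(mu)
    m2 = g(1 + 2 * sigma) * g(1 - 2 * sigma)  # E[T**2] / exp(2 * mu)
    return math.exp(mu) * math.sqrt(m2 - m1 ** 2)


SD = {
    "Normal": lambda mu, sigma: sigma,
    "Lognormal": lambda mu, sigma: math.sqrt(
        (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)),
    "Exponential": lambda theta: theta,
    "Smallest Extreme Value": lambda mu, sigma: sigma * math.pi / math.sqrt(6),
    "Weibull": lambda eta, beta: eta * math.sqrt(
        math.gamma(1 + 2 / beta) - math.gamma(1 + 1 / beta) ** 2),
    "Logistic": lambda mu, sigma: sigma * math.pi / math.sqrt(3),
    "Loglogistic": loglogistic_sd,
}

print(SD["Weibull"](40.0, 2.5))
```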
Percentiles
Calculate the time at which a prespecified percent of the population fails.
| Distribution | Percentile | Variance |
|---|---|---|
| Normal | | |
| Lognormal | | |
| Exponential | | |
| Smallest Extreme Value | | |
| Weibull | | |
| Logistic | | |
| Loglogistic | | |
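The percentile column can be sketched with each distribution's inverse CDF. The sketch assumes location `mu` and scale `sigma` (on log time for Lognormal and Loglogistic), an Exponential mean `theta`, and a Weibull scale `eta` and shape `beta`. The variance column is not covered.

```python
import math
from statistics import NormalDist

_PHI_INV = NormalDist().inv_cdf  # standard normal quantile function


def sev_q(p):
    """Standard smallest-extreme-value quantile ln(-ln(1 - p))."""
    return math.log(-math.log(1.0 - p))


def logistic_q(p):
    """Standard logistic quantile ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


PERCENTILE = {
    "Normal": lambda p, mu, sigma: mu + sigma * _PHI_INV(p),
    "Lognormal": lambda p, mu, sigma: math.exp(mu + sigma * _PHI_INV(p)),
    "Exponential": lambda p, theta: -theta * math.log(1.0 - p),
    "Smallest Extreme Value": lambda p, mu, sigma: mu + sigma * sev_q(p),
    "Weibull": lambda p, eta, beta: eta * (-math.log(1.0 - p)) ** (1.0 / beta),
    "Logistic": lambda p, mu, sigma: mu + sigma * logistic_q(p),
    "Loglogistic": lambda p, mu, sigma: math.exp(mu + sigma * logistic_q(p)),
}

print(PERCENTILE["Weibull"](0.1, 40.0, 2.5))  # B10 life for the assumed parameters
```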
For Normal, Smallest Extreme Value, and Logistic distributions:
Lower confidence limit:
Upper confidence limit:
For Lognormal, Exponential, Weibull, Loglogistic distributions:
Lower confidence limit:
Upper confidence limit:
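For location-scale families, the percentile is y_p = μ + w_p·σ, where w_p is the standard quantile of p. Its variance follows from the parameter covariance matrix. The sketch below assumes that form together with Wald-type limits. For log-location-scale models (Lognormal, Weibull, Loglogistic) it computes limits for ln t_p and exponentiates them, which equals t_p·exp(∓z·se(t_p)/t_p) when se(t_p) comes from the delta method. Exponential can be handled as the Weibull case with σ = 1 fixed, so `var_sigma` and `cov` are 0. The names and argument order are hypothetical.

```python
import math
from statistics import NormalDist


def percentile_ci(mu, sigma, var_mu, var_sigma, cov, w_p, log_scale=False, level=0.95):
    """Percentile of a (log-)location-scale model with Wald-type limits.

    y_p = mu + w_p * sigma, with w_p the standard quantile of p, e.g.
    NormalDist().inv_cdf(p), ln(-ln(1 - p)), or ln(p / (1 - p)).
    Var(y_p) = Var(mu) + w_p**2 * Var(sigma) + 2 * w_p * Cov(mu, sigma).
    With log_scale=True, y_p = ln(t_p) and all three values are exponentiated.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    y_p = mu + w_p * sigma
    se = math.sqrt(var_mu + w_p ** 2 * var_sigma + 2 * w_p * cov)
    lo, hi = y_p - z * se, y_p + z * se
    if log_scale:
        return math.exp(y_p), math.exp(lo), math.exp(hi)
    return y_p, lo, hi


# Weibull 10th percentile with mu = ln(eta), sigma = 1 / beta (hypothetical values)
w10 = math.log(-math.log(1 - 0.1))
print(percentile_ci(math.log(40.0), 0.4, 0.01, 0.002, 0.001, w10, log_scale=True))
```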
Survival Probabilities
Calculate the percentage of the population that has not yet failed at a prespecified time point.
| Distribution | Survival Probability | Variance of Survival Probability |
|---|---|---|
| Normal | | |
| Lognormal | | |
| Exponential | | |
| Smallest Extreme Value | | |
| Weibull | | |
| Logistic | | |
| Loglogistic | | |
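Each survival probability is 1 − F(t). The sketch assumes location `mu` and scale `sigma` (on log time for Lognormal and Loglogistic), an Exponential mean `theta`, and a Weibull scale `eta` and shape `beta`. The variance column is not covered.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF

SURVIVAL = {
    "Normal": lambda t, mu, sigma: 1.0 - _PHI((t - mu) / sigma),
    "Lognormal": lambda t, mu, sigma: 1.0 - _PHI((math.log(t) - mu) / sigma),
    "Exponential": lambda t, theta: math.exp(-t / theta),
    "Smallest Extreme Value": lambda t, mu, sigma: math.exp(-math.exp((t - mu) / sigma)),
    "Weibull": lambda t, eta, beta: math.exp(-((t / eta) ** beta)),
    "Logistic": lambda t, mu, sigma: 1.0 / (1.0 + math.exp((t - mu) / sigma)),
    "Loglogistic": lambda t, mu, sigma: 1.0 / (1.0 + math.exp((math.log(t) - mu) / sigma)),
}

print(SURVIVAL["Weibull"](30.0, 40.0, 2.5))
```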
Lower bound =
Upper bound =
For Normal and Lognormal distributions:
Lower confidence limit:
Upper confidence limit:
For Exponential, Smallest Extreme Value, and Weibull distributions:
Lower confidence limit:
Upper confidence limit:
For Logistic and Loglogistic distributions:
Lower confidence limit:
Upper confidence limit:
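The three groups above match three standard survival functions: 1 − Φ(z), exp(−e^z), and 1/(1 + e^z). One plausible reading, assumed here rather than confirmed by the text, is that the bounds are first computed for the standardized value ẑ = (y − μ)/σ, with y = t or ln t. The bounds are then mapped through the standard survival function, and the variance of ẑ comes from the delta method. For Weibull, use μ = ln η and σ = 1/β. For Exponential, additionally fix σ = 1 and set `var_sigma` and `cov` to 0. The names are hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_SURVIVAL = {
    "normal": lambda z: 1.0 - NormalDist().cdf(z),   # Normal, Lognormal
    "sev": lambda z: math.exp(-math.exp(z)),          # Exponential, SEV, Weibull
    "logistic": lambda z: 1.0 / (1.0 + math.exp(z)),  # Logistic, Loglogistic
}


def survival_ci(t, mu, sigma, var_mu, var_sigma, cov, family, log_time, level=0.95):
    """Survival probability with limits computed on the standardized scale."""
    y = math.log(t) if log_time else t
    z_hat = (y - mu) / sigma
    # Delta method: the gradient of z w.r.t. (mu, sigma) is (-1/sigma, -z/sigma)
    var_z = (var_mu + z_hat ** 2 * var_sigma + 2 * z_hat * cov) / sigma ** 2
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var_z)
    s = STANDARD_SURVIVAL[family]
    # S decreases in z, so the upper z bound gives the lower survival limit
    return s(z_hat), s(z_hat + half), s(z_hat - half)


# Weibull example: eta = 40, beta = 2.5 -> mu = ln(40), sigma = 0.4 (hypothetical)
print(survival_ci(30.0, math.log(40.0), 0.4, 0.01, 0.002, 0.001, "sev", log_time=True))
```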
Probability Plot
Points in the probability plot are sorted in ascending order. For the ith point, the plotting probability is the median rank calculated by Benard's approximation:
$p_i = \frac{i - 0.3}{n + 0.4}$
where n is the number of points.
The middle line shows the expected percentiles for the given probabilities. These are computed from the inverse cumulative distribution function using the parameters estimated by the MLE or LS method.
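To estimate confidence limits of percentiles, the variance of each percentile is calculated by propagation of error:
$\operatorname{Var}(x_p) = \left(\frac{\partial x_p}{\partial \theta}\right)^{T} C \left(\frac{\partial x_p}{\partial \theta}\right)$
where $x_p$ is the percentile for a given probability p, θ is the vector of parameters, C is the covariance matrix of the parameters, and $(\cdot)^T$ denotes the transpose of a vector.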
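- Confidence limits of percentiles can be calculated as:
  $x_p \pm z_{1-\alpha/2}\sqrt{\operatorname{Var}(x_p)}$
  where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution and $1-\alpha$ is the confidence level.
- For positive random variables, e.g. in the Lognormal, Exponential, Weibull, and Loglogistic distributions, the confidence limits are:
  $x_p \exp\left(\pm \frac{z_{1-\alpha/2}\sqrt{\operatorname{Var}(x_p)}}{x_p}\right)$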
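For the Weibull percentile t_p = η(−ln(1 − p))^(1/β), the gradient in the propagation-of-error step has a closed form. The sketch below assumes the parameter order (η, β) for the covariance matrix. It uses log-scale limits because t_p is positive. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def weibull_percentile_ci(p, eta, beta, cov, level=0.95):
    """Weibull percentile, its delta-method variance, and log-scale limits.

    cov is the 2 x 2 covariance matrix of (eta, beta), in that order (assumed).
    """
    w = math.log(-math.log(1.0 - p))
    t_p = eta * math.exp(w / beta)            # = eta * (-ln(1 - p)) ** (1 / beta)
    grad = [t_p / eta, -t_p * w / beta ** 2]  # d t_p / d eta, d t_p / d beta
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))
    factor = math.exp(NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var) / t_p)
    return t_p, var, t_p / factor, t_p * factor


print(weibull_percentile_ci(0.1, 40.0, 2.5, [[4.0, 0.05], [0.05, 0.04]]))
```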
The fitted line is constructed as follows for each distribution:
| Distribution | x-axis | y-axis |
|---|---|---|
| Normal | | |
| Lognormal | | |
| Exponential | | |
| Smallest Extreme Value | | |
| Weibull | | |
| Logistic | | |
| Loglogistic | | |

In the Normal and Lognormal rows, Φ denotes the CDF of the standard normal distribution.
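Probability-plot coordinates can be sketched as follows. The sketch assumes complete data, with time (or log time) on one axis and the standard quantile of the Benard plotting position on the other. Mapping each distribution to a standard quantile function is an assumption based on the location-scale structure. Exponential is left out because this sketch does not assume which axis convention is used for it.

```python
import math
from statistics import NormalDist

STANDARD_QUANTILE = {
    "normal": NormalDist().inv_cdf,
    "sev": lambda p: math.log(-math.log(1.0 - p)),
    "logistic": lambda p: math.log(p / (1.0 - p)),
}

# distribution -> (standard quantile family, plot log time instead of time)
PLOT_SCALES = {
    "Normal": ("normal", False),
    "Lognormal": ("normal", True),
    "Smallest Extreme Value": ("sev", False),
    "Weibull": ("sev", True),
    "Logistic": ("logistic", False),
    "Loglogistic": ("logistic", True),
}


def plot_points(times, dist):
    """Return (time-axis, probability-axis) coordinates for a probability plot."""
    family, log_time = PLOT_SCALES[dist]
    t = sorted(times)
    n = len(t)
    q = STANDARD_QUANTILE[family]
    xs = [math.log(v) if log_time else v for v in t]
    ys = [q((i - 0.3) / (n + 0.4)) for i in range(1, n + 1)]  # Benard ranks
    return xs, ys


print(plot_points([12.0, 25.0, 31.0, 40.0, 47.0, 55.0], "Weibull"))
```

If the assumed distribution fits, the points satisfy x ≈ μ + σ·y, so a straight line appears.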
The probability plot can also be used to assess the fit: if the assumed distribution fits well, the points fall approximately along a straight line.
Goodness of Fit
Anderson-Darling Test
The hypotheses for the Anderson-Darling test are:
- H0: The input data follow the distribution.
- H1: The input data do not follow the distribution.
- Anderson-Darling Statistic
- The input data are sorted first. The fitted CDF gives the probability $Z_i$ for each point, and the empirical probability of the ith point is calculated by the Benard method or the Kaplan-Meier method. The Anderson-Darling statistic $A^2$ is calculated by one of two methods:
- Benard method
- Kaplan-Meier method
- When the maximum likelihood estimation method is used and there is no right censoring, the $A^2$ value can be used to assess the fit: the smaller the value, the better the distribution fits the data.
- A P-value can be calculated for the Anderson-Darling test with the Kaplan-Meier method.
- The P-value is calculated by a bootstrap resampling approach, and the value can be compared with results from other software. Event times are drawn from the distribution using the estimated parameters, while the censoring indicators of the input data are retained. If the P-value is less than the significance level, e.g. 0.05, H0 is rejected.
- The P-value is the proportion of bootstrap samples whose $A^2$ statistic is greater than or equal to that of the input data [4], i.e.
  $P = \frac{b}{N}$
- where b is the number of bootstrap test statistic values that are greater than or equal to the test statistic of the input data, and N is the number of bootstrap samples.
- When there is no right censoring, we use another method [2] to calculate the P-value, the same one used in the Statistical Process Control app.
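The bootstrap rule P = b/N can be sketched as below. To stay short, this example makes several simplifying assumptions that differ from the censored case described above. It uses complete data and an Exponential model refitted to every bootstrap sample. It also uses the textbook complete-sample Anderson-Darling formula rather than the Benard or Kaplan-Meier versions. The function names and the seed are hypothetical.

```python
import math
import random


def ad_statistic(z):
    """Complete-sample Anderson-Darling A^2 from ascending fitted CDF values z."""
    n = len(z)
    total = sum((2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
                for i in range(n))
    return -n - total / n


def exponential_a2(x):
    """Fit an Exponential mean by MLE (complete data) and return A^2."""
    theta = sum(x) / len(x)
    return ad_statistic(sorted(1.0 - math.exp(-xi / theta) for xi in x))


def bootstrap_p_value(x, n_boot=500, seed=1):
    """P = b / N, where b counts bootstrap A^2 values >= the observed A^2."""
    rng = random.Random(seed)
    a2_obs = exponential_a2(x)
    theta = sum(x) / len(x)
    b = 0
    for _ in range(n_boot):
        # Draw a same-size sample from the fitted model, then refit and recompute A^2
        sample = [rng.expovariate(1.0 / theta) for _ in x]
        if exponential_a2(sample) >= a2_obs:
            b += 1
    return b / n_boot


print(bootstrap_p_value([3.1, 0.4, 7.9, 1.2, 2.6, 0.9, 5.3, 1.8]))
```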
Pearson Correlation Coefficient
The Pearson correlation coefficient is calculated under the least squares estimation method. It measures how well the straight line fits the input data on the probability plot. A high value indicates that the least squares regression line explains most of the variation in Y through X.
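The coefficient is the ordinary Pearson r of the probability-plot coordinates. The sketch computes it for Weibull plot coordinates, log time and ln(−ln(1 − p)) with Benard ranks, using a small hypothetical sample.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of paired coordinates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Weibull probability-plot coordinates with Benard ranks (hypothetical data)
times = sorted([12.0, 25.0, 31.0, 40.0, 47.0, 55.0])
n = len(times)
x = [math.log(t) for t in times]
y = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
print(pearson_r(x, y))
```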
References
- [1] R. A. Lockhart and M. A. Stephens (1994). "Estimation and Tests of Fit for the Three-Parameter Weibull Distribution". Journal of the Royal Statistical Society, Series B, Vol. 56, No. 3, pp. 491-500.
- [2] R. B. D'Agostino and M. A. Stephens (Eds.) (1986). Goodness-of-Fit Techniques. New York: Marcel Dekker.
- [3] J. A. Hartigan (1975). Clustering Algorithms. New York: John Wiley & Sons.
- [4] W. Stute, W. G. Manteiga, and M. P. Quindimil (1993). "Bootstrap Based Goodness-of-Fit-Tests". Metrika, Vol. 40, No. 1, pp. 243-256.
- [5] W. Q. Meeker, L. A. Escobar, and F. G. Pascual (2021). Statistical Methods for Reliability Data, 2nd Edition. New York: John Wiley & Sons.