17.2.4 Probability Plot and Q-Q Plot

The probability plot is used to test whether a dataset follows a given distribution. It shows a graph with an observed cumulative percentage on the X axis and an expected cumulative percentage on the Y axis. If all the scatter points are close to the reference line, we can say that the dataset follows the given distribution.

A Q-Q (Quantile-Quantile) plot is another graphic method for testing whether a dataset follows a given distribution. It differs from the probability plot in that it shows observed and expected values instead of percentages on the X and Y axes. If all the scatter points are close to the reference line, we can say that the dataset follows the given distribution.

Origin supports five given distributions (Normal, Lognormal, Exponential, Weibull and Gamma), and five methods for plotting percentile approximations (Blom, Benard, Hazen, Van der Waerden, and Kaplan-Meier).

Creating Probability Plot or Q-Q Plot

To create a probability plot or Q-Q plot:

  1. Highlight one Y column or multiple Y columns as input variable(s).
  2. Open the probability/Q-Q plot dialog:
    For a probability plot: From Origin's main menu, choose Plot > Statistical: Probability Plot... or click the Probability Plot button on the 2D Graphs toolbar.
    For a Q-Q plot: From Origin's main menu, choose Plot > Statistical: Q-Q Plot... or click the Q-Q Plot button on the 2D Graphs toolbar.
  3. In the plot_prob X-Function dialog, select the grouping column(s), set the arrangement of groups and variables, and specify the distribution and score method.
  4. Click OK to create a probability plot or a Q-Q plot.

The Dialog of plot_prob X-Function


Input Data

Specify the input data. You can select multiple columns as input variables.

Group

Specify the grouping column(s) used to separate the input variable(s) into multiple plots.

Arrange Plots

Specify how to arrange the plots for different variables and groups:

  • Overlay All: Selected by default. All groups and variables are plotted in the same layer.
  • Overlay Groups, Variables in Different Layers: Different groups are overlaid in the same layer; different variables are placed in different layers.
  • Overlay Variables, Groups in Different Layers: Different variables are overlaid in the same layer; different groups are placed in different layers.

Distribution

Select a distribution type for your data. For more information about the distributions, refer to the Distributions section.

Distribution
Five distributions are available.
  • Normal
  • Lognormal
  • Exponential
  • Weibull
  • Gamma
Estimate from Data
Specify whether to estimate distribution parameters from input data. If not, parameters can be specified manually.
mu
Mean of the Normal distribution.
sigma
Standard deviation of the Normal distribution.
shape
Shape of the specified distribution. Available for the Lognormal, Weibull and Gamma distributions.
scale
Scale of the specified distribution. Available in Lognormal, Exponential, Weibull and Gamma distributions.

Score Method

Select a method for plotting percentile approximations. For more information about the methods, refer to the Score Methods section.

  • Blom
  • Benard
  • Hazen
  • Van der Waerden
  • Kaplan-Meier

Confidence Band

Specify whether to output the confidence band in the probability plot. For computation details, see Algorithms.

Confidence Level(%)

Only available when Confidence Band is selected. Specify the confidence level in percentage for the chosen distribution.

Exchange X-Y Axes

Specify whether to switch X and Y axis positions.

X Minimum
X Maximum

The Auto values are X Minimum = 1 and X Maximum = 99.5. If Auto is cleared, the minimum and maximum values of the Reference Line column in the output are used.

When X Minimum is greater than the Auto value, the percentile value p1 is calculated for X Minimum, and the Percentile column includes only p1 and the values in the default list that are greater than p1. When X Maximum is less than the Auto value, the percentile value p2 is calculated for X Maximum, and the Percentile column includes only p2 and the values in the default list that are less than p2.

When X Minimum is less than the Auto value, the percentile value p1 is calculated for X Minimum. If p1 < 1e-5, p1 is set to 1e-5. The smallest value 10^(-m) that is larger than p1 is then found, and the Percentile column includes p1, 10^(-m), 10^(-m+1), ..., 1, 2, ...

When X Maximum is greater than the Auto value, the percentile value p2 is calculated for X Maximum. If p2 > 99.99, p2 is set to 99.99. The largest value in the list (99.9, 99.99) that is less than p2 is then found, and the Percentile column includes 99, 99.5, 99.9, ..., p2.
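
The tail rules above can be sketched in Python. This is only one reading of the description, not Origin's implementation: the function names are hypothetical, the percentile values p1 and p2 are assumed to be already computed, and the middle of the default list (2, ...) is not reproduced because it is not listed here.

```python
# Hypothetical helpers illustrating the tail rules; not Origin's code.
LOWER_POWERS = (1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0)  # 10^(-m) candidates up to 1
UPPER_EXTRAS = (99.9, 99.99)                         # list named in the text

def lower_tail_percentiles(p1):
    """Entries added below the Auto minimum (1) when X Minimum extends the axis."""
    p1 = max(p1, 1e-5)  # clamp: if p1 < 1e-5, use 1e-5
    # p1 first, then every power of ten that is strictly larger than p1, up to 1
    return [p1] + [q for q in LOWER_POWERS if q > p1]

def upper_tail_percentiles(p2):
    """Entries from 99 upward when X Maximum extends beyond the Auto maximum (99.5)."""
    p2 = min(p2, 99.99)  # clamp: if p2 > 99.99, use 99.99
    # 99 and 99.5, then the listed values below p2, then p2 itself
    return [99.0, 99.5] + [q for q in UPPER_EXTRAS if q < p2] + [p2]

print(lower_tail_percentiles(0.003))
print(upper_tail_percentiles(99.95))
```

Listing the candidate values explicitly, rather than computing them with logarithms, avoids floating-point surprises at exact powers of ten.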

Output Range

This determines where the calculated data for the graph is stored.

Distributions

Origin includes five distributions for Probability and Q-Q plots. The following table lists their density functions:

Normal
  Density function p(x): \frac{1}{\sigma \sqrt{2\pi}}\exp \left( -\frac{\left( x-\mu \right)^2}{2\sigma^2}\right)
  Range: all x
  Parameters: \mu, the mean, is the location parameter; \sigma (>0), the standard deviation, is the scale parameter

Lognormal
  Density function p(x): \frac{1}{\sigma x\sqrt{2\pi}}\exp \left( -\frac{\left( \ln \left( x\right) -\mu \right)^2}{2\sigma^2}\right)
  Range: x>0
  Parameters: \mu, the mean of \ln(x), is the scale parameter; \sigma (>0), the standard deviation of \ln(x), is the shape parameter

Exponential
  Density function p(x): \frac{1}{\sigma}\exp \left( -\frac{x}{\sigma}\right)
  Range: x>0
  Parameters: \sigma (>0) is the scale parameter

Weibull
  Density function p(x): \frac{c}{\sigma}\left( \frac{x}{\sigma}\right)^{c-1}\exp \left( -\left( \frac{x}{\sigma}\right)^c\right)
  Range: x>0
  Parameters: \sigma (>0) is the scale parameter; c (>0) is the shape parameter

Gamma
  Density function p(x): \frac{1}{\Gamma(c)\sigma^c}x^{c-1}\exp \left( -\frac{x}{\sigma}\right)
  Range: x>0
  Parameters: \sigma (>0) is the scale parameter; c (>0) is the shape parameter
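
For readers who want to check the density functions numerically, the following is a minimal Python sketch of the five formulas listed above. The function names are illustrative and are not part of Origin; returning 0 outside the stated range x>0 is an assumption made here for convenience.

```python
import math

def normal_pdf(x, mu, sigma):
    # Normal density, defined for all x
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def lognormal_pdf(x, mu, sigma):
    if x <= 0:
        return 0.0  # assumption: density taken as 0 outside x > 0
    z = math.log(x) - mu
    return math.exp(-(z ** 2) / (2 * sigma ** 2)) / (sigma * x * math.sqrt(2 * math.pi))

def exponential_pdf(x, sigma):
    if x <= 0:
        return 0.0  # assumption: density taken as 0 outside x > 0
    return math.exp(-x / sigma) / sigma

def weibull_pdf(x, sigma, c):
    if x <= 0:
        return 0.0  # assumption: density taken as 0 outside x > 0
    return (c / sigma) * (x / sigma) ** (c - 1) * math.exp(-((x / sigma) ** c))

def gamma_pdf(x, sigma, c):
    if x <= 0:
        return 0.0  # assumption: density taken as 0 outside x > 0
    return x ** (c - 1) * math.exp(-x / sigma) / (math.gamma(c) * sigma ** c)

print(normal_pdf(0.0, 0.0, 1.0))
print(weibull_pdf(1.0, 2.0, 1.0), exponential_pdf(1.0, 2.0))
```

With shape c = 1, both the Weibull and the Gamma density reduce to the Exponential density with the same scale, which gives a quick consistency check on the formulas.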

Details for Constructing Probability Plot

To construct a probability plot, sort the observed dataset from smallest to largest:

x[1]\le x[2]\le x[3]\le \cdots \le x[n-1]\le x[n], where n is the total number of observed values.

The sorted observed values are represented on the plot by points whose X-coordinates are x[i] and whose Y-coordinates are calculated using the selected score method.
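
The construction just described can be sketched as follows. This is a minimal illustration, not Origin's implementation: the function name is hypothetical, the Blom score (i-0.375)/(n+0.25) is used as an example, and expressing the score as a percentage is an assumption about how the Y value is displayed.

```python
def probability_plot_points(values):
    """Return (x[i], cumulative percentage) pairs for a probability plot."""
    data = sorted(values)          # x[1] <= x[2] <= ... <= x[n]
    n = len(data)
    points = []
    for i, x in enumerate(data, start=1):
        score = (i - 0.375) / (n + 0.25)   # Blom plotting position (example choice)
        points.append((x, 100.0 * score))  # assumption: Y shown as a percentage
    return points

for x, pct in probability_plot_points([3.0, 1.0, 2.0]):
    print(x, round(pct, 2))
```

The Y values are then drawn on the distribution-specific axis scale, so that data following the chosen distribution fall near a straight line.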


The axis scale types of the probability plot depend on the distribution:

Distribution    X Scale Type    Y Scale Type
Normal          Linear          Probability
Lognormal       Ln              Probability
Exponential     Ln              Double Log Reciprocal
Weibull         Log10           Double Log Reciprocal
Gamma           Log10           Probability

Details for Constructing Q-Q Plot

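To construct a Q-Q plot, sort the observed dataset from smallest to largest:

x[1]\le x[2]\le x[3]\le \cdots \le x[n-1]\le x[n], where n is the total number of observed values.

The X values are the sorted observed values. The Y values are the expected values, obtained by applying the inverse cumulative distribution function of the specified distribution to the scores given by the selected score method.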
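
As an illustration for the Normal case only, the Q-Q coordinates can be sketched in Python with the standard library. The function name and its default parameters are hypothetical, and the Hazen score (i-0.5)/n is used as an example because it never reaches 0 or 1, where the inverse cumulative distribution function is undefined.

```python
from statistics import NormalDist

def normal_qq_points(values, mu=0.0, sigma=1.0):
    """Return (observed, expected) pairs for a Normal Q-Q plot."""
    data = sorted(values)                  # x[1] <= ... <= x[n]
    n = len(data)
    dist = NormalDist(mu, sigma)
    points = []
    for i, x in enumerate(data, start=1):
        p = (i - 0.5) / n                  # Hazen score (example choice)
        points.append((x, dist.inv_cdf(p)))  # expected value = inverse CDF of score
    return points

for observed, expected in normal_qq_points([2.0, 0.0, -2.0]):
    print(observed, round(expected, 4))
```

A score of exactly 1, which the Kaplan-Meier method i/n produces for the largest observation, would make `inv_cdf` raise an error in this sketch, so a real implementation needs to handle that point separately.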


Score Methods

The input data are sorted from smallest to largest, and the serial number of each sorted value is then converted to a score using one of the methods listed below. In the table, i is the serial number and n is the total number of non-missing input values.

Method             Plotting position for (i, n)
Blom               (i-0.375)/(n+0.25)
Benard             (i-0.3)/(n+0.4)
Hazen              (i-0.5)/n
Van der Waerden    i/(n+1)
Kaplan-Meier       i/n
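
The five plotting positions can be compared side by side with a short Python sketch. The dictionary and function names are illustrative and not part of Origin; the formulas are the ones given in the table.

```python
# Plotting-position formulas from the table; i is the serial number, n the count.
SCORE_METHODS = {
    "Blom": lambda i, n: (i - 0.375) / (n + 0.25),
    "Benard": lambda i, n: (i - 0.3) / (n + 0.4),
    "Hazen": lambda i, n: (i - 0.5) / n,
    "Van der Waerden": lambda i, n: i / (n + 1),
    "Kaplan-Meier": lambda i, n: i / n,
}

def scores(n, method="Blom"):
    """Return the scores for serial numbers 1..n under the named method."""
    formula = SCORE_METHODS[method]
    return [formula(i, n) for i in range(1, n + 1)]

for name in SCORE_METHODS:
    print(name, [round(s, 4) for s in scores(4, name)])
```

Only the Kaplan-Meier method assigns a score of exactly 1 to the largest observation; the other four keep every score strictly between 0 and 1.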
