17.2.7 2D Kernel Density


The 2D Kernel Density plot is a smoothed color density representation of a scatter plot. It is based on kernel density estimation, a nonparametric technique for estimating probability density functions. The goal of density estimation is to take a finite sample of data and infer the underlying probability density function everywhere, including where no data points are present. In kernel density estimation, the contribution of each data point is smoothed out from a single point into a surrounding region. The resulting smoothed density plot shows the average trend of the scatter plot.

Creating a 2D Kernel Density Plot

To create a 2D Kernel Density plot:

  1. Highlight one Y column.
  2. Open the 2D Kernel Density plot dialog by clicking Plot > Contour: 2D Kernel Density.
  3. In the plot_kde2 dialog box, specify the Method, Number of Grid Points in X/Y, Number of Points to Display, and Plot Type.
  4. Click OK to create a 2D Kernel Density plot.
    2D-Kernel.png

The Dialog of plot_kde2

2D Kernel Density dialog.png

Input Data

Specify the input data.

Settings

Bandwidth Method
Specify the bandwidth calculation method of the 2D Kernel Density plot.
  • Bivariate Kernel Density Estimator
  • Rule of Thumb
Density Method
Specify a method to calculate the kernel density on the defined XY grid.
  • Exact Estimation
Choose this option to calculate density values exactly, according to the Ks2density equation. For a large dataset, exact estimation may require extensive computation.
  • Binned Approximate Estimation
Choose this option to approximate the density values. This option is recommended for large samples.
Interpolate Density Points
Specify the calculation method used to decide which points to superimpose on the density image (see the Algorithm section below for details). If the source data set is large (i.e., more than 50,000 points), selecting this option is strongly recommended to improve speed.
Number of Grid Points in X/Y
Specify the number of equally spaced grid points for the density estimation.
Number of Points to Display
When the All check box is cleared, specify N: the N points with the lowest density are superimposed on the density image. When All is selected (the default), all points are displayed.
Grid Range
As an interim step, a matrix of gridded values is generated from the X/Y data, and the kernel density plot is created from the matrix values. By default, the Grid Range shows the minimum and maximum X and Y values of that matrix. Clear the Auto check box to enter values manually.
  • X Minimum
  • X Maximum
  • Y Minimum
  • Y Maximum
Plot Type
Specify the plot type.
  • Contour
Plot a contour graph of the density matrix.
  • Image
Plot an image of the density matrix.
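The Grid Range and Number of Grid Points in X/Y settings together define an equally spaced evaluation grid. The minimal NumPy sketch below builds such a grid. The function name `make_grid` is hypothetical, and treating Auto as the plain data minimum and maximum (no padding) is an assumption, not necessarily how plot_kde2 computes its default range.

```python
import numpy as np


def make_grid(x, y, n_x=100, n_y=100, x_range=None, y_range=None):
    """Return grid vectors gx, gy and meshgrid matrices X, Y.

    x_range / y_range mimic the dialog's Grid Range; None stands in for
    Auto and (as an assumption) uses the plain data min/max, no padding.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xmin, xmax = x_range if x_range is not None else (x.min(), x.max())
    ymin, ymax = y_range if y_range is not None else (y.min(), y.max())
    gx = np.linspace(xmin, xmax, n_x)  # equally spaced X grid points
    gy = np.linspace(ymin, ymax, n_y)  # equally spaced Y grid points
    # Default "xy" indexing: rows follow Y, columns follow X -> shape (n_y, n_x)
    X, Y = np.meshgrid(gx, gy)
    return gx, gy, X, Y


gx, gy, X, Y = make_grid([0.0, 1.0, 2.0], [5.0, 6.0], n_x=3, n_y=4)
print(X.shape)  # (4, 3)
```

Keeping rows aligned with Y and columns with X matches the usual layout of a density matrix that is drawn as an image or contour.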

Density Estimation data

This determines where the calculated data for the graph is stored.

Displaying Data

This determines where the data of the displayed scatter plot is stored. This option is only available when Number of Points to Display is not 0.

Algorithm

Kernel density estimation is a nonparametric technique for estimating the density of scatter points. The goal of density estimation is to estimate the underlying probability density function everywhere, including where no data are observed, from the existing scatter points. A kernel function is centered on each datum, which ensures that the kernel is symmetric about that datum. Kernel density estimation smooths the contribution of each data point to give an overall picture of the density of the data.

Density grids.png

Density Calculation Method

Specify a method to calculate the kernel density on the defined XY grid.

Exact Estimation

Density values are calculated with the following equation:

f(x,y,\text{vX},\text{vY},w_x,w_y) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{ 2\pi w_x w_y }  \exp \left(-\frac{(x-\text{vX}_i)^2}{2w_x ^2} - \frac{(y-\text{vY}_i)^2}{2w_y^2} \right)

where n is the number of elements in vector vX (or vY), \text{vX}_i is the i-th element of vector vX, and \text{vY}_i is the i-th element of vector vY. w_x and w_y are the optimal bandwidth values in X and Y.
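The Exact Estimation formula can be evaluated directly in NumPy. This sketch (hypothetical name `exact_kde2`) uses the fact that the Gaussian kernel factors into an X part and a Y part, so the sum over data points becomes one matrix product. The cost still grows with n times the number of grid nodes, which is why exact estimation becomes slow for large samples.

```python
import numpy as np


def exact_kde2(x, y, gx, gy, wx, wy):
    """Exact Gaussian-kernel density on the grid gx (columns) x gy (rows).

    f(x, y) = 1/n * sum_i exp(-(x - vX_i)^2 / (2 wx^2)
                               - (y - vY_i)^2 / (2 wy^2)) / (2 pi wx wy)
    """
    vx = np.asarray(x, dtype=float)
    vy = np.asarray(y, dtype=float)
    gx = np.asarray(gx, dtype=float)
    gy = np.asarray(gy, dtype=float)
    n = vx.size
    # exp(-a - b) = exp(-a) * exp(-b): build per-axis kernel factors,
    # kx has shape (len(gx), n) and ky has shape (len(gy), n)
    kx = np.exp(-((gx[:, None] - vx[None, :]) ** 2) / (2.0 * wx**2))
    ky = np.exp(-((gy[:, None] - vy[None, :]) ** 2) / (2.0 * wy**2))
    # dens[j, k] = sum_i ky[j, i] * kx[k, i], i.e. the sum over data points
    dens = ky @ kx.T
    return dens / (n * 2.0 * np.pi * wx * wy)


g = np.linspace(-3.0, 3.0, 7)
d = exact_kde2([0.0], [0.0], g, g, 1.0, 1.0)
# For this single point at the origin, d[3, 3] equals 1 / (2 * pi)
```

The returned matrix has one row per Y grid value and one column per X grid value, and it integrates to approximately 1 when the grid covers the data with a margin of several bandwidths.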

Binned Approximate Estimation

This method speeds up the density calculation by approximating the exact estimate of the 2D kernel density.

First, 2D binning is performed on the (x, y) points to obtain a matrix of bin counts. Then a 2D Fast Fourier Transform is used to perform the discrete convolution that gives the density value at each grid point.
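The binning-plus-FFT idea can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the plot_kde2 implementation: it uses simple nearest-node binning (the binning rule used by the product is not stated here; linear binning is a common alternative), drops points that fall outside the grid, and zero-pads the FFTs so the convolution is linear rather than circular. `binned_kde2` is a hypothetical name. When every point lies exactly on a grid node, the result matches the exact estimate; otherwise it is an approximation that improves as the grid spacing shrinks relative to the bandwidth.

```python
import numpy as np


def binned_kde2(x, y, gx, gy, wx, wy):
    """Approximate 2D Gaussian KDE: bin counts, then FFT convolution.

    Assumptions: gx, gy are equally spaced; simple nearest-node binning;
    points falling outside the grid are dropped.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    gx = np.asarray(gx, dtype=float)
    gy = np.asarray(gy, dtype=float)
    n = x.size
    mx, my = gx.size, gy.size
    dx, dy = gx[1] - gx[0], gy[1] - gy[0]

    # Step 1: 2D binning -> matrix of counts, rows follow Y, columns follow X
    ix = np.rint((x - gx[0]) / dx).astype(int)
    iy = np.rint((y - gy[0]) / dy).astype(int)
    keep = (ix >= 0) & (ix < mx) & (iy >= 0) & (iy < my)
    counts = np.zeros((my, mx))
    np.add.at(counts, (iy[keep], ix[keep]), 1.0)

    # Step 2: kernel sampled at every grid offset from -(m-1) to +(m-1) steps
    ox = np.arange(-(mx - 1), mx) * dx
    oy = np.arange(-(my - 1), my) * dy
    kern = (np.exp(-(oy[:, None] ** 2) / (2.0 * wy**2))
            * np.exp(-(ox[None, :] ** 2) / (2.0 * wx**2))
            / (2.0 * np.pi * wx * wy))

    # Step 3: linear (not circular) convolution via zero-padded FFTs
    shape = (3 * my - 2, 3 * mx - 2)
    spec = np.fft.rfft2(counts, s=shape) * np.fft.rfft2(kern, s=shape)
    full = np.fft.irfft2(spec, s=shape)
    # Kernel offset 0 sits at index (my-1, mx-1): cut out the grid-sized part
    dens = full[my - 1:2 * my - 1, mx - 1:2 * mx - 1]
    return dens / n


g = np.linspace(0.0, 1.0, 11)
d = binned_kde2([0.2, 0.5, 0.7], [0.3, 0.3, 0.9], g, g, 0.15, 0.2)
```

The speed-up comes from the convolution cost depending on the grid size rather than on n; the data points are touched only once, during binning.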

The fourth root of the density values is calculated to map the density scale to the color scale.
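A small sketch of the fourth-root mapping follows. The normalization to [0, 1] and the quantization into `n_colors` levels are assumptions made for illustration, and `density_to_color_index` is a hypothetical name. Taking the fourth root compresses the dynamic range, so sparse regions remain visible next to dense peaks.

```python
import numpy as np


def density_to_color_index(dens, n_colors=256):
    """Map densities to colormap indices through a fourth-root transform.

    Normalizing to [0, 1] and quantizing to n_colors levels are assumptions
    made for illustration.
    """
    # Clip tiny negative values (e.g. FFT round-off) before taking the root
    d = np.clip(np.asarray(dens, dtype=float), 0.0, None) ** 0.25
    lo, hi = d.min(), d.max()
    scaled = (d - lo) / (hi - lo) if hi > lo else np.zeros_like(d)
    # Quantize to integer color indices 0 .. n_colors - 1
    return np.minimum((scaled * n_colors).astype(int), n_colors - 1)


idx = density_to_color_index([0.0, 1e-4, 1e-2, 1.0])
```

With a linear mapping, a density 100 times smaller than the peak would fall in the lowest few percent of the color scale; after the fourth root it sits near one third of the scale.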

Bandwidth Methods

Bivariate Kernel Density Estimator

Calculates the bandwidth based on a linear diffusion process.

Rule of Thumb

The bandwidths w_x and w_y can be estimated simply as:

w_x = \frac{\sigma_x}{2n^{1/6}}
w_y = \frac{\sigma_y}{2n^{1/6}}

where n is the size of vector vX (or vY), \sigma_x is the sample standard deviation of dataset vX, and \sigma_y is the sample standard deviation of dataset vY.
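The Rule of Thumb formulas translate directly to code. In this sketch (hypothetical name `rule_of_thumb_bandwidth`), the sample standard deviation uses the n - 1 divisor, which is an assumption because the divisor is not stated above.

```python
import numpy as np


def rule_of_thumb_bandwidth(x, y):
    """Return (w_x, w_y) with w = sigma / (2 * n**(1/6)) for each axis.

    Assumption: sigma is the sample standard deviation with an n - 1 divisor.
    """
    vx = np.asarray(x, dtype=float)
    vy = np.asarray(y, dtype=float)
    n = vx.size  # vX and vY are paired, so they have the same length
    denom = 2.0 * n ** (1.0 / 6.0)
    return vx.std(ddof=1) / denom, vy.std(ddof=1) / denom


x = np.arange(64.0)
wx, wy = rule_of_thumb_bandwidth(x, 2.0 * x)
# n = 64 gives n**(1/6) = 2, so each bandwidth is sigma / 4
```

Because each axis gets its own bandwidth, scaling Y by a constant scales w_y by the same constant and leaves w_x unchanged.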


Interpolate Density Points

Specify the calculation method to decide which points to superimpose on the density image.

If this option is selected, the kernel density at each point is obtained by interpolating the density matrix computed on the defined XY grid. If the source data set is very large, selecting this option can greatly improve speed.

If the option is not selected, the density values will be calculated by the Exact Estimation method.
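When Interpolate Density Points is selected, each point's density is read from the grid by interpolation, and Number of Points to Display then keeps the N lowest-density points. The sketch below assumes bilinear interpolation (the interpolation scheme is not specified above) and clamps points to the grid; `interp_density` and `lowest_density_points` are hypothetical names.

```python
import numpy as np


def interp_density(gx, gy, dens, px, py):
    """Bilinear interpolation of dens[j, k] (row j <-> gy[j], column k <-> gx[k])
    at query points (px, py). Assumption: points are clamped to the grid."""
    gx = np.asarray(gx, dtype=float)
    gy = np.asarray(gy, dtype=float)
    dens = np.asarray(dens, dtype=float)
    px = np.clip(np.asarray(px, dtype=float), gx[0], gx[-1])
    py = np.clip(np.asarray(py, dtype=float), gy[0], gy[-1])
    # Index of the lower-left node of the cell holding each point
    k = np.clip(np.searchsorted(gx, px, side="right") - 1, 0, gx.size - 2)
    j = np.clip(np.searchsorted(gy, py, side="right") - 1, 0, gy.size - 2)
    # Fractional position inside the cell, in [0, 1]
    tx = (px - gx[k]) / (gx[k + 1] - gx[k])
    ty = (py - gy[j]) / (gy[j + 1] - gy[j])
    return ((1 - ty) * ((1 - tx) * dens[j, k] + tx * dens[j, k + 1])
            + ty * ((1 - tx) * dens[j + 1, k] + tx * dens[j + 1, k + 1]))


def lowest_density_points(px, py, pdens, n_show=None):
    """Return the n_show points with the lowest density (all points if None)."""
    order = np.argsort(pdens, kind="stable")
    if n_show is not None:
        order = order[:n_show]
    return np.asarray(px)[order], np.asarray(py)[order]


gx = np.linspace(0.0, 2.0, 5)
gy = np.linspace(0.0, 3.0, 4)
X, Y = np.meshgrid(gx, gy)
dens = np.exp(-((X - 1.0) ** 2 + (Y - 1.5) ** 2))  # stand-in density matrix
px = np.array([0.1, 1.0, 1.9])
py = np.array([0.2, 1.5, 2.8])
sx, sy = lowest_density_points(px, py, interp_density(gx, gy, dens, px, py), n_show=2)
```

Interpolation costs a constant amount per point regardless of n, whereas evaluating the Exact Estimation formula at every source point costs on the order of n squared operations, which is why the interpolated option pays off for large data sets.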