2.52 Optimal Cluster Number (Pro)

Summary

The Optimal Cluster Number App determines the optimal number of clusters, an important parameter in cluster analysis.

The App requires Embedded Python and the Python libraries scikit-learn, gap-stat, and gapstat_rs. Other dependent libraries include pytz, six, python-dateutil, numpy, pandas, matplotlib, and scipy. All of these libraries are downloaded and installed automatically when the App is installed. Wait a few minutes for the installation to complete, then restart Origin.
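If the App does not start after installation, it can help to check whether the listed libraries are actually present. The following is a minimal sketch, not part of the App: it assumes it is run in the same Python environment the App uses, that the distribution names match the names listed above, and that Python 3.8 or later is available. The helper name installed_versions is hypothetical.

```python
from importlib import metadata

# Names copied from the requirements paragraph; assumed to match the
# installed distribution names.
REQUIRED = [
    "scikit-learn", "gap-stat", "gapstat_rs", "pytz", "six",
    "python-dateutil", "numpy", "pandas", "matplotlib", "scipy",
]


def installed_versions(names):
    """Map each distribution name to its version, or None if not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(REQUIRED)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Any entry reported as not installed suggests that the automatic installation did not finish.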

Tutorial

This tutorial uses the App's built-in sample project. To open the sample OPJU file:

  1. Right-click the Optimal Cluster Number App icon in the Apps Gallery and choose Show Samples Folder.
  2. A folder will open. Drag-and-drop the project file clusterNum_sample.opju from the folder onto Origin.
Note: If you want to save the project after making changes, we recommend saving it to a different folder location (e.g. the User Files Folder).

Steps

  1. [Book1]Sheet1 contains the data for which we want to determine the number of clusters, K. The scatter plot of this data, Graph1, suggests that there may be 3 or 4 clusters.
    Click Optimal Cluster Number 1.png
  2. Highlight columns A and B. Click the Optimal Cluster Number App icon in the Apps Gallery to open the dialog.
  3. Set Start of Range of K to 2 and End of Range of K to 8.
  4. Select the Elbow Method and Silhouette Method checkboxes. Refer to the File Exchange page for details of the 3 methods.
    Click Optimal Cluster Number 2.png
  5. Click OK to output the results. In the report sheet, both the Elbow Method and the Silhouette Method indicate that the optimal cluster number is 4.
    Click Optimal Cluster Number 3.png
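
To illustrate what the two selected methods compute over the range K = 2 to 8, here is a minimal sketch using scikit-learn. It is not the App's implementation. It assumes k-means as the clustering algorithm, uses synthetic blobs in place of columns A and B of the sample sheet, and the function name scan_cluster_numbers is hypothetical. The elbow method looks at how the within-cluster sum of squares (inertia) drops as K grows, while the silhouette method prefers the K with the highest mean silhouette score.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def scan_cluster_numbers(points, k_start=2, k_end=8, seed=0):
    """Return {k: (inertia, mean silhouette score)} for k_start <= k <= k_end."""
    results = {}
    for k in range(k_start, k_end + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(points)
        results[k] = (model.inertia_, silhouette_score(points, model.labels_))
    return results


# Hypothetical stand-in data: four well-separated 2-D blobs.
centers = [(-10, -10), (-10, 10), (10, -10), (10, 10)]
points, _ = make_blobs(
    n_samples=400, centers=centers, cluster_std=1.0, random_state=0
)

scores = scan_cluster_numbers(points)

# Silhouette method: choose the K with the highest mean silhouette score.
best_k = max(scores, key=lambda k: scores[k][1])

# Elbow method: inspect where the drop in inertia levels off.
for k, (inertia, silhouette) in scores.items():
    print(f"K={k}: inertia={inertia:.1f}, silhouette={silhouette:.3f}")
print("K with the highest silhouette score:", best_k)
```

In this sketch the elbow is read by eye from the printed inertia values, in the same way the elbow plot in the report sheet is read. Real data with overlapping clusters may give less clear-cut results than these synthetic blobs.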