Purpose
This App can be used to determine the optimal number of clusters, which is an important parameter in cluster analysis.
Note: This App requires Embedded Python and the scikit-learn, gap-stat, and gapstat_rs libraries. Other dependencies include pytz, six, python-dateutil, numpy, pandas, matplotlib, and scipy.
Installation
- Download the clusterNum.opx file, then drag and drop it onto the Origin workspace.
- The App will start downloading the dependent Python libraries. Wait a few minutes until the download completes, then restart Origin.
Operation
- Activate a worksheet. Click the App icon to bring up the dialog.
- Select multiple columns of data for cluster analysis.
- Enter the start and end values for the range of candidate cluster numbers k.
- Choose one or more methods to determine the number of clusters:
  - Elbow method: The within-cluster sum of squares (WSS) is calculated for each number of clusters and plotted. Adding clusters reduces the WSS sharply at first, but beyond some point the marginal gain drops off, producing a sharp "elbow" in the graph. The optimal number of clusters is chosen at this point.
  - Silhouette method: The average silhouette of the observations is computed for each number of clusters. The optimal number of clusters k is the one that maximizes the average silhouette.
  - Gap statistic: The gap statistic compares the total intra-cluster variation for different numbers of clusters k with its expected value under a null reference distribution of the data. The estimated optimal number of clusters is the value of k that maximizes the gap statistic.
- Click OK to output results.
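For readers who want to see what the three criteria compute, the following is a minimal Python sketch, not the App's actual implementation. It assumes k-means (scikit-learn's KMeans) as the clustering algorithm, which the App description does not state, and it uses synthetic data from make_blobs with three hand-picked centers. The gap statistic here is a simplified hand-written version with a uniform reference distribution over the bounding box of the data; it does not use the gap-stat library. The helper names wss and gap_statistic are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def wss(X, k, seed=0):
    """Within-cluster sum of squares of a k-means fit (scikit-learn's inertia_)."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_


def gap_statistic(X, k, n_refs=10, seed=0):
    """Simplified gap: mean log(WSS) of uniform reference sets minus log(WSS) of X."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Reference data sets: uniform samples inside the bounding box of X (assumption).
    ref_log_wss = [
        np.log(wss(rng.uniform(lo, hi, size=X.shape), k, seed))
        for _ in range(n_refs)
    ]
    return float(np.mean(ref_log_wss) - np.log(wss(X, k, seed)))


# Synthetic example data: three well-separated 2-D clusters (illustrative only).
centers = [[0, 0], [10, 10], [-10, 10]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=42)

k_values = range(2, 7)  # start and end of the candidate range for k

# Elbow method: inspect (or plot) WSS against k and look for the bend.
wss_values = {k: wss(X, k) for k in k_values}

# Silhouette method: pick the k with the largest average silhouette.
sil_values = {
    k: silhouette_score(
        X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    )
    for k in k_values
}
best_sil = max(sil_values, key=sil_values.get)

# Gap statistic: pick the k with the largest gap, as described above.
gap_values = {k: gap_statistic(X, k) for k in k_values}
best_gap = max(gap_values, key=gap_values.get)

for k in k_values:
    print(k, round(wss_values[k], 1), round(sil_values[k], 3), round(gap_values[k], 3))
print("silhouette choice:", best_sil, "| gap choice:", best_gap)
```

The elbow method has no single numeric selection rule in this sketch; the WSS column is meant to be read or plotted by eye, as in the App's graph.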
Sample OPJU File
This App provides a sample OPJU file. Right-click the App icon in the Apps Gallery window and choose Show Samples Folder from the shortcut menu. A folder will open. Drag and drop the project file clusterNum_sample.opju from the folder onto Origin. The Notes window in the project shows detailed steps.
Note: If you wish to save the OPJU after making changes, it is recommended that you save it to a different folder location (e.g., the User Files Folder).