2.13.3.3 hcluster(Pro)

1 Menu Information
2 Brief Information
3 Additional Information
4 Command Line Usage
5 X-Function Execution Options
6 Variables
7 Description
8 Examples
9 Algorithm
10 References
11 Related X-Functions

Menu Information

Statistics: Multivariate Analysis: Hierarchical Cluster Analysis

Brief Information

Perform hierarchical cluster analysis

Additional Information

This feature is for OriginPro only.

Minimum Origin Version Required: 8.6

Command Line Usage


 hcluster irng:=2:5 label:=1 number:=2;
 hcluster irng:=4:15 obj:=1 number:=3;

X-Function Execution Options

For additional option switches available when running the X-Function from script, refer to the X-Function Execution Options page.

Variables

Display Name	Variable Name	I/O and Type	Default Value	Description
Variables	irng	Input Range	<active>	Select the data range for the hierarchical cluster analysis. Beginning with Origin 2020b, a shortened syntax of the form [Book]Sheet!(N1:N2) is available, where N1 is the first column index and N2 is the last column index in a contiguous range of columns. Non-contiguous ranges can be specified with strings of the form [Book]Sheet!([Book]Sheet!N1:N2,[Book]Sheet!N3:N4).
Observation Labels	label	Input Range	<optional>	Select labels for observations. If labels are chosen, they are shown as tick labels on the X axis of the dendrogram. This option is enabled only when obj is Observations.
Cluster	obj	Input int	0	Specify the type of objects to cluster. Option list: Observations (cluster observations); Variables (cluster variables).
Cluster Method	link	Input int	2	Select the linkage method to calculate the distance between a cluster and a new cluster. Values start from 0, but string values (such as near) are recommended for clarity. Option list: near (Nearest neighbor): the minimum of the two distances between a cluster and the two clusters merged into a new cluster; also called single linkage. furth (Furthest neighbor): the maximum of the two distances between a cluster and the two clusters merged into a new cluster; also called complete linkage. group (Group average): the mean of the two distances between a cluster and the two clusters merged into a new cluster. centroid (Centroid): clusters are produced that maximize the distance between the centers of clusters. median (Median): the median distance between an item in one cluster and an item in the other cluster. ward (Ward): clusters are produced that minimize the within-cluster variance. To learn more about linkage methods, see the algorithm of linkage methods.
Distance Type	dist1	Input int	0	Select the distance type used when obj is Observations. Values start from 0, but string values (such as euc) are recommended for clarity. Option list: euc (Euclidean): the square root of the sum of the squared differences between two observations. squ (Squared Euclidean): the sum of the squared differences between two observations. city (City block): the sum of the absolute differences between two observations; also known as Manhattan distance.
Distance Type	dist2	Input int	0	Select the distance type used when obj is Variables. Values start from 0, but string values (such as corr) are recommended for clarity. Option list: corr (Correlation): the difference between 1 and the correlation of two variables. abs (Absolute correlation): the difference between 1 and the absolute correlation of two variables.
Standardize Variables	std	Input int	0	Specify the method to standardize variables. Available only when obj is Observations. Values start from 0, but string values (such as snd) are recommended for clarity. Option list: none (None): variables are not standardized. snd (Z scores, standardize to N(0, 1)): variables are rescaled to mean 0 and standard deviation 1. range (Normalize to (0, 1)): variables are rescaled to the range 0 to 1.
Number of Clusters	number	Input int	1	Specify the number of clusters.
Find Clustroid by	stat	Input int	0	Specify the method to find the clustroid: the most/least representative variable/observation. Option list: sd (Sum of distances): find the clustroid using the sum of distances measured from all other observations/variables in the cluster. md (Maximum distance): find the clustroid using the maximum of all distances measured from other observations/variables in the cluster. ssd (Sum of squares of distances): find the clustroid using the sum of the squares of distances measured from all other observations/variables in the cluster.
Dissimilarity Matrix	dissimilarity	Input int	0	Specify whether to output the distance matrix. For a large number of objects, the distance matrix will be shown in a sheet instead of the report. 1 = Yes, 0 = No.
Cluster Stages	stage	Input int	1	Specify whether to output the cluster stages. 1 = Yes, 0 = No.
Cluster Center	center	Input int	0	Specify whether to calculate cluster centers. It is available only when obj is Observations. 1 = Yes, 0 = No.
Distance between Cluster Centers	distc2c	Input int	0	Specify whether to calculate the distances between cluster centers. It is available only when obj is Observations. 1 = Yes, 0 = No.
Distance between Observations and Clusters	disto2c	Input int	0	Specify whether to calculate the distance between each observation and cluster centers. It is available only when obj is Observations. 1 = Yes, 0 = No.
Dendrogram	dendrogram	Input int	1	Specify whether to show the dendrogram. 1 = Yes, 0 = No.
Show Dendrogram	ngraph	Input int	0	Specify whether to show the dendrogram in a single graph or in separate graphs for clusters. Enabled only when dendrogram is 1. Values start from 0. Option list: 0: Show the dendrogram in a single graph, with different clusters shown in different colors. 1: Show the dendrogram in separate graphs, one graph per cluster.
Orientation	orient	Input int	0	Specify the orientation of the dendrogram. Enabled only when dendrogram is 1. Option list: 0: Vertical (plot the dendrogram vertically). 1: Horizontal (plot the dendrogram horizontally). 2: Circular (plot a circular dendrogram).
Cluster Report	rt	Output ReportTree	<new>	Specify the sheet for the hierarchical cluster analysis report.
Cluster Membership	rd	Output ReportData	<new>	Specify the sheet for cluster membership and distance between observations and clusters.
Distance Matrix	rddist	Output ReportData	<new>	Specify the sheet for the distance matrix when the number of objects to cluster is very large. This variable is hidden in the dialog.
Plot Data	rdplot	Output ReportData	<new>	Specify the sheet for plot data. This variable is hidden in the dialog.
Clustroid Info	clustroid	Input int	1	Specify the method to find the Clustroid Info: the most/least representative variable/observation
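
The distance types (dist1, dist2) and standardization methods (std) described in the table can be illustrated outside Origin. The following is a minimal Python sketch written only from the definitions above; it is not Origin's implementation. The function names are hypothetical, the correlation is assumed to be the Pearson correlation, and the Z score is assumed to use the sample standard deviation (n - 1 denominator), which the table does not specify.

```python
import math

# Distances between two observations (dist1 options).
def euclidean(a, b):
    # euc: square root of the sum of squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    # squ: sum of squared differences
    return sum((x - y) ** 2 for x, y in zip(a, b))

def city_block(a, b):
    # city: sum of absolute differences (Manhattan distance)
    return sum(abs(x - y) for x, y in zip(a, b))

# Distances between two variables (dist2 options).
def pearson(u, v):
    # Assumption: "correlation" means the Pearson correlation coefficient.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv)

def correlation_distance(u, v):
    # corr: 1 minus the correlation
    return 1 - pearson(u, v)

def abs_correlation_distance(u, v):
    # abs: 1 minus the absolute correlation
    return 1 - abs(pearson(u, v))

# Standardization of one variable (std options).
def z_scores(col):
    # snd: rescale to mean 0 and standard deviation 1.
    # Assumption: sample standard deviation (n - 1 denominator).
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
    return [(x - mean) / sd for x in col]

def normalize_01(col):
    # range: rescale linearly so the minimum maps to 0 and the maximum to 1
    lo, hi = min(col), max(col)
    return [(x - lo) / (hi - lo) for x in col]
```

For example, for the observations (0, 0) and (3, 4) the three dist1 measures give 5, 25, and 7 respectively, which shows why standardizing variables first matters when the variables are on very different scales.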

Description

This X-Function performs hierarchical cluster analysis on a data range. For more information, see Cluster Analysis.

Examples

Import the data file \Samples\Graphing\US Mean Temperature.dat.
Run the script.

hcluster irng:=4[1]:15[100] number:=5 rd:=[<input>]<input> -r 2;

Algorithm

See the algorithm of Hierarchical Cluster Analysis.
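
As a rough orientation, the agglomerative procedure can be sketched in a few lines of Python. This is an illustrative sketch only, not Origin's implementation: it covers just the near and furth linkage options as they are defined in the Variables table, stops when the requested number of clusters is reached, and breaks ties by taking the first closest pair. The function names agglomerate and clustroid are hypothetical, and the clustroid is assumed to be the member with the smallest sum of distances to the other members.

```python
def agglomerate(dist, number, link="near"):
    """Merge the closest clusters until `number` clusters remain.

    dist   : symmetric n x n distance matrix (list of lists)
    number : requested number of clusters, 1 <= number <= n
    link   : "near" (single linkage) or "furth" (complete linkage)
    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    n = len(dist)
    if not 1 <= number <= n:
        raise ValueError("number must be between 1 and the number of objects")
    combine = {"near": min, "furth": max}[link]

    clusters = {i: [i] for i in range(n)}
    # Distances between current clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}

    while len(clusters) > number:
        a, b = min(d, key=d.get)  # closest pair of clusters, a < b
        for k in clusters:
            if k in (a, b):
                continue
            ka = (min(k, a), max(k, a))
            kb = (min(k, b), max(k, b))
            # Distance from cluster k to the merged cluster: the minimum
            # (near) or maximum (furth) of its distances to a and to b.
            d[ka] = combine(d[ka], d[kb])
            del d[kb]
        del d[(a, b)]
        clusters[a] = clusters[a] + clusters.pop(b)

    return sorted(sorted(c) for c in clusters.values())


def clustroid(dist, members):
    # Assumption: the most representative member is the one with the
    # smallest sum of distances to all members of the cluster.
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


# Usage: five points on a line, city block distance between them.
points = [0, 1, 5, 6, 20]
dist = [[abs(p - q) for q in points] for p in points]
print(agglomerate(dist, number=3))          # clusters as lists of indices
print(clustroid(dist, [0, 1, 2]))           # index of the clustroid
```

The other linkage methods (group, centroid, median, ward) use different update rules for the distance to a merged cluster and are deliberately left out of this sketch.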

References

See the reference of Cluster Analysis.

Related X-Functions

pca, kmeans, discrim
