# 2.13.3.3 hcluster(Pro)

Statistics: Multivariate Analysis: Hierarchical Cluster Analysis

## Brief Information

Perform hierarchical cluster analysis

This feature is for OriginPro only.

Minimum Origin Version Required: 8.6

## Command Line Usage

 

 hcluster irng:=2:5 label:=1 number:=2;
 hcluster irng:=4:15 obj:=1 number:=3;

## Variables

Display Name | Variable Name | I/O and Type | Default Value | Description

Variables irng

Input

Range

<active>
Select the data range for the hierarchical cluster analysis. Beginning with Origin 2020b, a shortened syntax of the form [Book]Sheet!(N1:N2) is available, where N1 is the index of the first column and N2 the index of the last column in a contiguous range of columns. More complex strings for non-contiguous data, of the form [Book]Sheet!([Book]Sheet!N1:N2,[Book]Sheet!N3:N4), are also possible.
Observation Labels label

Input

Range

<optional>
Select labels for observations. If labels are chosen, they are shown as tick labels on the X axis of the dendrogram. This option is enabled only when obj is Observations.
Cluster obj

Input

int

0
Specify the type of objects to cluster. Values start from 0 (0 = Observations, 1 = Variables).

Option list:

• Observations
Cluster observations.
• Variables
Cluster variables.
Cluster Method method

Input

int

2
Select the linkage method to calculate the distance between a cluster and a new cluster. Values start from 0, but string values (such as near) are recommended for clarity.

Option list:

• near:Nearest neighbor
The distance from a cluster to a newly merged cluster is the minimum of its distances to the two clusters that were merged. Also called single linkage.
• furth:Furthest neighbor
The distance from a cluster to a newly merged cluster is the maximum of its distances to the two clusters that were merged. Also called complete linkage.
• group:Group average
The distance from a cluster to a newly merged cluster is the mean of its distances to the two clusters that were merged.
• centroid:Centroid
The distance between two clusters is the distance between their centroids (the means of their members).
• median:Median
Like centroid linkage, but when two clusters merge, the center of the new cluster is taken as the midpoint of the two merged centers, regardless of cluster sizes.
• ward:Ward
Clusters are produced that minimize the within-cluster variance.
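
The six linkage options can be compared through the Lance-Williams update, which gives the distance from a cluster k to the cluster formed by merging clusters i and j. The following is a plain Python sketch using the textbook coefficients, not Origin's implementation. The function name `merged_distance` and its arguments are hypothetical. The group-average branch weights by cluster size, which is an assumption, and the centroid, median, and Ward branches assume the inputs are squared Euclidean distances.

```python
def merged_distance(method, d_ki, d_kj, d_ij, n_i=1, n_j=1, n_k=1):
    """Distance from cluster k to the merger of clusters i and j.

    d_ki, d_kj, d_ij are the pairwise distances before the merge;
    n_i, n_j, n_k are the cluster sizes (used by some methods only).
    Textbook Lance-Williams coefficients; Origin's exact formulas are assumed.
    """
    n_ij = n_i + n_j
    if method == "near":      # single linkage
        return min(d_ki, d_kj)
    if method == "furth":     # complete linkage
        return max(d_ki, d_kj)
    if method == "group":     # size-weighted average (assumption)
        return (n_i * d_ki + n_j * d_kj) / n_ij
    if method == "centroid":  # expects squared Euclidean distances
        return (n_i * d_ki + n_j * d_kj) / n_ij - n_i * n_j * d_ij / n_ij**2
    if method == "median":    # centroid update with equal weights
        return 0.5 * d_ki + 0.5 * d_kj - 0.25 * d_ij
    if method == "ward":      # minimum within-cluster variance
        total = n_i + n_j + n_k
        return ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij) / total
    raise ValueError(f"unknown method: {method}")
```

With d_ki = 3, d_kj = 5, d_ij = 2 and singleton clusters, `near` returns 3 and `furth` returns 5.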

Distance Type dist1

Input

int

0
Select a distance type in the hierarchical cluster analysis when obj is Observations. Values start from 0, but string values (such as euc) are recommended for clarity.

Option list:

• euc:Euclidean
The square root of the sum of the squared differences between two observations.
• squ:Squared Euclidean
The sum of the squared differences between two observations.
• city:City block
The sum of the absolute differences between two observations. Also known as Manhattan distance.
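
The three distance types for observations can be written out directly from the definitions above. This is a minimal Python sketch for illustration only; the function name `observation_distance` is hypothetical and the code is not taken from Origin.

```python
import math

def observation_distance(a, b, kind="euc"):
    """Distance between two observations given as equal-length sequences."""
    diffs = [x - y for x, y in zip(a, b)]
    if kind == "euc":   # square root of the sum of squared differences
        return math.sqrt(sum(d * d for d in diffs))
    if kind == "squ":   # sum of squared differences
        return sum(d * d for d in diffs)
    if kind == "city":  # sum of absolute differences
        return sum(abs(d) for d in diffs)
    raise ValueError(f"unknown distance type: {kind}")

a, b = (1, 2), (4, 6)
print(observation_distance(a, b, "euc"))   # 5.0
print(observation_distance(a, b, "squ"))   # 25
print(observation_distance(a, b, "city"))  # 7
```
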
Distance Type dist2

Input

int

0
Select a distance type in the hierarchical cluster analysis when obj is Variables. Values start from 0, but string values (such as corr) are recommended for clarity.

Option list:

• corr:Correlation
The difference between 1 and the correlation of two variables.
• abs:Absolute correlation
The difference between 1 and the absolute correlation of two variables.
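
The two distance types for variables follow from the correlation coefficient. The sketch below assumes the Pearson correlation (the text above only says "correlation") and uses hypothetical helper names `pearson` and `variable_distance`; it is an illustration, not Origin's code.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variable_distance(x, y, kind="corr"):
    """1 - r for 'corr', 1 - |r| for 'abs'."""
    r = pearson(x, y)
    if kind == "corr":
        return 1 - r
    if kind == "abs":
        return 1 - abs(r)
    raise ValueError(f"unknown distance type: {kind}")

x, y = [1, 2, 3], [3, 2, 1]  # perfectly anti-correlated variables
far = variable_distance(x, y, "corr")   # anti-correlated counts as far apart
near = variable_distance(x, y, "abs")   # anti-correlated counts as identical
```

The choice matters for strongly negatively correlated variables: `corr` places them far apart, while `abs` groups them together.
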
Standardize Variables std

Input

int

0
Specify the method to standardize variables. It is available only when obj is Observations. Values start from 0, but string values (such as snd) are recommended for clarity.

Option list:

• none:None
Variables are not standardized.
• snd:Z scores (standardize to N(0, 1))
Each variable is centered and scaled to have mean 0 and standard deviation 1.
• range:Normalize to (0, 1)
Variables are rescaled to the range from 0 to 1.
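
The standardization options can be sketched per variable as follows. This Python illustration uses a hypothetical function name `standardize` and assumes the sample standard deviation (n - 1 denominator) for z scores; the text above does not state which form Origin uses.

```python
import statistics

def standardize(values, kind="none"):
    """Standardize one variable (a list of numbers).

    Assumes the variable is not constant, otherwise the scale is zero.
    """
    if kind == "none":
        return list(values)
    if kind == "snd":    # z scores; sample standard deviation is an assumption
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        return [(v - mean) / sd for v in values]
    if kind == "range":  # rescale so the minimum maps to 0 and the maximum to 1
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    raise ValueError(f"unknown method: {kind}")

print(standardize([2, 4, 6], "snd"))    # [-1.0, 0.0, 1.0]
print(standardize([2, 4, 6], "range"))  # [0.0, 0.5, 1.0]
```
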
Number of Clusters number

Input

int

1
Specify the number of clusters.
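
To illustrate how the number of clusters relates to the merge stages: agglomerative clustering starts with every object in its own cluster and merges the closest pair at each stage, so n objects reach `number` clusters after n - `number` merges. The following Python sketch uses nearest-neighbor linkage with Euclidean distance for brevity. The function name `agglomerate` is hypothetical and the code is a naive illustration, not Origin's algorithm.

```python
import math

def agglomerate(points, number=1):
    """Merge the closest clusters until `number` clusters remain.

    Returns (clusters, stages); each stage is (cluster_a, cluster_b, distance).
    Uses nearest-neighbor (single) linkage on Euclidean distances.
    """
    clusters = [[i] for i in range(len(points))]
    stages = []
    while len(clusters) > number:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members across the two clusters.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        stages.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, stages

points = [(0,), (1,), (5,), (6,), (20,)]
clusters, stages = agglomerate(points, number=2)
```

Here five points are reduced to two clusters in three stages; the outlying point at 20 stays in a cluster of its own.
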
Find Clustroid by stat

Input

int

0
Specify the method to find the clustroid: the most/least representative variable/observation.

Option list:

• sd:Sum of distances
Find the clustroid using the sum of distances measured from all other observations/variables in the cluster.
• md:Maximum distance
Find the clustroid using the maximum of the distances measured from all other observations/variables in the cluster.
• ssd:Sum of squares of distances
Find the clustroid using the sum of the squares of distances measured from all other observations/variables in the cluster.
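
The three clustroid criteria can give different answers for the same cluster. The Python sketch below returns the most representative member, assumed here to be the one with the smallest score (the largest score would give the least representative one). The function name `clustroid` and the distance-matrix layout are hypothetical.

```python
def clustroid(dist, members, stat="sd"):
    """Index of the member with the smallest score within the cluster.

    dist is a full symmetric distance matrix; members lists the indices
    that belong to the cluster.
    """
    def score(i):
        others = [dist[i][j] for j in members if j != i]
        if stat == "sd":   # sum of distances
            return sum(others)
        if stat == "md":   # maximum distance
            return max(others, default=0.0)
        if stat == "ssd":  # sum of squares of distances
            return sum(d * d for d in others)
        raise ValueError(f"unknown stat: {stat}")
    return min(members, key=score)

# Five objects on a line; the last one is an outlier.
pos = [0, 1, 2, 3, 20]
dist = [[abs(a - b) for b in pos] for a in pos]
members = list(range(len(pos)))
by_sum = clustroid(dist, members, "sd")       # index 2
by_max = clustroid(dist, members, "md")       # index 3
by_squares = clustroid(dist, members, "ssd")  # index 3
```

The maximum and sum-of-squares criteria are pulled toward the outlier more strongly than the plain sum.
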
Dissimilarity Matrix dissimilarity

Input

int

0
Specify whether to output the distance matrix. For a large number of objects, the distance matrix will be shown in a sheet instead of the report. 1 = Yes, 0 = No.
Cluster Stages stage

Input

int

1
Specify whether to output the cluster stages. 1 = Yes, 0 = No.
Cluster Center center

Input

int

0
Specify whether to calculate cluster centers. It is available only when obj is Observations. 1 = Yes, 0 = No.
Distance between Cluster Centers distc2c

Input

int

0
Specify whether to calculate the distances between cluster centers. It is available only when obj is Observations. 1 = Yes, 0 = No.
Distance between Observations and Clusters disto2c

Input

int

0
Specify whether to calculate the distance between each observation and cluster centers. It is available only when obj is Observations. 1 = Yes, 0 = No.
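
The three center-related outputs (center, distc2c, disto2c) can be illustrated together. The Python sketch below assumes that a cluster center is the mean of the observations in the cluster and that distances are Euclidean; both are assumptions, and `cluster_centers` and the sample data are hypothetical.

```python
import math

def cluster_centers(data, labels):
    """Return {cluster label: mean vector of the observations in that cluster}."""
    groups = {}
    for row, label in zip(data, labels):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

data = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
labels = [1, 1, 2]  # hypothetical cluster membership
centers = cluster_centers(data, labels)

# Distance between the two cluster centers.
center_to_center = math.dist(centers[1], centers[2])

# Distance from each observation to each cluster center, one row per observation.
obs_to_center = [[math.dist(row, centers[c]) for c in sorted(centers)]
                 for row in data]
```
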
Dendrogram dendrogram

Input

int

1
Specify whether to show the dendrogram. 1 = Yes, 0 = No.
Show Dendrogram ngraph

Input

int

0
Specify whether to show the dendrogram in a single graph or in separate graphs for clusters. It is enabled only when dendrogram is 1. Values start from 0.

Option list:

• 0: Show the dendrogram in a single graph. Different clusters are shown in different colors.
• 1: Show the dendrogram in separate graphs for clusters. Each graph represents a cluster.
Orientation orient

Input

int

0
Specify the orientation of the dendrogram. Enabled only when dendrogram is 1.

Option list:

• 0: Vertical
Plot the dendrogram vertically.
• 1: Horizontal
Plot the dendrogram horizontally.
• 2: Circular
Plot a circular dendrogram.
Cluster Report rt

Output

ReportTree

<new>
Specify the sheet for the hierarchical cluster analysis report.
Cluster Membership rd

Output

ReportData

<new>
Specify the sheet for cluster membership and distance between observations and clusters.
Distance Matrix rddist

Output

ReportData

<new>
Specify the sheet for the distance matrix when the number of objects to cluster is very large. This variable is hidden in the dialog.
Plot Data rdplot

Output

ReportData

<new>
Specify the sheet for plot data. This variable is hidden in the dialog.
Clustroid Info clustroid

Input

int

1
Specify the method to find the Clustroid Info: the most/least representative variable/observation.

## Description

This function performs hierarchical cluster analysis on range data. For more information, see Cluster Analysis.

## Examples

1. Import the data file \Samples\Graphing\US Mean Temperature.dat.
2. Run the following script:

 hcluster irng:=4[1]:15[100] number:=5 rd:=[<input>]<input> -r 2;