17.7.3.1 The Hierarchical Cluster Analysis Dialog Box

1 Dialog Theme
2 Recalculate
3 Input
4 Settings
5 Quantities
6 Plot
7 Output Settings

Dialog Theme

Load or save a Dialog Theme. You can also generate a script for the X-Function from the current dialog box settings.

Recalculate

Input

Variables	Select data for the Hierarchical Cluster Analysis. Data in each column corresponds to a variable and each row to an observation.
Observation Labels	Select labels for observations. If labels are chosen, they are shown as X-axis tick labels in the dendrogram. Enabled only when the objects to cluster are observations. If the label column is a Text column, it is set as categorical.

Settings

Specify the settings for the Hierarchical Cluster Analysis.

Cluster	Specify the type of objects to cluster. Note that the available distance types differ depending on the type of objects to cluster.
	Observations: Cluster observations. Rows in the input data are classified into groups.
	Variables: Cluster variables. Columns in the input data are classified into groups.
Cluster Method	Select the linkage method used to calculate the distance between a cluster and a new cluster. Six methods are available.
	Nearest neighbor: The minimum of the two distances between a cluster and the two clusters merged into a new cluster. Also called single linkage.
	Furthest neighbor: The maximum of the two distances between a cluster and the two clusters merged into a new cluster. Also called complete linkage.
	Group average: The mean of the two distances between a cluster and the two clusters merged into a new cluster.
	Centroid: Clusters are produced that maximize the distance between the centers of clusters.
	Median: The median distance between an item in one cluster and an item in the other cluster.
	Ward: Clusters are produced that minimize the within-cluster variance.
	To learn more about linkage methods, see the algorithm of linkage methods.
Distance Type	Select the distance type for the Hierarchical Cluster Analysis.
	When clustering observations, six methods are available:
	Euclidean: The square root of the sum of the squared differences between two observations.
	Squared Euclidean: The sum of the squared differences between two observations.
	City block: The sum of the absolute differences between two observations. Also known as Manhattan distance.
	Cosine: The difference between 1 and the cosine coefficient of two observations. The cosine coefficient is the cosine of the angle between two vectors.
	Pearson correlation: The difference between 1 and the correlation of two observations.
	Jaccard: The difference between 1 and the Jaccard coefficient of two observations. For binary data, the Jaccard coefficient equals the ratio of the size of the intersection to the size of the union of two observations.
	When clustering variables, two methods are available. Missing values are excluded pairwise when calculating the correlation.
	Correlation: The difference between 1 and the correlation of two variables.
	Absolute correlation: The difference between 1 and the absolute correlation of two variables.
	To learn how distances are calculated, see the distance algorithm.
Standardize Variables	Specify the method used to standardize variables. Available only when the objects to cluster are observations.
	None: Variables are not standardized.
	Z scores (standardize to N(0, 1)): Variables are standardized to zero mean and unit standard deviation.
	Normalize to (0,1): Variables are rescaled to the range from 0 to 1.
Number of Clusters	Specify the number of clusters. The value should be greater than 0 and no more than the number of effective observations (when clustering observations) or variables (when clustering variables).
Find Clustroid by	Specify the method used to find the clustroid, that is, the most or least representative variable or observation in a cluster.
	Sum of distances: Find the clustroid using the sum of the distances measured from all other observations/variables in the cluster. The most representative variable/observation has the minimum sum of distances; the least representative has the maximum.
	Maximum distance: Find the clustroid using the maximum of the distances measured from all other observations/variables in the cluster. The most representative variable/observation has the smallest maximum distance; the least representative has the largest.
	Sum of squares of distances: Find the clustroid using the sum of the squared distances measured from all other observations/variables in the cluster. The most representative variable/observation has the minimum sum of squares of distances; the least representative has the maximum.
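
The Standardize Variables, Distance Type, and Find Clustroid by settings described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used by the dialog: all function names are hypothetical, the z-score uses the sample standard deviation (n - 1 denominator), and ties between candidates are resolved by taking the first one.

```python
import math

def z_scores(column):
    """Standardize one variable to zero mean and unit standard deviation."""
    n = len(column)
    mean = sum(column) / n
    # Assumption: sample standard deviation with an (n - 1) denominator.
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

def normalize_01(column):
    """Rescale one variable to the range from 0 to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        raise ValueError("a constant column cannot be rescaled")
    return [(x - lo) / (hi - lo) for x in column]

def euclidean(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """Sum of absolute differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def clustroid(members, dist, criterion="sum"):
    """Return (most_representative, least_representative) as indices into members.

    criterion is "sum", "max", or "sum_sq", mirroring the three options above.
    """
    scores = []
    for i, p in enumerate(members):
        d = [dist(p, q) for j, q in enumerate(members) if j != i]
        if criterion == "sum":
            scores.append(sum(d))
        elif criterion == "max":
            scores.append(max(d, default=0.0))
        elif criterion == "sum_sq":
            scores.append(sum(v * v for v in d))
        else:
            raise ValueError("unknown criterion: " + criterion)
    idx = range(len(members))
    most = min(idx, key=scores.__getitem__)   # smallest score
    least = max(idx, key=scores.__getitem__)  # largest score
    return most, least

# Hypothetical cluster of three observations on a line.
cluster = [(0, 0), (1, 0), (4, 0)]
most, least = clustroid(cluster, euclidean, "sum")
```

In this example the middle point has the smallest sum of distances to the other members, so it is the most representative observation, and the point at (4, 0) is the least representative.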

Quantities

Specify the quantities to calculate for the Hierarchical Cluster Analysis. Note that descriptive statistics and cluster membership are included in the result of Hierarchical Cluster Analysis by default.

Dissimilarity Matrix	Specify whether to output the distance matrix. For a large number of objects, the distance matrix will be shown in a sheet instead of the report.
Cluster Stages	Specify whether to output the cluster stages. In each stage, two clusters are merged into a new cluster.
Cluster Center	Specify whether to calculate cluster centers. It is available only when objects to cluster are observations. When a standardization method is chosen in Standardize Variables of the Settings branch, cluster centers are calculated from standardized variables.
Distance between Cluster Centers	Specify whether to calculate the distances between cluster centers. It is available only when objects to cluster are observations.
Distance between Observations and Clusters	Specify whether to calculate the distance between each observation and cluster centers. It is available only when objects to cluster are observations.
Clustroid Info	Specify whether to list the most/least representative variable or observation.
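
To make the Cluster Stages, cluster membership, and Cluster Center quantities concrete, the sketch below runs a naive agglomeration with the nearest neighbor or furthest neighbor method and Euclidean distance. It is an illustration only: the function names, the tie-breaking rule (first pair found), and the cluster labels (the lowest observation index in each cluster) are assumptions and will not necessarily match the numbering in the report.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_stages(points, method="nearest"):
    """Merge the two closest clusters until one remains.

    Returns one (cluster_a, cluster_b, distance) tuple per stage, where the
    clusters are lists of observation indices.
    """
    if method not in ("nearest", "furthest"):
        raise ValueError("method must be 'nearest' or 'furthest'")
    pick = min if method == "nearest" else max
    clusters = [[i] for i in range(len(points))]
    stages = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Linkage distance between clusters a and b.
                d = pick(euclidean(points[i], points[j])
                         for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        stages.append((list(clusters[a]), list(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return stages

def membership(n_points, stages, n_clusters):
    """Replay the first (n_points - n_clusters) stages to label each observation."""
    labels = list(range(n_points))
    for left, right, _ in stages[: n_points - n_clusters]:
        for i in right:
            labels[i] = labels[left[0]]
    return labels

def cluster_centers(points, labels):
    """Mean of each variable over the observations in each cluster."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(col) / len(col) for col in zip(*members))
            for lab, members in groups.items()}

# Hypothetical data: five observations, two variables.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
stages = cluster_stages(points, method="nearest")
labels = membership(len(points), stages, n_clusters=3)
centers = cluster_centers(points, labels)
```

The distance from an observation to a cluster center can then be obtained by calling `euclidean` on the observation and the corresponding entry of `centers`.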

Plot

Specify whether and how to show the dendrogram.

Dendrogram	Specify whether to show the dendrogram. Note that the default dendrogram can be exchanged for a more dynamic "Phylogenetic Tree" in which nodes and subtrees can be highlighted and swapped.
Show Y Axis with	Select the quantity to show on the Y axis of the dendrogram.
	Distance: The distance as computed by Distance Type.
	Similarity: Computed as 100(1-d/dmax), where d is the distance and dmax is the maximum distance for all observations, i.e. the last distance calculated in the Cluster Stages table. If you have opted to plot a separate graph for each cluster (Plot tab > Show Dendrogram button), then dmax is the maximum for all graphs.
Show Dendrogram	Specify whether to show the dendrogram in a single graph or in separate graphs for clusters. Enabled only when Dendrogram is checked.
	in a single graph: Show the dendrogram in a single graph. Different clusters are shown in different colors.
	in separate graphs for clusters: Show the dendrogram in separate graphs for clusters. Each cluster is output to a separate graph.
Orientation	Specify the orientation of the dendrogram. Enabled only when Dendrogram is checked.
	Vertical: Plot the dendrogram vertically.
	Horizontal: Plot the dendrogram horizontally.
	Circular: Plot a circular dendrogram.
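
The Similarity scale described under Show Y Axis with follows directly from the formula 100(1-d/dmax). A minimal sketch, in which the function name and the merge distances are hypothetical:

```python
def similarity(d, d_max):
    """Similarity on a 0 to 100 scale: 100 * (1 - d / d_max)."""
    if d_max <= 0:
        raise ValueError("d_max must be positive")
    return 100.0 * (1.0 - d / d_max)

# Hypothetical merge distances, in the order of the cluster stages.
merge_distances = [1.0, 2.5, 4.0, 10.0]
d_max = merge_distances[-1]  # the last stage has the largest distance here
levels = [similarity(d, d_max) for d in merge_distances]
```

A distance of 0 maps to a similarity of 100, and the final merge at dmax maps to a similarity of 0.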

Output Settings

Specify the destination of output results for the Hierarchical Cluster Analysis.

Cluster Report	Specify the sheet for the Hierarchical Cluster Analysis report. The default value is a new sheet in the workbook of input data.
Cluster Membership	Specify the sheet for cluster membership and distance between observations and clusters. The default value is a new sheet in the workbook of input data.
