Hierarchical Clustering

Definition

Hierarchical clustering is a method of cluster analysis that builds a multilevel hierarchy of clusters. The hierarchy is typically drawn as a dendrogram (a tree-based visualisation), which illustrates the recursive grouping or splitting of data points. Formally, it organises instances into a sequence of nested partitions.

Methodological Paradigms

Agglomerative (Bottom-Up): This approach begins with each observation as a singleton cluster and, at each step, merges the two most similar clusters until only a single cluster remains.

Divisive (Top-Down): This approach begins with the entire dataset as a single cluster and recursively splits clusters into smaller subclusters.
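The agglomerative procedure can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes one-dimensional points, absolute difference as the dissimilarity, and single linkage (the distance between two clusters is the smallest distance between their members). The notes above do not fix a linkage criterion, so that choice, along with the names `single_linkage_distance` and `agglomerative`, is an assumption made for the example.

```python
from itertools import combinations

def single_linkage_distance(a, b, points):
    """Smallest pairwise distance between members of clusters a and b."""
    return min(abs(points[i] - points[j]) for i in a for j in b)

def agglomerative(points):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    # Start with every observation in its own singleton cluster.
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage_distance(pair[0], pair[1], points),
        )
        d = single_linkage_distance(a, b, points)
        # Replace the two clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b), d))
    return merges

points = [0.0, 1.0, 5.0, 7.0, 15.0]
merges = agglomerative(points)
for a, b, d in merges:
    print(a, b, d)
```

Each entry in `merges` corresponds to one internal node of the dendrogram, with the merge distance giving the height of that node. The search over all cluster pairs at every step is deliberately naive and only suitable for small inputs.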

Comparison with Flat Clustering

While algorithms like k-means produce a single partition at a pre-specified granularity $k$, hierarchical clustering produces a nested sequence of partitions, ranging from one cluster per observation to a single cluster containing all of them. This allows the analyst to choose the number of clusters post hoc by cutting the hierarchy at the level that best reflects the data's structure.
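To illustrate the post-hoc choice of $k$, the sketch below runs a single agglomeration and records the partition at every level, so any number of clusters can be read off afterwards without re-running the algorithm. It makes the same simplifying assumptions as before (one-dimensional points, single linkage), and the function name `nested_partitions` is hypothetical.

```python
def nested_partitions(points):
    """Single-linkage agglomeration on 1-D points.

    Returns a dict mapping each cluster count k to the partition with k clusters.
    """
    clusters = [[p] for p in sorted(points)]
    levels = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        # With sorted 1-D data, the closest pair under single linkage
        # is always a pair of adjacent clusters, so only gaps are compared.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        # Merge the two adjacent clusters separated by the smallest gap.
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
        levels[len(clusters)] = [list(c) for c in clusters]
    return levels

levels = nested_partitions([0.0, 1.0, 5.0, 7.0, 15.0])
for k in (2, 3):
    print(k, levels[k])
```

Every partition in `levels` is a refinement of the one with fewer clusters, which is the nesting property that a flat method such as k-means does not guarantee across different values of $k$.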
