machine-learning clustering statistics

Definition

Hierarchical Clustering

Hierarchical clustering is a method of cluster analysis that seeks to build a multilevel hierarchy of clusters. The hierarchy is typically represented as a dendrogram (a tree-based visualisation), which illustrates the recursive grouping or splitting of data points. Formally, it organises the instances into a sequence of nested partitions, in which every cluster at one level is a union of clusters from the level below.

Methodological Paradigms

Agglomerative (Bottom-Up): This approach begins with each observation as a singleton cluster and, at each step, merges the two most similar clusters until only a single cluster remains.

Divisive (Top-Down): This approach begins with the entire dataset as a single cluster and iteratively partitions it into smaller subclusters.
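
The agglomerative procedure described above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: it assumes one-dimensional points, absolute difference as the dissimilarity, and single linkage (the distance between two clusters is the smallest distance between their members). The function names `single_linkage_distance` and `agglomerate` and the sample data are hypothetical, chosen only for this example.

```python
from itertools import combinations


def single_linkage_distance(a, b, points):
    """Smallest pairwise distance between members of clusters a and b."""
    return min(abs(points[i] - points[j]) for i in a for j in b)


def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history.

    Each history entry is (cluster_a, cluster_b, distance), where a cluster
    is a tuple of indices into `points`.
    """
    clusters = [(i,) for i in range(len(points))]  # start from singletons
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage_distance(pair[0], pair[1], points),
        )
        d = single_linkage_distance(a, b, points)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))  # the merged cluster
        history.append((a, b, d))
    return history


# Illustrative data (an assumption, not taken from any real dataset).
points = [1.0, 2.0, 6.0, 7.0, 20.0]
history = agglomerate(points)
for a, b, d in history:
    print(a, b, d)
```

The merge history is what a dendrogram draws: each entry corresponds to one internal node, placed at the height given by its merge distance. This naive search re-evaluates every pair at every step, so it is only suitable for small inputs.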

Comparison with Flat Clustering

While algorithms such as k-means produce a single partition at a pre-specified granularity (the number of clusters k), hierarchical clustering yields a nested sequence of partitions, ranging from singleton clusters to one all-encompassing cluster. This allows the analyst to choose the number of clusters post hoc by cutting the hierarchy at the level that best reflects the structure of the data.
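
To illustrate the post-hoc choice of granularity, the sketch below cuts one hierarchy at several heights and recovers a different flat partition each time. It assumes the hierarchy is given as a list of merges whose distances never decrease (as is the case for single linkage), so that applying only the merges at or below a threshold is equivalent to cutting the dendrogram at that height. The `cut` helper and the merge list are hypothetical and serve only as an example.

```python
def cut(n, merges, threshold):
    """Flat clusters obtained by applying only merges with distance <= threshold.

    `merges` holds (member_i, member_j, distance) triples, where member_i and
    member_j are any two observations from the two clusters being joined.
    """
    parent = list(range(n))  # union-find forest over the n observations

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j, d in merges:
        if d <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical merge history for five observations.
merges = [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 4.0), (3, 4, 13.0)]

for t in (0.5, 2.0, 5.0, 15.0):
    print(t, cut(5, merges, t))
```

Raising the threshold moves from five singletons, to three clusters, to two, and finally to a single cluster, all read off the same hierarchy without re-running the clustering.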