machine-learning semi-supervised-learning

Definition

Cluster Assumption

The cluster assumption posits that if two points lie in the same cluster of the instance space, they are highly likely to share the same target label. Equivalently, the decision boundary of a model should preferentially pass through regions of low probability density in the marginal distribution, rather than cutting through high-density clusters.

Geometric Intuition

This assumption is fundamental to semi-supervised learning, as it allows unlabelled data to inform the placement of the decision boundary. By identifying high-density regions (clusters) using the unlabelled sample, the learner can ensure that the transition between classes occurs in the sparsely populated gaps, which often leads to better generalisation than training on labelled data alone.
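
The intuition can be illustrated with a one-dimensional toy sketch. Everything here is an assumption made for illustration: the data, the function names `midpoint_boundary` and `low_density_boundary`, and the use of the widest empty gap between neighbouring points as a crude stand-in for a low-density region. It is not a standard algorithm from any library. With only one labelled point per class, a purely supervised threshold falls inside one of the clusters, while the gap-based threshold uses the unlabelled sample to move into the sparse region between them.

```python
def midpoint_boundary(x_neg, x_pos):
    """Supervised baseline: threshold halfway between the two labelled points."""
    return (x_neg + x_pos) / 2.0


def low_density_boundary(x_neg, x_pos, unlabelled):
    """Place the threshold in the widest empty gap between the labelled points.

    The widest gap between neighbouring points is used here as a crude,
    illustrative proxy for the lowest-density region (an assumption of
    this sketch, not a general-purpose density estimate).
    """
    lo, hi = sorted((x_neg, x_pos))
    # Keep the labelled points and every unlabelled point lying between them.
    pts = sorted([lo, hi] + [x for x in unlabelled if lo < x < hi])
    # Pick the pair of neighbours that are furthest apart.
    left, right = max(zip(pts, pts[1:]), key=lambda pair: pair[1] - pair[0])
    return (left + right) / 2.0


# Hypothetical toy data: two well-separated clusters on the real line.
cluster_a = [0.5 * i for i in range(9)]         # 0.0, 0.5, ..., 4.0
cluster_b = [6.0 + 0.25 * i for i in range(9)]  # 6.0, 6.25, ..., 8.0
unlabelled = cluster_a + cluster_b

# One labelled example per class, each near the left edge of its cluster.
x_neg, x_pos = 0.2, 6.1

supervised = midpoint_boundary(x_neg, x_pos)
semi = low_density_boundary(x_neg, x_pos, unlabelled)

# Points of cluster A that end up on the wrong side of each threshold.
split_by_supervised = [x for x in cluster_a if x > supervised]
split_by_semi = [x for x in cluster_a if x > semi]

print("supervised threshold:", supervised, "splits off", split_by_supervised)
print("low-density threshold:", semi, "splits off", split_by_semi)
```

In this toy setting the supervised threshold bisects the left-hand cluster, whereas the gap-based threshold leaves both clusters intact. Practical methods replace the gap heuristic with a proper density estimate or a margin-based objective, and the benefit depends on the cluster assumption actually holding for the data at hand.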