Definition
Semi-Supervised Learning
Semi-supervised learning is a paradigm of machine learning that combines a small amount of labelled data with a large amount of unlabelled data during training. Formally, given a labelled dataset and an unlabelled dataset where , the learner aims to find a hypothesis that leverages the distribution of to improve the predictive performance on the label space compared to learning from alone.
Fundamental Assumptions
The success of semi-supervised methods relies on assumptions about the underlying data geometry. The cluster assumption posits that decision boundaries should reside in low-density regions, ensuring that points in the same cluster share labels. Complementarily, the manifold assumption states that high-dimensional data is supported on a lower-dimensional manifold , which can be characterised using the unlabelled portion of the dataset.