machine-learning

Definition

Semi-Supervised Learning

Semi-supervised learning is a paradigm of machine learning that combines a small amount of labelled data with a large amount of unlabelled data during training. Formally, given a labelled dataset and an unlabelled dataset where , the learner aims to find a hypothesis that leverages the distribution of to improve the predictive performance on the label space compared to learning from alone.

Fundamental Assumptions

The success of semi-supervised methods relies on assumptions about the underlying data geometry. The cluster assumption posits that decision boundaries should reside in low-density regions, ensuring that points in the same cluster share labels. Complementarily, the manifold assumption states that high-dimensional data is supported on a lower-dimensional manifold , which can be characterised using the unlabelled portion of the dataset.