machine-learning supervised-learning

Definition

Supervised Learning

Supervised learning is a paradigm of machine learning where the objective is to learn a mapping from an instance space to a label space based on provided input-output pairs. Formally, given a training dataset sampled i.i.d. from a joint distribution , the learner aims to identify a hypothesis that minimises the expected risk for a given loss function .

Data Representation

The supervised setting assumes a structural relationship between input and output, defined by the geometry of the data and the capacity of the learner. The learning process operates on an instance space (typically ) and a label space . The nature of distinguishes between tasks: discrete labels define classification, while continuous values define regression.

Inductive Bias

To prevent an unconstrained search, the learner is restricted to a hypothesis class . This restriction encodes the inductive bias, allowing the model to generalise from finite samples to the full distribution.

Empirical Risk Minimisation

Since the true distribution is unknown, the model is trained by identifying a hypothesis that minimises the loss over the available data. This principle, known as Empirical Risk Minimisation (ERM), seeks to approximate the optimal mapping by penalising discrepancies between the predicted and true labels.

Definition

Supervised Learning Algorithm

A supervised learning algorithm is a learning algorithm that maps a set of training data to a specific hypothesis:

where:

Link to original