machine-learning statistics

Definition

True Risk

True risk (or generalisation error), denoted $R(h)$, is the expected loss of a hypothesis $h$ over the entire joint probability distribution $P(X, Y)$. Formally:

$$R(h) = \mathbb{E}_{(x, y) \sim P(X, Y)}\left[ L(h(x), y) \right]$$

where $L$ is the loss function.
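
As an illustration, assume the common 0-1 loss (a choice made here, not one fixed by the definition). Writing $h$ for the hypothesis, $(X, Y)$ for a random input-label pair drawn from the joint distribution, and $R(h)$ for the true risk, the expected loss reduces to the probability of misclassification:

$$R(h) = \mathbb{E}\left[ \mathbf{1}\{h(X) \neq Y\} \right] = \Pr\left( h(X) \neq Y \right)$$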

Relation to Empirical Risk

In practice, the true risk cannot be computed because the underlying distribution is unknown. It represents the theoretical performance of the model on unseen data. The objective of machine learning is to find a hypothesis that minimises the true risk, typically by using the empirical risk, the average loss over a finite sample, as a computable proxy. The difference between the true and empirical risk is the generalisation gap: the smaller it is, the more reliably performance on the sample predicts performance on unseen data.
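
The relationship between the two quantities can be sketched on a toy problem where, unlike in practice, the distribution is known and the true risk has a closed form. Everything below is an assumption made for illustration: inputs are uniform on [0, 1), the clean label is 1 when x > 0.5, labels are flipped with probability `NOISE`, the hypothesis is a fixed threshold classifier, and the loss is 0-1 loss. All function names are hypothetical.

```python
import random

NOISE = 0.1  # assumed label-flip probability of the toy distribution


def sample(n, rng):
    """Draw n (x, y) pairs from the assumed toy distribution."""
    data = []
    for _ in range(n):
        x = rng.random()          # x ~ Uniform[0, 1)
        y = int(x > 0.5)          # clean label
        if rng.random() < NOISE:  # flip the label with probability NOISE
            y = 1 - y
        data.append((x, y))
    return data


def zero_one_loss(prediction, label):
    """0-1 loss: 1.0 for a wrong prediction, 0.0 for a correct one."""
    return float(prediction != label)


def make_threshold_hypothesis(t):
    """Hypothesis h(x) = 1 if x > t, else 0."""
    return lambda x: int(x > t)


def empirical_risk(h, data):
    """Average loss of h over a finite sample."""
    return sum(zero_one_loss(h(x), y) for x, y in data) / len(data)


def true_risk(t):
    """Closed-form expected 0-1 loss of the threshold hypothesis.

    h disagrees with the clean label on a region of mass |t - 0.5|,
    where it errs with probability 1 - NOISE; elsewhere it errs with
    probability NOISE. This is only available because the toy
    distribution is known.
    """
    return NOISE + (1 - 2 * NOISE) * abs(t - 0.5)


if __name__ == "__main__":
    rng = random.Random(0)
    t = 0.6
    h = make_threshold_hypothesis(t)
    for n in (10, 100, 10_000):
        r_hat = empirical_risk(h, sample(n, rng))
        gap = abs(r_hat - true_risk(t))
        print(f"n={n:>6}  empirical={r_hat:.4f}  true={true_risk(t):.4f}  gap={gap:.4f}")
```

For a hypothesis that is fixed before the sample is drawn, as here, the empirical risk converges to the true risk as the sample grows. When the hypothesis is instead chosen by minimising the empirical risk on the same sample, the gap can be larger, which is why held-out data is normally used to estimate it.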