machine-learning statistics

Definition

Overfitting

Overfitting is a phenomenon in machine learning where a model learns the noise and specific fluctuations of the training dataset rather than the underlying distribution . While the model achieves a very low empirical risk, it fails to generalise, leading to a high true risk on unseen data. Formally, overfitting is characterised by high variance.

Relation to Complexity

Overfitting typically occurs when the model complexity is excessive relative to the amount of available training data. It can be mitigated by increasing the size of the dataset or by applying regularisation techniques, which penalise the magnitude of the model parameters to ensure a simpler hypothesis.