Definition
Zero-One Loss
The Zero-One Loss is a loss function used in classification tasks to measure prediction error. It is the most direct metric for misclassification, assigning a penalty of 1 for an incorrect prediction and 0 for a correct one.
Let $\hat{y} = f(x)$ be the model’s predicted label for an input $x$, and let $y$ be the true label. The zero-one loss, denoted $\ell_{0\text{-}1}(\hat{y}, y)$, is defined as:

$$
\ell_{0\text{-}1}(\hat{y}, y) =
\begin{cases}
0 & \text{if } \hat{y} = y \\
1 & \text{if } \hat{y} \neq y
\end{cases}
$$
This can be written more compactly using an indicator function $\mathbb{1}[\cdot]$:

$$
\ell_{0\text{-}1}(\hat{y}, y) = \mathbb{1}[\hat{y} \neq y]
$$
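As a minimal sketch, the definition translates directly into code: compare each prediction to its label and emit 1 on mismatch, 0 on match (the function name `zero_one_loss` is illustrative, not from the text).

```python
def zero_one_loss(y_pred, y_true):
    """Per-example zero-one loss: 1 if the prediction differs from the label, else 0."""
    return [0 if p == t else 1 for p, t in zip(y_pred, y_true)]

# Example: the last two of four predictions are wrong.
losses = zero_one_loss([0, 1, 1, 0], [0, 1, 0, 1])
# -> [0, 0, 1, 1]
```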
Probabilistic View
Indicator Expectation (Misclassification Probability): Under 0-1 loss, the loss random variable equals an indicator of the error event. For $(X, Y) \sim \mathcal{D}$,

$$
\ell_{0\text{-}1}(f(X), Y) = \mathbb{1}[f(X) \neq Y],
$$
so its expectation is the probability of misclassification:

$$
\mathbb{E}\big[\mathbb{1}[f(X) \neq Y]\big] = \Pr[f(X) \neq Y].
$$
Risk Identity (True vs Empirical): This yields a direct interpretation of both the true risk and the empirical risk. The true risk is

$$
R(f) = \mathbb{E}_{(X,Y)\sim\mathcal{D}}\big[\mathbb{1}[f(X) \neq Y]\big] = \Pr_{(X,Y)\sim\mathcal{D}}[f(X) \neq Y],
$$
while for a sample $S = \{(x_i, y_i)\}_{i=1}^{n}$ the empirical risk is

$$
\hat{R}_S(f) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[f(x_i) \neq y_i],
$$
which is exactly the observed misclassification rate.
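The empirical risk above can be sketched as a short computation: count the errors a classifier makes on a sample and divide by the sample size. The classifier and sample here are hypothetical, chosen only to illustrate the averaging.

```python
def empirical_risk(predict, sample):
    """Mean zero-one loss over a sample: the observed misclassification rate."""
    errors = sum(1 for x, y in sample if predict(x) != y)
    return errors / len(sample)

# Hypothetical classifier: threshold at zero.
predict = lambda x: 1 if x > 0 else 0

# Hypothetical labelled sample; the last point disagrees with the rule.
sample = [(-2, 0), (-1, 0), (1, 1), (3, 0)]

risk = empirical_risk(predict, sample)
# -> 0.25 (one error out of four examples)
```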
Complement Event (Accuracy Form): Since correctness and error are complementary events,

$$
\Pr[f(X) \neq Y] = 1 - \Pr[f(X) = Y],
$$

so the misclassification rate is one minus the accuracy.
This identity is the key probabilistic step used in finite-class realisable PAC proofs, where one bounds the probability that a bad hypothesis is consistent on all sampled examples.
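The complement identity is easy to check numerically: on any labelled set (the predictions and labels below are made up), the accuracy and the zero-one error rate sum to one.

```python
predictions = [1, 0, 1, 1, 0]
labels      = [1, 0, 0, 1, 1]

n = len(labels)
accuracy = sum(p == t for p, t in zip(predictions, labels)) / n  # fraction correct
error    = sum(p != t for p, t in zip(predictions, labels)) / n  # zero-one risk

# Correctness and error are complementary events.
assert abs((accuracy + error) - 1.0) < 1e-12
```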
Properties and Role
- Direct Interpretation: The empirical risk calculated with zero-one loss is exactly the model’s error rate or misclassification rate on the dataset.
- Optimisation Challenge: The zero-one loss is non-convex and non-differentiable; it is piecewise constant, so its gradient is zero almost everywhere and standard gradient-based optimisation algorithms cannot make progress on it. Minimising it directly is computationally intractable (NP-hard in general).
- Practical Use: While minimising the zero-one loss is the ultimate theoretical goal of classification, in practice algorithms optimise a continuous, convex surrogate loss function (such as the hinge loss or cross-entropy) instead. The zero-one loss is then used as the final evaluation metric to report the model’s performance.
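One way to see why a surrogate is a reasonable stand-in: written in margin form (labels in $\{-1, +1\}$, margin $m = y \cdot f(x)$), the hinge loss upper-bounds the zero-one loss at every margin, so driving the surrogate down also drives the misclassification rate down. A minimal sketch of this bound:

```python
def hinge(margin):
    """Hinge loss max(0, 1 - m), a convex surrogate for the 0-1 loss."""
    return max(0.0, 1.0 - margin)

def zero_one(margin):
    """0-1 loss in margin form: an error occurs iff the margin m = y*f(x) <= 0."""
    return 1 if margin <= 0 else 0

# The hinge loss upper-bounds the 0-1 loss at every margin value.
for m in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    assert hinge(m) >= zero_one(m)
```

Unlike the flat zero-one loss, the hinge loss has a nonzero slope wherever the margin is below 1, which is what makes gradient-based training possible.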