machine-learning

Definition

Zero-One Loss

The Zero-One Loss is a loss function used in classification tasks to measure prediction error. It is the most direct metric for misclassification, assigning a penalty of 1 for an incorrect prediction and 0 for a correct one.

Let $\hat{y}$ be the model’s predicted label for an input $x$, and let $y$ be the true label. The zero-one loss, denoted $\ell_{0\text{-}1}(\hat{y}, y)$, is defined as:

$$\ell_{0\text{-}1}(\hat{y}, y) = \begin{cases} 0 & \text{if } \hat{y} = y \\ 1 & \text{if } \hat{y} \neq y \end{cases}$$

This can be written more compactly using an indicator function $\mathbb{1}[\cdot]$:

$$\ell_{0\text{-}1}(\hat{y}, y) = \mathbb{1}[\hat{y} \neq y]$$
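
As a concrete illustration, here is a minimal sketch of the pointwise zero-one loss in Python (the function name zero_one_loss is chosen here for illustration, not taken from any particular library):

```python
def zero_one_loss(y_pred, y_true):
    """Pointwise zero-one loss: 1 if the prediction is wrong, 0 if it is correct."""
    return 0 if y_pred == y_true else 1

# Example: one correct and one incorrect prediction.
print(zero_one_loss("cat", "cat"))  # 0
print(zero_one_loss("cat", "dog"))  # 1
```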

Probabilistic View

Indicator Expectation (Misclassification Probability): Under 0-1 loss, the loss random variable equals an indicator of the error event. For a classifier $h$ and a random labelled example $(X, Y)$,

$$\ell_{0\text{-}1}(h(X), Y) = \mathbb{1}[h(X) \neq Y],$$

so its expectation is the probability of misclassification:

$$\mathbb{E}\big[\mathbb{1}[h(X) \neq Y]\big] = \Pr[h(X) \neq Y]$$

Risk Identity (True vs Empirical): This yields a direct interpretation of both true risk and empirical risk. The true risk is

$$R(h) = \Pr_{(X,Y)\sim \mathcal{D}}\big[h(X) \neq Y\big] = \mathbb{E}\big[\ell_{0\text{-}1}(h(X), Y)\big],$$

while for a sample $S = \{(x_i, y_i)\}_{i=1}^{n}$ the empirical risk is

$$\hat{R}_S(h) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}[h(x_i) \neq y_i],$$

which is exactly the observed misclassification rate.
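
A small sketch of the empirical risk computation over a sample, using plain NumPy (empirical_risk is an illustrative name, not a library function):

```python
import numpy as np

def empirical_risk(y_pred, y_true):
    """Empirical 0-1 risk: the fraction of predictions that disagree with the true labels."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    return np.mean(y_pred != y_true)

# Example: 2 mistakes out of 5 predictions -> empirical risk (error rate) of 0.4.
print(empirical_risk([0, 1, 1, 0, 1], [0, 1, 0, 0, 0]))  # 0.4
```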

Complement Event (Accuracy Form): Since correctness and error are complementary events,

$$\Pr[h(X) = Y] = 1 - \Pr[h(X) \neq Y].$$

This identity is the key probabilistic step used in finite-class realisable PAC proofs, where one bounds the probability that a bad hypothesis is consistent on all sampled examples.
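
As a brief sketch of that step (the standard finite-class realisable argument, stated here for context): if a hypothesis $h$ has true error $\Pr[h(X) \neq Y] > \varepsilon$, then by the complement identity it classifies a single random example correctly with probability less than $1 - \varepsilon$, so the probability that it is consistent with all $m$ i.i.d. training examples is less than

$$(1 - \varepsilon)^m \le e^{-\varepsilon m},$$

and a union bound over a finite hypothesis class $\mathcal{H}$ bounds the probability that any such bad hypothesis remains consistent by $|\mathcal{H}|\, e^{-\varepsilon m}$.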

Properties and Role

  • Direct Interpretation: The empirical risk calculated with zero-one loss is exactly the model’s error rate or misclassification rate on the dataset.
  • Optimisation Challenge: The zero-one loss is non-convex and non-differentiable; it is piecewise constant, so its gradient is zero almost everywhere and gives no useful descent direction to standard gradient-based optimisation algorithms. Moreover, minimising it directly is computationally intractable (NP-hard) in general.
  • Practical Use: While minimising zero-one loss is the ultimate theoretical goal of classification, in practice algorithms optimise a continuous, convex surrogate loss function (such as the hinge loss or cross-entropy) instead, as sketched below. The zero-one loss is then used as a final evaluation metric to report the model’s performance.
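
A minimal sketch contrasting the zero-one loss with two common convex surrogates, written in terms of the margin $y \cdot f(x)$ for labels $y \in \{-1, +1\}$ and a real-valued score $f(x)$ (plain NumPy; the helper names are illustrative):

```python
import numpy as np

def zero_one(margin):
    """Zero-one loss in margin form: 1 when the sign of the score is wrong."""
    return (margin <= 0).astype(float)

def hinge(margin):
    """Hinge loss (SVM surrogate): max(0, 1 - margin), convex and piecewise linear."""
    return np.maximum(0.0, 1.0 - margin)

def logistic(margin):
    """Logistic / cross-entropy loss in margin form: log(1 + exp(-margin)), smooth and convex."""
    return np.log1p(np.exp(-margin))

margins = np.array([-2.0, -0.5, 0.5, 2.0])
print(zero_one(margins))  # [1. 1. 0. 0.]
print(hinge(margins))     # [3.  1.5 0.5 0. ]
print(logistic(margins))  # approx. [2.13 0.97 0.47 0.13]
```

The surrogates decrease smoothly as the margin grows, which is what makes them amenable to gradient-based optimisation, while the zero-one loss only records whether the final decision was right or wrong.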