machine-learning statistics learning-theory

Definition

PAC Learning

Probably Approximately Correct (PAC) learning is a mathematical framework for quantifying the learnability of a hypothesis class $\mathcal{H}$. A class $\mathcal{H}$ is PAC-learnable if there exists a learning algorithm $A$ and a sample complexity function $m_{\mathcal{H}}(\epsilon, \delta)$ such that for any distribution $\mathcal{D}$ and any parameters $\epsilon, \delta \in (0, 1)$:

$$\Pr_{S \sim \mathcal{D}^m}\big[ L_{\mathcal{D}}(A(S)) \le \epsilon \big] \ge 1 - \delta$$

provided that the sample size $m \ge m_{\mathcal{H}}(\epsilon, \delta)$. The parameters define the rigor of the guarantee, as the simulation sketch after this list illustrates:

  • Approximately Correct ($\epsilon$): The returned hypothesis achieves a true error rate no greater than $\epsilon$.
  • Probably ($\delta$): This accuracy is achieved with probability at least $1 - \delta$ over the random draw of the training sample.
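A minimal simulation sketch of this guarantee, assuming a finite class of 101 threshold classifiers $h_t(x) = \mathbb{1}[x \ge t]$ on a grid, a uniform distribution on $[0, 1]$, and a realisable target; the class, the constants, and names such as `erm_threshold` are illustrative, not from the text. It estimates how often a consistent learner's true error exceeds $\epsilon$, which PAC says should happen with probability at most $\delta$:

```python
import numpy as np

rng = np.random.default_rng(0)

grid = np.linspace(0.0, 1.0, 101)   # finite hypothesis class: 101 thresholds (assumed)
true_t = grid[37]                   # target concept lies in the class (realisability)
epsilon, delta, m = 0.1, 0.05, 80   # accuracy, confidence, sample size (illustrative)

def erm_threshold(x, y):
    """Return a grid threshold with minimal empirical risk on the sample
    (with consistent data, this picks a zero-error hypothesis)."""
    errs = [(np.mean((x >= t).astype(int) != y), t) for t in grid]
    return min(errs)[1]

failures, trials = 0, 2000
for _ in range(trials):
    x = rng.uniform(0.0, 1.0, size=m)
    y = (x >= true_t).astype(int)
    t_hat = erm_threshold(x, y)
    # Under Uniform[0,1], the true error is the length of the interval
    # on which the two thresholds disagree.
    if abs(t_hat - true_t) > epsilon:
        failures += 1

print(f"empirical P(error > {epsilon}) = {failures / trials:.4f}  (PAC requires <= {delta})")
```

The observed failure rate should land well below $\delta = 0.05$, since $m = 80$ exceeds the finite-class sample complexity derived in the next section.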

Learnability of Finite Classes

If the hypothesis class $\mathcal{H}$ is finite and the realisability assumption holds, then $\mathcal{H}$ is PAC-learnable by any algorithm that returns a hypothesis consistent with the training set (i.e., one with zero empirical risk). The required sample complexity is bounded by:

$$m_{\mathcal{H}}(\epsilon, \delta) \le \left\lceil \frac{\ln(|\mathcal{H}|/\delta)}{\epsilon} \right\rceil$$

Derivation Intuition: Consider a “bad” hypothesis $h \in \mathcal{H}$ with $L_{\mathcal{D}}(h) > \epsilon$. The probability that $h$ is consistent with a single sample is at most $1 - \epsilon$. For $m$ independent samples, the probability that a bad hypothesis remains consistent is at most $(1 - \epsilon)^m$. By a union bound over the at most $|\mathcal{H}|$ bad hypotheses, ensuring that the probability of any bad hypothesis in $\mathcal{H}$ being consistent is less than $\delta$ requires $|\mathcal{H}|(1 - \epsilon)^m < \delta$. Using the inequality $1 - \epsilon \le e^{-\epsilon}$, this yields $|\mathcal{H}| e^{-\epsilon m} < \delta$, which solves to $m > \frac{\ln(|\mathcal{H}|/\delta)}{\epsilon}$, the bound above.
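Plugging in the illustrative numbers from the sketch above, a quick computation of this bound (the function name is my own):

```python
import math

def finite_class_sample_complexity(h_size: int, epsilon: float, delta: float) -> int:
    """Smallest m satisfying m >= ln(|H| / delta) / epsilon."""
    return math.ceil(math.log(h_size / delta) / epsilon)

print(finite_class_sample_complexity(101, 0.1, 0.05))   # -> 77
print(finite_class_sample_complexity(101, 0.01, 0.05))  # -> 762
```

Note that the dependence on $|\mathcal{H}|$ is only logarithmic, while tightening $\epsilon$ scales the sample size linearly in $1/\epsilon$.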

Learnability of Infinite Classes

For classes with infinite cardinality, such as linear separators in $\mathbb{R}^d$, PAC-learnability is determined by the VC dimension. A class is PAC-learnable if and only if its VC dimension is finite.
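As a concrete illustration of finite VC dimension, the following brute-force shattering check (a sketch; the class and helper names are assumptions, not from the text) shows that 1-D thresholds $h_t(x) = \mathbb{1}[x \ge t]$ shatter any single point but no pair of points, so their VC dimension is 1:

```python
def labelings_by_thresholds(points):
    """All labelings of `points` achievable by some threshold t."""
    srt = sorted(points)
    # One threshold below all points, one between each consecutive pair,
    # and one above all points covers every distinct behaviour of 1[x >= t].
    thresholds = [srt[0] - 1.0]
    thresholds += [(a + b) / 2 for a, b in zip(srt, srt[1:])]
    thresholds += [srt[-1] + 1.0]
    return {tuple(int(x >= t) for x in points) for t in thresholds}

def is_shattered(points):
    """True if thresholds realise all 2^|points| labelings of the points."""
    return len(labelings_by_thresholds(points)) == 2 ** len(points)

print(is_shattered([0.3]))        # True  -> VC dimension is at least 1
print(is_shattered([0.3, 0.7]))   # False -> the labeling (1, 0) is unachievable
```

A class like the thresholds above is therefore PAC-learnable despite containing uncountably many hypotheses, whereas a class that shatters arbitrarily large point sets is not.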