machine-learning learning-theory

Definition

Realisable PAC-Learnable Hypothesis Class

A hypothesis class $\mathcal{H}$ is realisable PAC-learnable if there exist a learning algorithm $A$ and a sample complexity function $m_{\mathcal{H}} : (0,1)^2 \to \mathbb{N}$ such that for any parameters $\epsilon, \delta \in (0,1)$ and any distribution $\mathcal{D}$ satisfying the realisability assumption, the following holds:

$$\Pr_{S \sim \mathcal{D}^m}\big[L_{\mathcal{D}}(A(S)) \le \epsilon\big] \ge 1 - \delta,$$

provided that the sample size $m \ge m_{\mathcal{H}}(\epsilon, \delta)$. The produced hypothesis is approximately correct (error at most $\epsilon$) with a probability of at least $1 - \delta$ (probably).

Empirical Risk Minimisation

A hypothesis class is realisable PAC-learnable by any ERM algorithm provided that the functional capacity of the class is bounded. For finite classes, the sample complexity scales with the logarithm of the cardinality $|\mathcal{H}|$:

$$m_{\mathcal{H}}(\epsilon, \delta) \le \left\lceil \frac{\ln(|\mathcal{H}|/\delta)}{\epsilon} \right\rceil.$$

For infinite classes, learnability is guaranteed if and only if the VC dimension $d$ is finite, with the complexity bounded by:

$$m_{\mathcal{H}}(\epsilon, \delta) = O\!\left( \frac{d \ln(1/\epsilon) + \ln(1/\delta)}{\epsilon} \right).$$

This result establishes that the capacity of the function class, not the size of the input space, is the fundamental constraint on the amount of data required to ensure generalisation.
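The finite-class sample complexity $m \ge \ln(|\mathcal{H}|/\delta)/\epsilon$ is straightforward to evaluate; a minimal sketch (the function name and the example values $|\mathcal{H}| = 1000$, $\epsilon = 0.05$, $\delta = 0.01$ are illustrative choices, not from the derivation above):

```python
import math

def finite_class_sample_complexity(h_size: int, eps: float, delta: float) -> int:
    """Smallest m satisfying m >= ln(|H|/delta) / eps, which guarantees
    |H| * exp(-eps * m) <= delta in the realisable finite-class bound."""
    return math.ceil(math.log(h_size / delta) / eps)

# Example: |H| = 1000, eps = 0.05, delta = 0.01
m = finite_class_sample_complexity(1000, 0.05, 0.01)   # m = 231

# The failure bound at this m is indeed below delta:
failure_bound = 1000 * math.exp(-0.05 * m)             # <= 0.01
```

Note that $m$ grows only logarithmically in $|\mathcal{H}|$: squaring the class size merely doubles the $\ln|\mathcal{H}|$ term.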

Finite-class realisable PAC guarantee (ERM)

Let $\mathcal{H}$ be finite, $\epsilon, \delta \in (0,1)$, and assume the realisability assumption, i.e. there exists $h^* \in \mathcal{H}$ with $L_{\mathcal{D}}(h^*) = 0$. Let $A$ be an ERM algorithm and let $h_S = A(S)$ for $S = ((x_1, y_1), \ldots, (x_m, y_m)) \sim \mathcal{D}^m$.
Here $L_{\mathcal{D}}(h) = \Pr_{(x,y) \sim \mathcal{D}}[h(x) \ne y]$ denotes the true risk of $h$ under distribution $\mathcal{D}$ (equivalently $L(h)$ when $\mathcal{D}$ is implicit).

Since $L_{\mathcal{D}}(h^*) = 0$, we have $L_S(h^*) = 0$ for every sample $S$ (with probability 1), hence ERM can return a consistent hypothesis and therefore $L_S(h_S) = 0$.

Define the bad set

$$\mathcal{H}_B = \{\, h \in \mathcal{H} : L_{\mathcal{D}}(h) > \epsilon \,\}.$$

Fix any $h \in \mathcal{H}_B$. Under binary classification with 0-1 loss,

$$L_{\mathcal{D}}(h) = \mathbb{E}_{(x,y) \sim \mathcal{D}}\big[\mathbf{1}[h(x) \ne y]\big] = \Pr_{(x,y) \sim \mathcal{D}}[h(x) \ne y],$$

where the second equality uses that the expectation of an indicator equals the probability of its event. Hence

$$\Pr_{(x,y) \sim \mathcal{D}}[h(x) = y] = 1 - L_{\mathcal{D}}(h).$$

Since $h \in \mathcal{H}_B$ implies $L_{\mathcal{D}}(h) > \epsilon$, we obtain

$$\Pr_{(x,y) \sim \mathcal{D}}[h(x) = y] < 1 - \epsilon \le e^{-\epsilon}.$$

By i.i.d. sampling,

$$\Pr_{S \sim \mathcal{D}^m}\big[L_S(h) = 0\big] = \prod_{i=1}^{m} \Pr\big[h(x_i) = y_i\big] < (1 - \epsilon)^m \le e^{-\epsilon m}.$$

Thus a fixed bad hypothesis is consistent with the sample with probability at most $e^{-\epsilon m}$.
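This step can be checked empirically. The setup below is a hypothetical example, not part of the derivation: $\mathcal{D}$ is uniform on $\{0, \ldots, 9\}$, the target labels everything 0, and the bad hypothesis errs on 2 of the 10 points, so its true risk $0.2$ exceeds $\epsilon = 0.1$.

```python
import math
import random

eps, m = 0.1, 30
h_star = lambda x: 0                   # target: labels every point 0
h_bad = lambda x: 1 if x < 2 else 0    # errs on {0, 1}, so L_D(h_bad) = 0.2 > eps

random.seed(0)
trials = 20000
consistent = 0
for _ in range(trials):
    sample = [random.randrange(10) for _ in range(m)]
    # h_bad is consistent iff no sampled point falls in its error set
    if all(h_bad(x) == h_star(x) for x in sample):
        consistent += 1

empirical = consistent / trials        # close to (1 - 0.2)**m, about 1.2e-3
bound = math.exp(-eps * m)             # e^{-eps m} = e^{-3}, about 0.05
assert empirical <= bound
```

The empirical consistency rate tracks $(1 - L_{\mathcal{D}}(h))^m$, which is strictly below the cruder but class-independent bound $e^{-\epsilon m}$ used in the proof.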

Now consider the failure event $\{L_{\mathcal{D}}(h_S) > \epsilon\}$. Because $h_S$ is consistent ($L_S(h_S) = 0$), failure implies that at least one bad hypothesis is consistent with $S$. Therefore, by the union bound,

$$\Pr_{S \sim \mathcal{D}^m}\big[L_{\mathcal{D}}(h_S) > \epsilon\big] \le \Pr\big[\exists\, h \in \mathcal{H}_B : L_S(h) = 0\big] \le \sum_{h \in \mathcal{H}_B} \Pr\big[L_S(h) = 0\big] \le |\mathcal{H}|\, e^{-\epsilon m}.$$
To make this at most $\delta$, it is sufficient that

$$|\mathcal{H}|\, e^{-\epsilon m} \le \delta, \quad \text{i.e.} \quad m \ge \frac{1}{\epsilon} \ln\frac{|\mathcal{H}|}{\delta}.$$
Hence, for all $m$ satisfying this bound,

$$\Pr_{S \sim \mathcal{D}^m}\big[L_{\mathcal{D}}(h_S) \le \epsilon\big] \ge 1 - \delta,$$

so every finite $\mathcal{H}$ is realisable PAC-learnable by ERM, with sample complexity $m_{\mathcal{H}}(\epsilon, \delta) \le \lceil \ln(|\mathcal{H}|/\delta)/\epsilon \rceil$.
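The whole guarantee can be exercised end to end on a small finite class. The class below is a hypothetical example: threshold classifiers $h_t(x) = \mathbf{1}[x \ge t]$ on $\{0, \ldots, 9\}$ with $t \in \{0, \ldots, 10\}$, so $|\mathcal{H}| = 11$, with uniform $\mathcal{D}$ and target $h^* = h_5$ (realisable by construction).

```python
import math
import random

X = range(10)
H = list(range(11))                    # each t in H represents h_t(x) = 1[x >= t]
h = lambda t, x: int(x >= t)
t_star = 5                             # target hypothesis, so realisability holds
eps, delta = 0.2, 0.1
m = math.ceil(math.log(len(H) / delta) / eps)   # sample size from the bound

def true_risk(t):
    """L_D(h_t) under the uniform distribution: fraction of X where h_t != h*."""
    return sum(h(t, x) != h(t_star, x) for x in X) / len(X)

def erm(sample):
    """Return some hypothesis with zero empirical risk (h* is always one)."""
    for t in H:
        if all(h(t, x) == y for x, y in sample):
            return t

random.seed(1)
trials = 5000
failures = 0
for _ in range(trials):
    xs = [random.randrange(10) for _ in range(m)]
    sample = [(x, h(t_star, x)) for x in xs]     # labels come from h*
    if true_risk(erm(sample)) > eps:
        failures += 1

assert failures / trials <= delta      # PAC guarantee holds empirically
```

In practice the observed failure rate sits far below $\delta$, since the union bound and the $e^{-\epsilon m}$ estimate are both loose; the theorem only promises the $\delta$ ceiling.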