machine-learning learning-theory statistics
Vapnik-Chervonenkis Dimension
Definition
The Vapnik-Chervonenkis (VC) dimension, denoted $\mathrm{VCdim}(\mathcal{H})$, is a measure of the capacity or complexity of a hypothesis class $\mathcal{H}$. It is defined as the cardinality of the largest set of points that can be shattered by $\mathcal{H}$, where a set $S$ is shattered if every binary labelling of $S$ is realized by some $h \in \mathcal{H}$. If $\mathcal{H}$ can shatter sets of arbitrary size, the VC dimension is infinite.
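The shattering condition can be checked directly by brute force. Below is a minimal sketch, assuming hypotheses are given as Python callables returning labels in $\{0, 1\}$; the helper name is_shattered and the particular threshold values are illustrative choices, not a fixed API.

```python
from itertools import product


def is_shattered(points, hypotheses):
    """Return True iff every binary labelling of `points` is realized by some hypothesis."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return all(lab in realized for lab in product((0, 1), repeat=len(points)))


# Threshold classifiers h_a(x) = 1[x >= a], with a few illustrative cut points.
thresholds = [lambda x, a=a: int(x >= a) for a in (-1.0, 0.5, 1.5, 2.5)]
print(is_shattered([1.0], thresholds))       # True: a single point is shattered
print(is_shattered([1.0, 2.0], thresholds))  # False: the labelling (1, 0) is unreachable
```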
PAC Learnability
The VC dimension provides a necessary and sufficient condition for PAC learnability in the binary classification setting: a class $\mathcal{H}$ is PAC-learnable if and only if $\mathrm{VCdim}(\mathcal{H}) < \infty$. In the realizable case, the sample complexity is then bounded in terms of the VC dimension $d = \mathrm{VCdim}(\mathcal{H})$:

$m_{\mathcal{H}}(\epsilon, \delta) = O\!\left(\frac{d \log(1/\epsilon) + \log(1/\delta)}{\epsilon}\right)$
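To get a feel for the bound, the sketch below evaluates it numerically. The helper name sample_complexity and the constant C are assumptions for illustration; the true constant depends on which proof of the upper bound is used.

```python
import math


def sample_complexity(vc_dim, epsilon, delta, C=8.0):
    # Upper bound on samples sufficient for (epsilon, delta)-PAC learning in the
    # realizable setting; C is an assumed constant, not a sharp value.
    return math.ceil(C / epsilon * (vc_dim * math.log(1 / epsilon) + math.log(1 / delta)))


# VC dimension 3 (e.g., halfspaces in R^2), 5% error, 95% confidence.
print(sample_complexity(vc_dim=3, epsilon=0.05, delta=0.05))
```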
Examples
Thresholds on the Real Line: For the class $\mathcal{H} = \{x \mapsto \mathbb{1}[x \ge a] : a \in \mathbb{R}\}$, the VC dimension is 1: any single point can be shattered, but for two points $x_1 < x_2$ the labelling $(1, 0)$ cannot be realized, since every threshold labels points monotonically.
Axis-aligned Rectangles: In $\mathbb{R}^2$, the class of axis-aligned rectangles shatters the 4-point diamond configuration but no set of 5 points (any rectangle containing the four extreme points of a 5-point set also contains the fifth), resulting in $\mathrm{VCdim}(\mathcal{H}) = 4$; a sketch of the shattering argument follows this list.
Linear Separators: In $d$-dimensional space $\mathbb{R}^d$, the VC dimension of linear classifiers (halfspaces) is $d + 1$.
Convex Polygons: For the class of convex polygons in $\mathbb{R}^2$, the VC dimension is infinite. This is because any number of points can be placed in convex position (e.g., on a circle), and for any binary labelling, a convex polygon can be constructed that contains exactly the points with positive labels.
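As referenced in the rectangle example, the positive direction of that argument can be verified exhaustively: for the diamond configuration, the axis-aligned bounding box of the positively labelled points realizes every labelling. The helper name bounding_box_realizes is an illustrative choice.

```python
from itertools import product

diamond = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # four points in convex position


def bounding_box_realizes(points, labels):
    # The candidate rectangle is the bounding box of the positive points;
    # the labelling is realized iff no negative point falls inside it.
    pos = [p for p, y in zip(points, labels) if y == 1]
    if not pos:
        return True  # the empty rectangle handles the all-negative labelling
    x0, x1 = min(x for x, _ in pos), max(x for x, _ in pos)
    y0, y1 = min(y for _, y in pos), max(y for _, y in pos)
    inside = lambda p: x0 <= p[0] <= x1 and y0 <= p[1] <= y1
    return all(inside(p) == bool(y) for p, y in zip(points, labels))


# All 16 labellings of the diamond are realized, so the 4 points are shattered.
print(all(bounding_box_realizes(diamond, lab) for lab in product((0, 1), repeat=4)))
```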