machine-learning statistics

Definition

Curse of Dimensionality

The curse of dimensionality refers to various phenomena that arise when analysing and organising data in high-dimensional spaces but do not occur in low-dimensional settings. In particular, as the dimensionality increases, the volume of the space grows exponentially, so a fixed number of data points becomes increasingly sparse.

Geometric Intuition

Volume Concentration: In high dimensions, most of the volume of a hypercube is located near its corners, and the volume of an inscribed hypersphere becomes negligible relative to the hypercube. Formally, the ratio of the volume of a $d$-ball to that of its circumscribing $d$-cube is $\pi^{d/2} / (2^d \, \Gamma(d/2 + 1))$, which approaches zero as $d \to \infty$.
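
To make the volume claim concrete, the following minimal sketch evaluates the ball-to-cube ratio for a few dimensions. It assumes a ball of radius 1 inscribed in the cube $[-1, 1]^d$; the function name `ball_to_cube_ratio` is illustrative and not taken from any library.

```python
import math


def ball_to_cube_ratio(d: int) -> float:
    """Volume of the radius-1 d-ball divided by the volume of the cube [-1, 1]^d."""
    # Closed form for the volume of a d-ball of radius 1.
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    # The circumscribing cube has side length 2.
    cube = 2.0 ** d
    return ball / cube


for d in (1, 2, 3, 5, 10, 20):
    print(f"d={d:2d}  ratio={ball_to_cube_ratio(d):.3e}")
```

In two dimensions the ratio is $\pi/4$ (a circle inside a square), and it shrinks rapidly from there as $d$ grows.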

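Distance Convergence: As dimensionality increases, the contrast between the maximum and minimum distances between data points diminishes. For points sampled i.i.d. from a broad class of distributions, the relative contrast between the farthest and nearest point from a query tends to zero:

$$\frac{\mathrm{dist}_{\max} - \mathrm{dist}_{\min}}{\mathrm{dist}_{\min}} \to 0 \quad \text{as } d \to \infty$$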

This renders distance-based methods (e.g., k-nearest neighbours, kNN) less effective, as all points become nearly equidistant from any given query point.
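
The shrinking contrast can be observed empirically. The sketch below is a rough illustration under stated assumptions: points are drawn uniformly from $[0, 1]^d$, distances are Euclidean, and the quantity measured is (max distance - min distance) / min distance from a single random query point. The function name `relative_contrast` and the sample size of 500 are arbitrary choices made for this example.

```python
import math
import random


def relative_contrast(n_points: int, d: int, seed: int = 0) -> float:
    """(max - min) / min over distances from one query to n_points uniform samples."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    query = [rng.random() for _ in range(d)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(d)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


for d in (2, 10, 100, 1000):
    print(f"d={d:4d}  contrast={relative_contrast(500, d):.3f}")
```

With these settings the printed contrast should fall steadily as `d` increases, which is the behaviour that makes the nearest neighbour hard to distinguish from the farthest one.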

Sample Complexity: Maintaining a constant density of data points requires a number of samples that grows exponentially with the dimensionality. For example, a grid with 10 points per axis needs $10^d$ points in $d$ dimensions. This problem is often addressed through dimensionality reduction.
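
As a small worked example of the exponential growth, the sketch below counts the points needed to cover the unit cube with a regular grid at a fixed resolution per axis. The helper name `samples_for_grid` and the choice of 10 points per axis are illustrative assumptions.

```python
def samples_for_grid(points_per_axis: int, d: int) -> int:
    """Number of grid points needed to keep the same per-axis resolution in d dimensions."""
    return points_per_axis ** d


for d in (1, 2, 3, 10, 20):
    print(f"d={d:2d}  samples={samples_for_grid(10, d):,}")
```

At 10 points per axis, three dimensions need 1,000 samples, while ten dimensions already need ten billion.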