machine-learning dimensionality-reduction probability

Definition

Random Projection

Random projection is a computationally efficient dimensionality reduction technique that maps high-dimensional data points into a lower-dimensional subspace using a random matrix . Formally, the projection is defined as:

where the entries of are sampled independently from a zero-mean distribution (e.g., ).

Distance Preservation

Johnson-Lindenstrauss Lemma

For any set of points in and any , there exists a projection into dimensions where such that the pairwise squared Euclidean distances between all points are preserved within a factor of :

Computational Efficiency

Compared to PCA, which has a complexity of , random projection is significantly faster, requiring only time. It is particularly effective for extremely high-dimensional datasets where the covariance matrix calculation is computationally prohibitive.