Dimensionality Reduction
Dimensionality reduction is the process of transforming data from a high-dimensional space into a lower-dimensional space while preserving its essential structure and meaningful properties. The primary goal is to obtain a more compact data representation that facilitates visualization, improves computational efficiency, and mitigates the curse of dimensionality.
Formalism
Let a dataset be represented by $n$ points $x_1, \dots, x_n \in \mathbb{R}^D$ in a high-dimensional space. Dimensionality reduction seeks a mapping function $f: \mathbb{R}^D \to \mathbb{R}^d$, where the new dimension $d$ is significantly smaller than the original dimension $D$ (i.e., $d \ll D$). This function transforms each data point $x_i$ into a new representation $y_i = f(x_i)$ in the lower-dimensional space.
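The mapping can be made concrete with a small sketch. Writing D for the original dimension and d for the reduced one (symbol names assumed here), the example below uses a linear map given by a hypothetical matrix W drawn at random. This only illustrates the shapes involved and is not a recommended way to choose the mapping.

```python
import numpy as np

D, d, n = 50, 3, 100          # illustrative sizes, with d much smaller than D
rng = np.random.default_rng(0)
X = rng.normal(size=(n, D))   # n data points in R^D, one per row
W = rng.normal(size=(D, d))   # hypothetical projection matrix (random, for illustration)

def f(x):
    """Map a point (or a batch of row vectors) from R^D to R^d."""
    return x @ W

Y = f(X)                      # every row of X mapped to its low-dimensional image
print(X.shape, Y.shape)       # (100, 50) (100, 3)
```

In practice the mapping is fitted to the data so that the structure of interest survives the reduction; a random W is used here only to keep the sketch short.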
Main Approaches
- Feature Projection: Creates new, lower-dimensional features by computing combinations of the original ones (e.g., principal component analysis, PCA).
- Feature Selection: Identifies and retains a subset of the most relevant original features, discarding the rest.
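The difference between the two approaches can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function names `pca_project` and `select_top_variance` are hypothetical, PCA is computed via the singular value decomposition of the centered data, and variance is used as a stand-in relevance score for feature selection (other criteria are common).

```python
import numpy as np

def pca_project(X, d):
    """Feature projection: map rows of X (n x D) onto the top-d principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:d].T

def select_top_variance(X, d):
    """Feature selection: keep the d original columns with the highest variance."""
    variances = X.var(axis=0)
    keep = np.sort(np.argsort(variances)[-d:])  # indices of retained columns, ascending
    return X[:, keep], keep

# Synthetic data (assumption): 10 features, the last two with much larger spread.
rng = np.random.default_rng(0)
scales = np.array([1, 1, 1, 1, 1, 1, 1, 1, 5, 10], dtype=float)
X = rng.normal(size=(200, 10)) * scales

Z = pca_project(X, 2)                    # new features: combinations of all 10 columns
X_sel, kept = select_top_variance(X, 2)  # unchanged copies of two original columns

print(Z.shape, X_sel.shape, kept)
```

Both results have two columns, but the projected features mix all original columns, while the selected features are original columns carried over unchanged and so stay directly interpretable.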