math machine-learning statistics
Definition
Distribution (Data)
In the context of machine learning and statistics, a distribution refers to the spatial arrangement or frequency of data points within an instance space $\mathcal{X}$. Unlike a theoretical probability distribution, which is a normalised measure defined on a sigma-algebra, an empirical distribution (or point cloud) is represented as a finite set of discrete observations $\{x_1, \dots, x_n\} \subset \mathcal{X}$.
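Formally, for observations $x_1, \dots, x_n$, this can be viewed as a normalised sum of Dirac delta measures, each placing mass $1/n$ on one observation:
$$\hat{P}_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}$$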
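To make the Dirac-sum view concrete, here is a minimal Python sketch. It assumes every observation carries equal weight 1/n, so the mass the empirical distribution assigns to a set is simply the fraction of observations that fall inside it. The function name `empirical_measure`, the `indicator` argument, and the four-point sample are hypothetical and chosen only for illustration.

```python
import numpy as np

def empirical_measure(points, indicator):
    """Mass assigned to a set by an equally weighted empirical distribution.

    `indicator` is a hypothetical callable that returns True when a point
    belongs to the set of interest.
    """
    points = np.asarray(points, dtype=float)
    # Each Dirac delta contributes 1/n if its point lies in the set, else 0.
    inside = np.array([bool(indicator(x)) for x in points])
    return inside.mean()

# Illustrative point cloud of four observations in the plane (made-up data).
sample = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])

# The set of interest: the closed unit ball around the origin.
in_unit_ball = lambda x: np.linalg.norm(x) <= 1.0

mass = empirical_measure(sample, in_unit_ball)
print(mass)
```

Two of the four observations lie in the unit ball, so the set receives half of the total mass. Passing the set as a predicate keeps the sketch independent of any particular geometry of the instance space.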
Structural Properties
The distribution captures the intrinsic geometric properties of the dataset, such as clusters, manifolds, and density variations. Unsupervised learning algorithms exploit these properties to identify underlying patterns or learn low-dimensional latent representations without the guidance of explicit target labels.
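As one possible illustration of recovering a low-dimensional representation without labels, the sketch below applies principal component analysis (PCA) through a singular value decomposition. PCA is an example technique chosen here, not one named in the definition above. The helper `pca_project` and the synthetic point cloud, which lies exactly on a line in three dimensions, are assumptions made for the example.

```python
import numpy as np

def pca_project(points, n_components):
    """Project a point cloud onto its leading principal directions.

    Returns the latent coordinates and the fraction of variance carried
    by each principal direction.
    """
    points = np.asarray(points, dtype=float)
    centred = points - points.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    latent = centred @ vt[:n_components].T
    return latent, explained

# Synthetic cloud (assumption): 50 points on a one-dimensional line in 3D.
t = np.linspace(-1.0, 1.0, 50)
cloud = np.stack([t, 2.0 * t, -t], axis=1)

latent, explained = pca_project(cloud, n_components=1)
print(latent.shape, explained)
```

Because the synthetic points sit on a single line, the first principal direction should carry essentially all of the variance, so one latent coordinate is enough to describe the cloud. Real data would rarely be this clean, and a nonlinear manifold would call for a different method.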