Definition
I.I.D. Assumption
The i.i.d. assumption posits that a collection of random variables are independent and identically distributed. In machine learning, it is assumed that the instances are sampled independently from a fixed, stationary joint probability distribution .
- Independent: The outcome of one observation provides no information about the probability of another.
- Identically Distributed: Every observation is drawn from the same underlying distribution .
Statistical Foundationalism
This assumption is critical for the validity of Empirical Risk Minimisation. It ensures that the model’s performance on the observed training data is a statistically sound estimator of its generalisation error on unseen data. Violations of this assumption, such as concept drift or temporal correlation, typically require the use of online learning or specialized time-series models.