statistics information-theory probability

Definition

Kullback–Leibler Divergence

The Kullback–Leibler (KL) divergence is a non-symmetric measure of the difference between two probability distributions $P$ and $Q$ defined over the same probability space. Formally, for discrete distributions, it is defined as:

$$D_{\mathrm{KL}}(P \parallel Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}$$

For continuous distributions, the sum is replaced by an integral over the probability densities. It quantifies the information lost when $Q$ is used to approximate $P$.
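
As a minimal sketch (not part of the original notes), the discrete definition can be evaluated directly with NumPy; the function name `kl_divergence` and the example distributions are illustrative:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)).

    Assumes p and q are probability vectors over the same support and that
    q(x) > 0 wherever p(x) > 0 (otherwise the divergence is infinite).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with P(x) = 0 contribute 0 by convention
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Example: a fair coin approximated by a heavily biased one
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # ≈ 0.5108 nats
```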

Properties

Non-negativity: According to Gibbs’ inequality, $D_{\mathrm{KL}}(P \parallel Q) \geq 0$ for all distributions $P$ and $Q$, with equality holding if and only if $P = Q$.

Asymmetry: $D_{\mathrm{KL}}(P \parallel Q) \neq D_{\mathrm{KL}}(Q \parallel P)$ in general. Consequently, KL divergence is not a formal metric, as it violates the symmetry and triangle inequality axioms.
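
Both properties are easy to check numerically. A small sketch (not from the original notes) using `scipy.stats.entropy`, which returns $D_{\mathrm{KL}}(P \parallel Q)$ when given two distributions; the example distributions are illustrative:

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes D_KL(P || Q)

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])

d_pq = entropy(p, q)  # ≈ 0.5108
d_qp = entropy(q, p)  # ≈ 0.3681

print(d_pq >= 0 and d_qp >= 0)         # True: non-negativity (Gibbs' inequality)
print(np.isclose(d_pq, d_qp))          # False: D_KL(P||Q) != D_KL(Q||P)
print(np.isclose(entropy(p, p), 0.0))  # True: divergence is zero iff P = Q
```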

Application in ML: It is a foundational objective in many algorithms: in t-SNE it is minimised to align pairwise similarities in the low-dimensional embedding with those of the high-dimensional data, and in variational inference it measures how closely an approximate posterior matches the true posterior.
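
Many frameworks expose KL divergence directly as a differentiable loss. As an illustrative sketch only (PyTorch is an assumed choice here, not named in the text), minimising $D_{\mathrm{KL}}(P \parallel Q)$ between a fixed target distribution $P$ and a model distribution $Q$ might look like:

```python
import torch
import torch.nn.functional as F

# Toy model outputs over 4 classes for a batch of 8 examples
# (shapes and names are illustrative, not from the text).
logits = torch.randn(8, 4, requires_grad=True)
target = torch.softmax(torch.randn(8, 4), dim=1)  # fixed target distribution P

# F.kl_div expects log-probabilities for the model distribution Q and
# probabilities for the target P; "batchmean" averages D_KL(P || Q) over the batch.
log_q = F.log_softmax(logits, dim=1)
loss = F.kl_div(log_q, target, reduction="batchmean")
loss.backward()  # gradients w.r.t. the logits, ready for an optimiser step
```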