Lukas' Notes

linear-algebra statistics machine-learning

Definition

L2 Divergence

The L2 divergence is the divergence function that measures squared L2 norm discrepancy between two vectors.

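For $x, y \in \mathbb{R}^n$, it is defined by

$$D(x, y) = \|x - y\|_2^2 = \sum_{i=1}^{n} (x_i - y_i)^2.$$

Some conventions include a factor $\tfrac{1}{2}$, giving $\tfrac{1}{2}\|x - y\|_2^2$, because it simplifies gradients. Both forms measure the same discrepancy up to a constant factor.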
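
As a minimal sketch in plain Python, the definition and its half-scaled variant could be computed as follows. The function names `l2_divergence` and `half_l2_divergence` are illustrative choices, not taken from any library, and vectors are assumed to be plain lists of numbers.

```python
def l2_divergence(x, y):
    """Squared L2 norm of x - y: the sum of squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def half_l2_divergence(x, y):
    """Half-scaled convention: 0.5 times the squared L2 norm of x - y."""
    return 0.5 * l2_divergence(x, y)


# Illustrative vectors (hypothetical values).
x = [1.0, 2.0, 3.0]
y = [1.0, 0.0, 5.0]
print(l2_divergence(x, y))       # 0 + 4 + 4 = 8.0
print(half_l2_divergence(x, y))  # 4.0
```

The two functions differ only by the constant factor, so they rank any pair of candidate vectors identically.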

Interpretation

L2 divergence compares two vectors coordinate by coordinate, squares each difference, and adds the squared errors. Large coordinate errors are penalised more strongly than small ones.

The unsquared quantity $\|x - y\|_2$ is the Euclidean distance. Squaring it gives a divergence that is smooth everywhere (the Euclidean distance itself is not differentiable at $x = y$) and convenient for optimisation, but it is no longer a metric because the triangle inequality need not hold.
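
A small numerical check makes the failure of the triangle inequality concrete. The sketch below uses three collinear points on the real line, chosen purely for illustration, and a hypothetical helper `sq_dist` for the squared Euclidean distance.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


# Three points on the real line (illustrative values).
a, b, c = [0.0], [1.0], [2.0]

# Squared distance: the detour through b costs less than the direct path,
# so the triangle inequality is violated.
direct = sq_dist(a, c)                 # 4.0
via_b = sq_dist(a, b) + sq_dist(b, c)  # 1.0 + 1.0 = 2.0
print(direct <= via_b)                 # False

# Euclidean distance (the square root) does satisfy the inequality.
root_direct = math.sqrt(direct)                                  # 2.0
root_via_b = math.sqrt(sq_dist(a, b)) + math.sqrt(sq_dist(b, c))  # 2.0
print(root_direct <= root_via_b)                                 # True
```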

Gradient

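For the common half-scaled version

$$D_{1/2}(x, y) = \tfrac{1}{2}\|x - y\|_2^2$$

the gradient with respect to $x$ is

$$\nabla_x D_{1/2}(x, y) = x - y.$$

This is why the factor $\tfrac{1}{2}$ is common in optimisation: it cancels the factor $2$ from differentiating the square.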
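
The gradient formula can be sanity-checked against central finite differences. This is a sketch under stated assumptions: the gradient is taken with respect to the first argument `x`, and the helper names `analytic_grad` and `numeric_grad` as well as the step size `eps` are illustrative choices.

```python
def half_l2_divergence(x, y):
    """Half-scaled L2 divergence: 0.5 * sum of squared differences."""
    return 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def analytic_grad(x, y):
    """Closed-form gradient with respect to x, which is x - y."""
    return [xi - yi for xi, yi in zip(x, y)]


def numeric_grad(x, y, eps=1e-6):
    """Central finite-difference approximation, one coordinate of x at a time."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        diff = half_l2_divergence(x_plus, y) - half_l2_divergence(x_minus, y)
        grad.append(diff / (2 * eps))
    return grad


# Illustrative vectors (hypothetical values).
x = [1.0, 2.0, 3.0]
y = [1.0, 0.0, 5.0]
print(analytic_grad(x, y))  # [0.0, 2.0, -2.0]
print(numeric_grad(x, y))   # agrees with the analytic gradient up to rounding
```

Without the half factor the same check would return $2(x - y)$, which is the extra constant the half-scaled convention avoids.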