linear-algebra statistics machine-learning
Definition
L2 Divergence
The L2 divergence measures the discrepancy between two vectors as the squared L2 (Euclidean) norm of their difference.
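For $x, y \in \mathbb{R}^n$, it is defined by

$$D(x, y) = \|x - y\|_2^2 = \sum_{i=1}^{n} (x_i - y_i)^2.$$

Some conventions include a factor $\tfrac{1}{2}$, giving $\tfrac{1}{2}\|x - y\|_2^2$, because it simplifies gradients. Both forms measure the same discrepancy up to a constant factor.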
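As a minimal sketch of the definition, the following pure-Python function sums the squared coordinate differences. The function name `l2_divergence` and the `half` flag are illustrative choices, not notation from this note.

```python
from typing import Sequence


def l2_divergence(x: Sequence[float], y: Sequence[float], half: bool = False) -> float:
    """Squared L2 norm of x - y; with half=True the result is scaled by 1/2."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 0.5 * total if half else total


x = [1.0, 2.0, 3.0]
y = [1.0, 0.0, 7.0]
print(l2_divergence(x, y))             # 0 + 4 + 16 = 20.0
print(l2_divergence(x, y, half=True))  # 10.0
```

The two conventions differ only by the constant factor, so they rank pairs of vectors identically.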
Interpretation
L2 divergence compares two vectors coordinate by coordinate, squares each difference, and adds the squared errors. Large coordinate errors are penalised more strongly than small ones.
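A small worked example with made-up numbers may help here. Two error patterns with the same total absolute error of 3 receive different penalties, because the single large error is squared as a whole:

$$\underbrace{3^2}_{\text{one error of } 3} = 9 \qquad \text{versus} \qquad \underbrace{1^2 + 1^2 + 1^2}_{\text{three errors of } 1} = 3.$$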
The unsquared quantity $\|x - y\|_2$ is the Euclidean distance. Squaring it gives a divergence that is smooth and convenient for optimisation, but it is no longer a metric because the triangle inequality can fail.
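A one-dimensional counterexample makes the failure of the triangle inequality concrete. The sketch below uses three illustrative points 0, 1 and 2 on a line. The helper name `sq_l2` is an assumption introduced for this example.

```python
import math


def sq_l2(x, y):
    """Squared L2 norm of x - y."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


a, b, c = [0.0], [1.0], [2.0]

# Squared version: going directly from a to c costs more than going via b.
direct = sq_l2(a, c)                 # (0 - 2)^2 = 4.0
via_b = sq_l2(a, b) + sq_l2(b, c)    # 1.0 + 1.0 = 2.0
triangle_holds_squared = direct <= via_b

# Unsquared Euclidean distance: the triangle inequality holds (with equality here).
dist_direct = math.sqrt(sq_l2(a, c))
dist_via_b = math.sqrt(sq_l2(a, b)) + math.sqrt(sq_l2(b, c))
triangle_holds_distance = dist_direct <= dist_via_b

print(direct, via_b, triangle_holds_squared)              # 4.0 2.0 False
print(dist_direct, dist_via_b, triangle_holds_distance)   # 2.0 2.0 True
```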
Gradient
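For the common half-scaled version

$$D_{1/2}(x, y) = \tfrac{1}{2}\|x - y\|_2^2 = \tfrac{1}{2}\sum_{i=1}^{n} (x_i - y_i)^2,$$

the gradient with respect to $x$ is

$$\nabla_x D_{1/2}(x, y) = x - y.$$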
This is why the factor $\tfrac{1}{2}$ is common in optimisation: it cancels the factor $2$ that arises from differentiating the square.
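The gradient formula can be sanity-checked numerically. The sketch below assumes the gradient is taken with respect to the first argument, called `x` here, and compares the analytic result `x - y` against central finite differences. All function names are hypothetical helpers for this illustration.

```python
def half_l2(x, y):
    """Half-scaled L2 divergence: 0.5 * ||x - y||^2."""
    return 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def grad_half_l2(x, y):
    """Analytic gradient with respect to x, which is x - y."""
    return [xi - yi for xi, yi in zip(x, y)]


def numerical_grad(f, x, y, eps=1e-6):
    """Central finite-difference approximation of the gradient in x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus, y) - f(x_minus, y)) / (2 * eps))
    return grad


x = [1.0, 2.0, 3.0]
y = [1.0, 0.0, 7.0]
analytic = grad_half_l2(x, y)             # [0.0, 2.0, -4.0]
numeric = numerical_grad(half_l2, x, y)
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic)
print(max_err < 1e-6)
```

Without the factor of one half, the same check would give a gradient of `2 * (x - y)`.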