Lukas' Notes

machine-learning optimisation

Definition

Quadratic Loss

Quadratic loss is the loss function that penalises a prediction by the square of its residual. For a numerical target $y$ and prediction $\hat{y}$, let

$$r = y - \hat{y}.$$

The quadratic loss is

$$\ell(r) = \tfrac{1}{2} r^2.$$

The factor $\tfrac{1}{2}$ does not change the minimiser; it cancels the $2$ when differentiating, so $\ell'(r) = r$. The loss is small near the correct prediction and grows quadratically as the residual moves away from zero, so large errors are penalised disproportionately.
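As a small illustration of the definition, here is a minimal Python sketch. The function names are hypothetical, and the residual is assumed to be the target minus the prediction; the opposite sign convention gives the same loss.

```python
def quadratic_loss(y, y_hat):
    """Half the squared residual, assuming r = y - y_hat."""
    r = y - y_hat
    return 0.5 * r * r


def quadratic_loss_grad(y, y_hat):
    """Derivative of the loss with respect to the prediction y_hat.

    d/dy_hat [0.5 * (y - y_hat)**2] = -(y - y_hat): the 1/2 cancels the 2.
    """
    return -(y - y_hat)


# Doubling the residual quadruples the loss.
small = quadratic_loss(3.0, 2.0)  # residual 1
large = quadratic_loss(3.0, 1.0)  # residual 2
print(small, large)
```

The gradient has no stray factor of 2, which is the only practical effect of the one-half convention.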

From residuals to a quadratic surface

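For a dataset $\{(x_i, y_i)\}_{i=1}^{n}$, the average quadratic loss is the mean squared error up to the factor $\tfrac{1}{2}$:

$$L = \frac{1}{n} \sum_{i=1}^{n} \tfrac{1}{2} (y_i - \hat{y}_i)^2 = \tfrac{1}{2}\,\mathrm{MSE}.$$

If the model is linear, $\hat{y}_i = w^\top x_i$, then $L$ is a quadratic function of the parameters:

$$L(w) = \tfrac{1}{2} w^\top A w - b^\top w + c, \qquad A = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top, \quad b = \frac{1}{n} \sum_{i=1}^{n} y_i x_i, \quad c = \frac{1}{2n} \sum_{i=1}^{n} y_i^2,$$

where $A$ is positive semidefinite, since $v^\top A v = \frac{1}{n} \sum_i (v^\top x_i)^2 \ge 0$ for every $v$. This is why squared-error regression has a bowl-shaped optimisation surface: the residuals are linear in $w$, and squaring them turns the objective into a quadratic surface.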
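The expansion into a quadratic form can be checked numerically. The sketch below is plain Python on a tiny made-up dataset. It assumes a linear model with prediction w·x and uses the hypothetical names `A`, `b`, `c` for the coefficients of L(w) = ½ wᵀAw − bᵀw + c; it is an illustration under those assumptions, not a fitting routine.

```python
def mean_quadratic_loss(w, xs, ys):
    """Average of 0.5 * (y_i - w.x_i)**2 over the dataset."""
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = sum(wj * xj for wj, xj in zip(w, x))
        total += 0.5 * (y - y_hat) ** 2
    return total / len(xs)


def quadratic_coefficients(xs, ys):
    """Return A, b, c with L(w) = 0.5 * w^T A w - b^T w + c."""
    n, d = len(xs), len(xs[0])
    A = [[sum(x[j] * x[k] for x in xs) / n for k in range(d)] for j in range(d)]
    b = [sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(d)]
    c = sum(y * y for y in ys) / (2 * n)
    return A, b, c


def quadratic_form(w, A, b, c):
    """Evaluate 0.5 * w^T A w - b^T w + c."""
    d = len(w)
    wAw = sum(w[j] * A[j][k] * w[k] for j in range(d) for k in range(d))
    bw = sum(bj * wj for bj, wj in zip(b, w))
    return 0.5 * wAw - bw + c


# Made-up data: a constant feature plus one input feature.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
ys = [1.0, 2.0, 2.0]
A, b, c = quadratic_coefficients(xs, ys)
w = [0.5, 0.25]
# Both expressions evaluate the same objective at w.
print(mean_quadratic_loss(w, xs, ys), quadratic_form(w, A, b, c))
```

The two printed values agree up to floating-point rounding for any choice of `w`, which is the sense in which the averaged loss is a quadratic surface in the parameters.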

Gradient and curvature

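In one dimension, write

$$L(w) = \tfrac{1}{2} a w^2 - b w + c, \qquad a > 0.$$

Then

$$L'(w) = a w - b, \qquad L''(w) = a.$$

The minimiser is the point where the derivative vanishes:

$$w^* = \frac{b}{a}.$$

Because the second derivative is constant, the local quadratic approximation is not merely an approximation. It is the whole objective.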
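The claim that the quadratic approximation is exact can be checked directly. The sketch below assumes the one-dimensional loss has the form L(w) = ½ a w² − b w + c with a > 0; the coefficient values are arbitrary illustrations.

```python
a, b, c = 2.0, 3.0, 1.0  # hypothetical coefficients, with a > 0


def loss(w):
    return 0.5 * a * w * w - b * w + c


def grad(w):
    return a * w - b


# The minimiser is where the derivative vanishes.
w_star = b / a


def taylor2(w, w0):
    """Second-order Taylor expansion of loss around w0, evaluated at w."""
    dw = w - w0
    return loss(w0) + grad(w0) * dw + 0.5 * a * dw * dw


# Expanding around a point far from w gives the true loss value.
print(loss(5.0), taylor2(5.0, -1.0))
```

For a non-quadratic loss the two printed values would differ, and the gap would grow with the distance between `w` and `w0`.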

Relation to the learning rate

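For gradient descent with learning rate $\eta$ on the one-dimensional quadratic,

$$w_{t+1} = w_t - \eta\, L'(w_t),$$

the error evolves as

$$w_{t+1} - w^* = (1 - \eta a)(w_t - w^*),$$

where $a = L''(w) > 0$ is the constant curvature and $w^*$ is the minimiser, since $L'(w) = a (w - w^*)$. Thus the optimal fixed learning rate in this one-dimensional case is

$$\eta^* = \frac{1}{a}.$$

With this choice, gradient descent reaches the minimiser in one step. If $0 < \eta < 1/a$, convergence is monotone; if $1/a < \eta < 2/a$, convergence oscillates; if $\eta > 2/a$, the iterates diverge.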
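The regimes can be seen by running a few steps. This is a minimal sketch, assuming the loss is written as L(w) = ½ a w² − b w + c so that the gradient is a w − b; the function name and the numeric values are illustrative only.

```python
def gradient_descent_path(a, b, eta, w0=0.0, steps=5):
    """Iterates of w <- w - eta * (a * w - b), starting from w0."""
    w = w0
    path = [w]
    for _ in range(steps):
        w = w - eta * (a * w - b)
        path.append(w)
    return path


a, b = 2.0, 3.0      # curvature a = 2, minimiser w* = b / a = 1.5
w_star = b / a

for eta in (0.25, 0.5, 0.75, 1.25):
    errors = [w - w_star for w in gradient_descent_path(a, b, eta)]
    print(eta, errors)
```

With a = 2, the four learning rates correspond to error multipliers 1 − ηa of 0.5, 0, −0.5 and −1.5: monotone convergence, a single-step solution, a shrinking oscillation, and divergence.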

In several dimensions, the same statement applies separately along the eigenvector directions of the Hessian matrix, with each eigenvalue playing the role of the one-dimensional curvature. A single scalar learning rate must compromise between directions of small and large curvature.
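The compromise can be made concrete by looking at the per-direction error multipliers. The sketch below assumes a Hessian with two hypothetical eigenvalues, 1 and 10, and applies the one-dimensional factor |1 − ηλ| to each direction.

```python
def contraction_factors(eigenvalues, eta):
    """Per-direction error multipliers |1 - eta * lam| for one gradient step."""
    return [abs(1.0 - eta * lam) for lam in eigenvalues]


eigenvalues = [1.0, 10.0]  # hypothetical Hessian eigenvalues

# Tuned to the steep direction: that direction is solved in one step,
# but the flat direction shrinks by only about 10% per step.
steep_tuned = contraction_factors(eigenvalues, 1.0 / 10.0)

# Tuned to the flat direction: the steep direction has factor 9 and diverges.
flat_tuned = contraction_factors(eigenvalues, 1.0 / 1.0)

# A compromise that equalises the factors of the two extreme directions.
eta_mid = 2.0 / (min(eigenvalues) + max(eigenvalues))
balanced = contraction_factors(eigenvalues, eta_mid)

print(steep_tuned, flat_tuned, balanced)
```

Even the balanced choice leaves both directions contracting by only 9/11 per step here, and that factor approaches 1 as the ratio of largest to smallest eigenvalue grows.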