Definition
Quadratic Loss
Quadratic loss is the loss function that penalises a prediction by the square of its residual. For a numerical target $y$ and prediction $\hat{y}$, let
\[ r = \hat{y} - y. \]
The quadratic loss is
\[ \ell(\hat{y}, y) = \tfrac{1}{2} r^2 = \tfrac{1}{2} (\hat{y} - y)^2. \]
The factor $\tfrac{1}{2}$ does not change the minimiser; it cancels the $2$ when differentiating. The loss is small near the correct prediction and grows quadratically as the residual moves away from zero, so large errors are penalised disproportionately.
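The definition can be sketched in a few lines of Python. This is a minimal illustration, assuming the residual is taken as prediction minus target and the loss carries the conventional factor of one half; the function names `quadratic_loss` and `quadratic_loss_grad` are illustrative and not part of any library.

```python
import numpy as np

def quadratic_loss(y_hat, y):
    """Half the squared residual, elementwise: 0.5 * (y_hat - y)**2."""
    r = np.asarray(y_hat, dtype=float) - np.asarray(y, dtype=float)
    return 0.5 * r**2

def quadratic_loss_grad(y_hat, y):
    """Derivative with respect to the prediction; the 1/2 cancels the 2."""
    return np.asarray(y_hat, dtype=float) - np.asarray(y, dtype=float)

# Residuals 1, 2, 4: each doubling of the residual quadruples the loss.
losses = quadratic_loss([1.0, 2.0, 4.0], [0.0, 0.0, 0.0])
print(losses)
```

Because of the factor of one half, the derivative with respect to the prediction is the residual itself, with no stray factor of two.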
From residuals to a quadratic surface
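For a dataset $\{(x_i, y_i)\}_{i=1}^{n}$, the average quadratic loss is the mean squared error up to the factor $\tfrac{1}{2}$:
\[ J = \frac{1}{n} \sum_{i=1}^{n} \tfrac{1}{2} (\hat{y}_i - y_i)^2 = \tfrac{1}{2}\,\mathrm{MSE}. \]
If the model is linear, $\hat{y}_i = w^\top x_i$, then $J$ is a quadratic function of the parameters $w$:
\[ J(w) = \frac{1}{2n} \lVert Xw - y \rVert^2 = \tfrac{1}{2} w^\top A w - b^\top w + c, \qquad A = \tfrac{1}{n} X^\top X, \quad b = \tfrac{1}{n} X^\top y, \quad c = \tfrac{1}{2n} y^\top y, \]
where $X$ is the matrix whose rows are $x_i^\top$, $y$ is the vector of targets, and $A$ is positive semidefinite. This is why squared-error regression has a bowl-shaped optimisation surface: the residuals are linear in $w$, and squaring them turns the objective into a quadratic surface.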
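The identity between the average quadratic loss and a quadratic form in the parameters can be checked numerically. The sketch below assumes a linear model with predictions `X @ w`, synthetic random data, and the names `A`, `b`, `c` for the quadratic, linear, and constant coefficients; all of these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))   # rows are the inputs x_i (synthetic)
y = rng.normal(size=n)        # targets y_i (synthetic)

# Coefficients of the quadratic form 0.5 * w^T A w - b^T w + c.
A = X.T @ X / n
b = X.T @ y / n
c = y @ y / (2 * n)

def avg_quadratic_loss(w):
    """Mean over the dataset of half the squared residual."""
    r = X @ w - y
    return 0.5 * np.mean(r**2)

def quadratic_form(w):
    """The same objective written as an explicit quadratic in w."""
    return 0.5 * w @ A @ w - b @ w + c

w = rng.normal(size=d)
gap = abs(avg_quadratic_loss(w) - quadratic_form(w))
min_eig = np.linalg.eigvalsh(A).min()
print(gap, min_eig)
```

The gap should be at the level of floating-point rounding, and the smallest eigenvalue of `A` should be non-negative up to rounding, which is the positive semidefiniteness that gives the surface its bowl shape.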
Gradient and curvature
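In one dimension, write
\[ L(w) = \tfrac{1}{2} a w^2 - b w + c, \qquad a > 0. \]
Then
\[ L'(w) = a w - b, \qquad L''(w) = a. \]
The minimiser is the point where the derivative vanishes:
\[ w^* = \frac{b}{a}. \]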
Because the second derivative is constant, the local quadratic approximation is not merely an approximation. It is the whole objective.
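The exactness of the quadratic approximation can be seen in a small numerical check. This sketch assumes the one-dimensional objective has the form 0.5·a·w² − b·w + c with a > 0; the coefficient values and the expansion point `w0` are arbitrary illustrative choices.

```python
a, b, c = 2.0, 3.0, 1.0   # illustrative coefficients, with a > 0

def loss(w):
    return 0.5 * a * w**2 - b * w + c

def grad(w):
    return a * w - b

w_star = b / a            # where the derivative vanishes

w0 = -4.0                 # arbitrary expansion point

def taylor2(w):
    """Second-order Taylor expansion of loss around w0."""
    return loss(w0) + grad(w0) * (w - w0) + 0.5 * a * (w - w0) ** 2

# The expansion reproduces the loss far from w0, not only nearby.
for w in (-10.0, 0.0, w_star, 5.0, 100.0):
    print(w, loss(w), taylor2(w))
```

For a general smooth loss the two columns would agree only near `w0`; here they agree everywhere because the curvature is the same constant at every point.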
Relation to the learning rate
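For gradient descent with learning rate $\eta$ on the one-dimensional quadratic $L(w) = \tfrac{1}{2} a w^2 - b w + c$ ($a > 0$, minimiser $w^* = b/a$),
\[ w_{t+1} = w_t - \eta L'(w_t) = w_t - \eta (a w_t - b), \]
the error evolves as
\[ w_{t+1} - w^* = (1 - \eta a)(w_t - w^*). \]
Thus the optimal fixed learning rate in this one-dimensional case is
\[ \eta = \frac{1}{a} = \frac{1}{L''(w)}. \]
With this choice, gradient descent reaches the minimiser in one step. If $0 < \eta < 1/a$, convergence is monotone; if $1/a < \eta < 2/a$, convergence oscillates; if $\eta > 2/a$, the iterates diverge.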
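The four regimes can be reproduced with a short simulation. This is a sketch under the assumption that the objective is 0.5·a·w² − b·w + c with curvature a > 0; the helper `run_gd`, the coefficient values, and the starting point are illustrative.

```python
def run_gd(eta, a=2.0, b=3.0, w0=0.0, steps=5):
    """Run gradient descent and return the errors w_t - w_star for t = 0..steps."""
    w_star = b / a
    w = w0
    errors = [w - w_star]
    for _ in range(steps):
        w = w - eta * (a * w - b)   # gradient of the quadratic is a*w - b
        errors.append(w - w_star)
    return errors

a = 2.0
regimes = [
    ("eta = 1/a (one step)", 1.0 / a),
    ("eta < 1/a (monotone)", 0.5 / a),
    ("1/a < eta < 2/a (oscillating)", 1.5 / a),
    ("eta > 2/a (diverging)", 2.5 / a),
]
for label, eta in regimes:
    print(label, run_gd(eta))
```

Each step multiplies the error by 1 − eta·a, so the sign of that factor decides whether the iterates approach from one side or alternate, and its magnitude decides whether they shrink or grow.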
In several dimensions, the same statement applies separately along each eigenvector direction of the Hessian matrix, with the corresponding eigenvalue playing the role of the curvature. A single scalar learning rate must compromise between directions of small and large curvature.
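The compromise can be made concrete by looking at the per-direction error multipliers for one gradient step. The sketch below assumes a diagonal Hessian with eigenvalues 1 and 10, chosen purely for illustration; the helper `contraction_factors` is hypothetical. The middle learning rate, 2 divided by the sum of the smallest and largest eigenvalues, is the standard choice that equalises the worst-case factor across directions.

```python
import numpy as np

H = np.diag([1.0, 10.0])        # illustrative Hessian; eigenvalues 1 and 10
lams = np.linalg.eigvalsh(H)    # eigenvalues in ascending order

def contraction_factors(eta):
    """Error multipliers 1 - eta * lambda_i along each eigenvector direction."""
    return 1.0 - eta * lams

candidates = (
    1.0 / lams.max(),                  # ideal for the steep direction
    2.0 / (lams.min() + lams.max()),   # balances both directions
    1.0 / lams.min(),                  # ideal for the flat direction
)
for eta in candidates:
    f = contraction_factors(eta)
    print(f"eta={eta:.3f} factors={f} worst={np.abs(f).max():.3f}")
```

A rate tuned to the steep direction makes slow progress along the flat one, while a rate tuned to the flat direction has a multiplier larger than one in magnitude along the steep direction and therefore diverges.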