machine-learning linear-algebra optimisation

Definition

Normal Equations

The normal equations provide a closed-form analytical solution for the parameters that minimise the sum of squared errors in a linear regression task. Formally, given a design matrix $X \in \mathbb{R}^{n \times d}$ and a target vector $\mathbf{y} \in \mathbb{R}^{n}$, the optimal weight vector $\mathbf{w}^{*}$ is:

$$\mathbf{w}^{*} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}$$

where $X$ is the matrix of input features (with samples as rows and features as columns) and $\mathbf{y}$ is the target vector. The name comes from the underlying linear system $X^{\top} X \mathbf{w} = X^{\top} \mathbf{y}$, obtained by setting the gradient of the squared-error loss $\lVert X\mathbf{w} - \mathbf{y} \rVert^{2}$ to zero.
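As a minimal illustration, the following NumPy sketch solves a small synthetic regression problem via the normal equations; the dimensions, seed, and weights are arbitrary choices for the example. It solves the linear system $X^{\top} X \mathbf{w} = X^{\top} \mathbf{y}$ directly rather than forming the explicit inverse, which is the numerically preferable route.

```python
import numpy as np

# Hypothetical synthetic problem: n = 100 samples, d = 3 features.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))                 # design matrix, samples as rows
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)   # targets with small noise

# Normal equations: solve X^T X w = X^T y instead of inverting X^T X.
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)  # should land close to w_true
```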

Operational Properties

Exact Minimisation: Unlike iterative methods such as gradient descent, the normal equations yield the global minimum of the quadratic loss in a single computational step (provided $X^{\top} X$ is invertible), with no hyperparameters such as a learning rate to tune.
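A short sketch of the contrast, under the same synthetic setup as above (the learning rate and iteration budget are illustrative, not tuned): the closed form is a single linear solve, while gradient descent approaches the same minimiser iteratively.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 3
X = rng.normal(size=(n, d))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n)

# Closed form: one linear solve, no hyperparameters.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the same quadratic loss: needs a learning rate
# and an iteration budget, and only converges towards the minimum.
w_gd = np.zeros(d)
lr = 0.01                                # illustrative learning rate
for _ in range(5000):
    grad = X.T @ (X @ w_gd - y) / n      # gradient of the mean squared error
    w_gd -= lr * grad

print(np.allclose(w_closed, w_gd, atol=1e-4))  # True once GD has converged
```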

Computational Complexity: The solution requires forming and inverting the $d \times d$ matrix $X^{\top} X$, where $d$ is the number of features; the inversion (or an equivalent linear solve) costs $\mathcal{O}(d^{3})$, and forming the matrix itself costs $\mathcal{O}(n d^{2})$ for $n$ samples. This makes the method highly efficient for low-dimensional feature spaces but computationally prohibitive as the number of features increases.
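As a rough empirical check of the cubic term (a sketch only; absolute timings depend on hardware and the underlying BLAS), the following times just the $d \times d$ solve, whose cost should grow roughly eightfold each time $d$ doubles:

```python
import time
import numpy as np

rng = np.random.default_rng(2)
n = 2000
for d in (250, 500, 1000):
    X = rng.normal(size=(n, d))
    y = rng.normal(size=n)
    A = X.T @ X                    # Gram matrix: O(n d^2) to form
    b = X.T @ y
    t0 = time.perf_counter()
    np.linalg.solve(A, b)          # the O(d^3) step
    print(f"d={d}: {time.perf_counter() - t0:.4f} s")
```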