statistics regression machine-learning
Definition
Linear Regression
Linear regression models the relationship between a dependent variable and one or more independent variables by assuming a linear relationship. Formally, for an observation $(x_i, y_i)$ with feature vector $x_i \in \mathbb{R}^d$, the model assumes:

$y_i = w^\top x_i + b + \varepsilon_i$

where $w$ is the coefficient vector (slope), $b$ is the intercept (bias), and $\varepsilon_i$ represents normally distributed error or noise. The objective is to identify the hyperplane that best fits the observed data by minimising the discrepancy between the predicted values $\hat{y}_i = w^\top x_i + b$ and the actual observations $y_i$.
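The generative assumption above can be sketched numerically. This is a minimal illustration with made-up values: the names `w_true`, `b_true`, and the noise scale are assumptions chosen for the example, not part of the definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data: n = 100 samples, d = 3 features.
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])      # assumed coefficient vector w
b_true = 0.3                             # assumed intercept b
eps = rng.normal(scale=0.1, size=100)    # Gaussian noise epsilon_i

# y_i = w^T x_i + b + epsilon_i, vectorised over all samples
y = X @ w_true + b_true + eps
print(y.shape)
```

Each entry of `y` is one observation generated exactly as the model equation prescribes.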
Estimation: Minimising Squared Error
The parameters $w$ and $b$ are typically estimated by minimising the Mean Squared Error (MSE), which represents the average squared residual over $n$ samples:

$\mathrm{MSE}(w, b) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^\top x_i - b)^2$
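The MSE is straightforward to compute directly from the definition; a minimal sketch (the function name `mse` is just a label for this example):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average squared residual over n samples."""
    residuals = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(residuals ** 2))

# Residuals are (0, 0, -2), so the MSE is (0 + 0 + 4) / 3
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # → 1.333...
```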
Normal Equations (Exact Solution): When the feature dimension $d$ is manageable, the optimal parameters can be calculated analytically. By prepending a constant $1$ to each $x_i$ to absorb $b$ into $w$, the solution is $w^* = (X^\top X)^{-1} X^\top y$, where $X$ is the $n \times (d+1)$ design matrix. This provides the global minimum in a single step without requiring hyperparameters.
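The closed-form solution can be sketched as follows. Note that the code solves the linear system $X^\top X \theta = X^\top y$ rather than forming the matrix inverse explicitly, which is numerically more stable; the data and function name are illustrative assumptions.

```python
import numpy as np

def fit_normal_equations(X, y):
    """Least-squares fit via the normal equations.

    A column of ones is prepended so the first entry of the returned
    vector is the intercept b; the remaining entries are w.
    """
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend constant 1
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Noiseless example: b = 0.5, w = (2, -1) should be recovered exactly
# (up to floating-point error).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = 0.5 + X @ np.array([2.0, -1.0])
theta = fit_normal_equations(X, y)
print(theta)  # → approximately [0.5, 2.0, -1.0]
```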
Gradient Descent (Iterative Optimisation): For high-dimensional datasets or online learning settings, parameters are updated iteratively in the direction of the negative gradient: $w \leftarrow w - \eta \nabla_w \mathrm{MSE}$, where for the design-matrix form the gradient is $\nabla_w \mathrm{MSE} = -\frac{2}{n} X^\top (y - Xw)$. This approach scales better to large data but requires the selection of a learning rate $\eta$.
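The update rule can be sketched as batch gradient descent; the learning rate `lr` and step count `n_steps` below are illustrative choices, not recommendations, and the intercept is again absorbed via a column of ones.

```python
import numpy as np

def fit_gradient_descent(X, y, lr=0.1, n_steps=2000):
    """Minimise the MSE by batch gradient descent (illustrative sketch).

    Returns theta = [b, w_1, ..., w_d], with the intercept absorbed
    via a prepended column of ones.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    theta = np.zeros(Xb.shape[1])
    n = len(y)
    for _ in range(n_steps):
        grad = -2.0 / n * Xb.T @ (y - Xb @ theta)  # gradient of the MSE
        theta -= lr * grad                          # step against the gradient
    return theta

# Same noiseless example as above: the iterates converge toward the
# exact least-squares solution [0.5, 2.0, -1.0].
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = 0.5 + X @ np.array([2.0, -1.0])
theta = fit_gradient_descent(X, y)
print(theta)
```

Unlike the normal equations, each step costs only matrix-vector products, which is why this route is preferred when $d$ is large or data arrives in a stream.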