
Definition

Backpropagation

Backpropagation is the standard algorithm used to calculate the gradient of the loss function with respect to the weights of an artificial neural network. It is a specific application of the chain rule from calculus, enabling the efficient training of multi-layer architectures.

The process is divided into two distinct phases:

  1. Forward Pass: The input is propagated through the network to compute the predicted output and the resulting loss.
  2. Backward Pass: The error signal is propagated from the output layer back toward the input layer, calculating partial derivatives for each weight.
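
The two phases can be sketched in a few lines of NumPy. The network below, a single hidden layer with sigmoid activations, a squared-error loss, and no bias terms, is an illustrative assumption rather than part of the definition above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative assumptions: one hidden layer, sigmoid activations,
# squared-error loss, a single training example, no biases.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))          # input (3 features)
t = np.array([[1.0]])                # target output

W1 = 0.1 * rng.normal(size=(4, 3))   # hidden-layer weights
W2 = 0.1 * rng.normal(size=(1, 4))   # output-layer weights

# Forward pass: propagate the input to a prediction and a loss.
z1 = W1 @ x                          # hidden pre-activation
a1 = sigmoid(z1)                     # hidden activation
z2 = W2 @ a1                         # output pre-activation
a2 = sigmoid(z2)                     # predicted output
loss = 0.5 * np.sum((a2 - t) ** 2)

# Backward pass: propagate the error signal from the output layer back.
delta2 = (a2 - t) * a2 * (1 - a2)        # dL/dz2
dW2 = delta2 @ a1.T                      # dL/dW2
delta1 = (W2.T @ delta2) * a1 * (1 - a1) # dL/dz1
dW1 = delta1 @ x.T                       # dL/dW1
```
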

Algorithmic Procedure

Chain Rule Application: For a weight $w_{ij}^{(l)}$ connecting neuron $i$ in layer $l-1$ to neuron $j$ in layer $l$, the gradient is computed as:

$$\frac{\partial L}{\partial w_{ij}^{(l)}} = \frac{\partial L}{\partial a_j^{(l)}} \cdot \frac{\partial a_j^{(l)}}{\partial z_j^{(l)}} \cdot \frac{\partial z_j^{(l)}}{\partial w_{ij}^{(l)}}$$

where $z_j^{(l)} = \sum_i w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}$ is the pre-activation and $a_j^{(l)} = \sigma(z_j^{(l)})$ is the activation.
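
As a concrete instance of the three factors, the following sketch evaluates them for the weights of a single sigmoid neuron under a squared-error loss; the choice of activation and loss here is an assumption made purely for illustration.

```python
import numpy as np

# Single sigmoid neuron: z = w . a_prev, a = sigmoid(z), L = 0.5 * (a - t)^2
a_prev = np.array([0.5, -1.0, 2.0])   # activations from the previous layer
w = np.array([0.1, 0.4, -0.2])        # incoming weights of this neuron
t = 1.0                                # target

z = w @ a_prev                         # pre-activation
a = 1.0 / (1.0 + np.exp(-z))           # activation

# The three local factors of the chain rule:
dL_da = a - t                          # dL/da   for the squared-error loss
da_dz = a * (1 - a)                    # da/dz   (sigmoid derivative)
dz_dw = a_prev                         # dz/dw_i = a_prev_i

grad_w = dL_da * da_dz * dz_dw         # dL/dw, one entry per incoming weight
```
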

Weight Update: Once the gradients are computed, the parameters are updated using an optimisation algorithm such as gradient descent:

$$w_{ij}^{(l)} \leftarrow w_{ij}^{(l)} - \eta \, \frac{\partial L}{\partial w_{ij}^{(l)}}$$

where $\eta$ is the learning rate.

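A minimal sketch of this update, assuming plain gradient descent with a fixed learning rate $\eta$ (`learning_rate` below):

```python
def gradient_descent_step(weights, gradients, learning_rate=0.01):
    """Apply one gradient-descent update: w <- w - eta * dL/dw."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

# Hypothetical usage with gradients computed during a backward pass:
# W1, W2 = gradient_descent_step([W1, W2], [dW1, dW2], learning_rate=0.1)
```
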
Computational Graph Example

Consider the function $f(x, y, z) = (x + y)\,z$. This can be decomposed into intermediate steps to demonstrate the local gradient calculations.

Forward Pass: Let $q = x + y$, then $f = q \cdot z$. Given $x = -2$, $y = 5$, $z = -4$:

$$q = x + y = 3, \qquad f = q \cdot z = -12$$

Backward Pass: Calculating the derivatives via the chain rule:

$$\frac{\partial f}{\partial z} = q = 3, \qquad \frac{\partial f}{\partial q} = z = -4, \qquad \frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial x} = -4, \qquad \frac{\partial f}{\partial y} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial y} = -4$$

This demonstrates how the gradient with respect to the inputs is efficiently calculated by propagating the local gradients backward through the graph.
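
The same bookkeeping can be expressed directly in code. The sketch below mirrors the decomposition above, computing each local gradient and multiplying them along the path back to every input.

```python
# Forward pass through the graph f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y               # intermediate node:  q = 3
f = q * z               # output node:        f = -12

# Backward pass: multiply local gradients along each path to the inputs.
df_dq = z               # local gradient of f = q * z w.r.t. q  -> -4
df_dz = q               # local gradient of f = q * z w.r.t. z  ->  3
dq_dx = 1.0             # local gradient of q = x + y w.r.t. x
dq_dy = 1.0             # local gradient of q = x + y w.r.t. y

df_dx = df_dq * dq_dx   # chain rule: -4
df_dy = df_dq * dq_dy   # chain rule: -4
```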