
Definition

Exploding Gradient

The exploding gradient problem is a phenomenon in deep artificial neural networks where the gradients of the loss function with respect to the weights grow exponentially during backpropagation. This leads to very large weight updates, causing the model's weights to oscillate or overflow numerically (producing inf or NaN values) and making the learning process unstable.

Mathematical Mechanism

Exponential Growth: Similar to the vanishing gradient problem, this issue arises from the repeated multiplication of derivatives across layers. If the weights are large or the activation functions have derivatives greater than 1, the resulting product grows exponentially with the depth of the network.
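
The following is a minimal sketch of this mechanism, assuming a deep linear chain in which every layer's Jacobian is the same weight matrix W (the matrix size, depth, and scaling factor of 1.1 are illustrative). Because the backpropagated gradient is a product of these Jacobians, its norm grows roughly like the spectral norm of W raised to the depth:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, depth = 64, 50

# Scale W so its spectral norm (largest singular value) is slightly above 1.
W = rng.standard_normal((dim, dim))
W *= 1.1 / np.linalg.norm(W, ord=2)

grad = rng.standard_normal(dim)           # gradient at the output layer
for layer in range(depth):
    grad = W.T @ grad                     # one chain-rule multiplication per layer
    if layer % 10 == 9:
        print(f"layer {layer + 1:2d}: ||grad|| = {np.linalg.norm(grad):.3e}")
```

With a spectral norm of 1.1, the gradient norm grows by roughly a factor of 1.1^50 (about 100x) over 50 layers; larger weights or deeper networks amplify this further.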

Recurrent Architectures: Exploding gradients are particularly prevalent in Recurrent Neural Networks because the same weight matrix is applied repeatedly across time steps. Any eigenvalue of the recurrent weight matrix with magnitude greater than 1 can lead to exponential amplification of the gradient over long sequences.
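
A small sketch of the recurrent case, assuming a simplified linear RNN so the effect is easy to read off: the gradient through T time steps contains a factor of the recurrent matrix raised to the power T, so an eigenvalue above 1 grows geometrically with sequence length (while one below 1 shrinks, the vanishing counterpart):

```python
import numpy as np

# Diagonal recurrent matrix with eigenvalues 1.05 and 0.9 (illustrative values).
W_hh = np.array([[1.05, 0.0],
                 [0.0,  0.9]])

grad = np.array([1.0, 1.0])               # gradient arriving at the final step
for T in (10, 50, 100):
    # Backpropagating through T steps multiplies by (W_hh^T) transposed T times.
    contribution = np.linalg.matrix_power(W_hh.T, T) @ grad
    print(f"T={T:3d}: gradient contribution = {contribution}")
```

The component tied to the eigenvalue 1.05 explodes with T, while the 0.9 component decays, showing why long sequences make both problems worse.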

Mitigation Strategies

Gradient Clipping: This technique involves rescaling the gradients if their norm exceeds a predefined threshold. This prevents excessively large updates while maintaining the direction of the gradient descent step.
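
Below is a framework-agnostic sketch of clipping by global norm (the function name, gradient values, and max_norm=1.0 are illustrative): if the combined gradient norm exceeds the threshold, every gradient is rescaled by max_norm / norm, capping the step size without changing its direction. Deep learning frameworks provide this built in, for example torch.nn.utils.clip_grad_norm_ in PyTorch.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Compute the L2 norm over all gradient tensors taken together.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]   # rescale, preserving direction
    return grads, total_norm

grads = [np.full((3, 3), 10.0), np.full((3,), 10.0)]   # artificially large gradients
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(f"norm before clipping: {norm_before:.2f}, after: {norm_after:.2f}")
```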

Weight Regularisation: Applying L1 or L2 regularisation to the weights encourages smaller weight values, which in turn limits the magnitude of the derivatives in the chain rule product.
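
A short sketch of how an L2 penalty enters the update, assuming a single weight matrix and an illustrative regularisation strength: the penalty lambda * ||W||^2 adds 2 * lambda * W to the gradient, which steadily pulls the weights toward smaller magnitudes and so limits the factors in the chain-rule product.

```python
import numpy as np

def l2_regularised_gradient(W, data_grad, lam=1e-4):
    # Gradient of: data_loss + lam * ||W||^2
    return data_grad + 2.0 * lam * W

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
data_grad = rng.standard_normal((4, 4))    # gradient of the data loss alone
W -= 0.1 * l2_regularised_gradient(W, data_grad)   # one gradient-descent step
```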

Architectural Adjustments: Using Residual Connections or gated units (such as LSTMs or GRUs) can help manage the flow of gradients and prevent uncontrolled growth.
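
As a rough illustration of the residual idea, assuming a block of the form y = x + f(x) with a tanh transform (the shapes and weight scale are arbitrary): the block's Jacobian is I + J_f, so the identity term gives the gradient a direct path through the block rather than passing only through the layer's own Jacobian.

```python
import numpy as np

def residual_block(x, W):
    # Skip connection plus a small nonlinear transform: y = x + tanh(W x)
    return x + np.tanh(W @ x)

def residual_jacobian(x, W):
    pre = W @ x
    J_f = np.diag(1.0 - np.tanh(pre) ** 2) @ W   # Jacobian of tanh(W x)
    return np.eye(len(x)) + J_f                  # identity term from the skip path

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W = 0.1 * rng.standard_normal((8, 8))
print("spectral norm of block Jacobian:",
      np.linalg.norm(residual_jacobian(x, W), ord=2))
```

The Jacobian's spectral norm stays close to 1 because of the identity term, which is one reason residual connections and gated units keep gradients from growing (or shrinking) uncontrollably across many layers or time steps.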