
Definition

Gradient Descent

Gradient descent is a first-order iterative optimisation algorithm used to identify a local minimum of a differentiable function. In the context of model training, it minimises a loss function by updating parameters in the direction of the steepest descent.

The update rule for iteration $t$ is:

$$\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)$$

where $\theta_t$ is the parameter vector at iteration $t$, $\eta > 0$ is the learning rate, and $\nabla L(\theta_t)$ is the gradient of the loss function $L$ evaluated at $\theta_t$.
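A minimal sketch of this update in NumPy; the helper name `gradient_descent` and the simple convex loss $L(\theta) = \lVert \theta - 3 \rVert^2$ are illustrative assumptions, not part of the text above:

```python
import numpy as np

def gradient_descent(grad, theta0, eta=0.1, n_iters=100):
    """Repeatedly apply the update theta <- theta - eta * grad(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        theta = theta - eta * grad(theta)  # step in the direction of steepest descent
    return theta

# Illustrative convex loss L(theta) = ||theta - 3||^2, whose gradient is 2 * (theta - 3).
theta_star = gradient_descent(lambda th: 2 * (th - 3.0), theta0=[0.0], eta=0.1, n_iters=200)
print(theta_star)  # approaches [3.], the unique minimiser
```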

Convergence Properties

Learning Rate Impact: The learning rate $\eta$ (step size) is critical; values that are too large can cause the iterates to diverge, while values that are too small lead to slow convergence, as the sketch below illustrates.
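The effect is easy to reproduce on a one-dimensional quadratic; the function name `run_gd` and the specific step sizes are arbitrary choices for this sketch:

```python
def run_gd(eta, theta0=1.0, n_iters=50):
    """Gradient descent on L(theta) = theta**2, whose gradient is 2 * theta."""
    theta = theta0
    for _ in range(n_iters):
        theta -= eta * 2 * theta
    return theta

print(run_gd(eta=1.5))    # |1 - 2*eta| > 1: iterates grow without bound (divergence)
print(run_gd(eta=0.001))  # very small step: still far from the minimiser 0 after 50 steps
print(run_gd(eta=0.3))    # well-chosen step: essentially at the minimiser 0
```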

Local Optima: For non-convex objective functions, the algorithm may converge to a local minimum or a saddle point rather than the global minimum.
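A small illustration of initialisation-dependent convergence on a non-convex objective; the tilted double-well loss $(\theta^2 - 1)^2 + 0.25\,\theta$ and the starting points are assumptions made only for this sketch:

```python
def descend(theta0, eta=0.01, n_iters=500):
    """Gradient descent on the non-convex loss f(theta) = (theta**2 - 1)**2 + 0.25 * theta."""
    theta = theta0
    for _ in range(n_iters):
        grad = 4 * theta * (theta**2 - 1) + 0.25  # f'(theta)
        theta -= eta * grad
    return theta

print(descend(theta0=0.8))   # ~ 0.97: a local minimum, not the global one
print(descend(theta0=-0.8))  # ~ -1.03: the global minimum of this tilted double well
```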

Guarantees: For convex, $L$-smooth objective functions (i.e. the gradient is Lipschitz continuous with constant $L$), convergence to the global minimum is guaranteed provided the learning rate is sufficiently small, e.g. $\eta \le 1/L$.
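As a concrete sketch, for the least-squares loss $L(\theta) = \lVert A\theta - b \rVert^2$ the smoothness constant is $L = 2\,\lambda_{\max}(A^\top A)$, so $\eta = 1/L$ is a safe step size; the random problem below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

# L(theta) = ||A @ theta - b||^2 is convex and L-smooth with L = 2 * lambda_max(A.T @ A).
L = 2 * np.linalg.eigvalsh(A.T @ A).max()
eta = 1.0 / L  # small enough to guarantee convergence to the global minimum

theta = np.zeros(5)
for _ in range(2000):
    grad = 2 * A.T @ (A @ theta - b)  # gradient of the least-squares loss
    theta -= eta * grad

theta_closed_form = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(theta, theta_closed_form, atol=1e-6))  # True: GD reaches the global minimum
```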