
Definition

Gradient Descent

Gradient descent is a first-order iterative optimisation algorithm used to identify a local minimum of a differentiable function. In the context of model training, it minimises a loss function by updating parameters in the direction of the steepest descent.

The update rule for iteration $t$ is:

$$\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)$$

where $\theta_t$ is the parameter vector at iteration $t$, $\eta > 0$ is the learning rate, and $\nabla L(\theta_t)$ is the gradient of the loss function $L$ evaluated at $\theta_t$.
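A minimal sketch of this update in NumPy; the helper name `gradient_descent` and the simple convex loss $L(\theta) = \lVert \theta - 3 \rVert^2$ are illustrative assumptions, not part of the text above:

```python
import numpy as np

def gradient_descent(grad, theta0, eta=0.1, n_iters=100):
    """Repeatedly apply the update theta <- theta - eta * grad(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        theta = theta - eta * grad(theta)  # step in the direction of steepest descent
    return theta

# Illustrative convex loss L(theta) = ||theta - 3||^2, whose gradient is 2 * (theta - 3).
theta_star = gradient_descent(lambda th: 2 * (th - 3.0), theta0=[0.0], eta=0.1, n_iters=200)
print(theta_star)  # approaches [3.], the unique minimiser
```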

Convergence Properties

Learning Rate Impact: The learning rate $\eta$ (step size) is critical; values that are too large can cause the iterates to diverge, while values that are too small lead to slow convergence, as the sketch below illustrates.
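The effect is easy to reproduce on a one-dimensional quadratic; the function name `run_gd` and the specific step sizes are arbitrary choices for this sketch:

```python
def run_gd(eta, theta0=1.0, n_iters=50):
    """Gradient descent on L(theta) = theta**2, whose gradient is 2 * theta."""
    theta = theta0
    for _ in range(n_iters):
        theta -= eta * 2 * theta
    return theta

print(run_gd(eta=1.5))    # |1 - 2*eta| > 1: iterates grow without bound (divergence)
print(run_gd(eta=0.001))  # very small step: still far from the minimiser 0 after 50 steps
print(run_gd(eta=0.3))    # well-chosen step: essentially at the minimiser 0
```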

Local Optima: For non-convex objective functions, the algorithm may converge to a local minimum or a saddle point rather than the global minimum.
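A small illustration of initialisation-dependent convergence on a non-convex objective; the tilted double-well loss $(\theta^2 - 1)^2 + 0.25\,\theta$ and the starting points are assumptions made only for this sketch:

```python
def descend(theta0, eta=0.01, n_iters=500):
    """Gradient descent on the non-convex loss f(theta) = (theta**2 - 1)**2 + 0.25 * theta."""
    theta = theta0
    for _ in range(n_iters):
        grad = 4 * theta * (theta**2 - 1) + 0.25  # f'(theta)
        theta -= eta * grad
    return theta

print(descend(theta0=0.8))   # ~ 0.97: a local minimum, not the global one
print(descend(theta0=-0.8))  # ~ -1.03: the global minimum of this tilted double well
```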

Guarantees: For convex, $L$-smooth objective functions (i.e. the gradient is Lipschitz continuous with constant $L$), convergence to the global minimum is guaranteed provided the learning rate is sufficiently small, e.g. $\eta \le 1/L$.
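As a concrete sketch, for the least-squares loss $L(\theta) = \lVert A\theta - b \rVert^2$ the smoothness constant is $L = 2\,\lambda_{\max}(A^\top A)$, so $\eta = 1/L$ is a safe step size; the random problem below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

# L(theta) = ||A @ theta - b||^2 is convex and L-smooth with L = 2 * lambda_max(A.T @ A).
L = 2 * np.linalg.eigvalsh(A.T @ A).max()
eta = 1.0 / L  # small enough to guarantee convergence to the global minimum

theta = np.zeros(5)
for _ in range(2000):
    grad = 2 * A.T @ (A @ theta - b)  # gradient of the least-squares loss
    theta -= eta * grad

theta_closed_form = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(theta, theta_closed_form, atol=1e-6))  # True: GD reaches the global minimum
```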