machine-learning calculus linear-algebra
Definition
Vector-Jacobian Product
Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a differentiable function, let $x \in \mathbb{R}^n$, and let $J_f(x) \in \mathbb{R}^{m \times n}$ be the Jacobian matrix of $f$ at $x$.
For a vector $v \in \mathbb{R}^m$, the vector-Jacobian product (VJP) is
$$J_f(x)^\top v \;\in\; \mathbb{R}^n.$$
In row-vector notation, the same object is written as $v^\top J_f(x)$.
The vector $v$ represents sensitivities at the output of $f$. The VJP combines them into the corresponding sensitivities at the input of $f$.
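As a quick sanity check, here is a minimal JAX sketch of this definition. The function `f` and the vectors below are illustrative choices, not taken from the text; `jax.vjp` returns a closure that computes $J_f(x)^\top v$ without materializing the Jacobian, and the result matches the explicit Jacobian-transpose product.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Illustrative map from R^3 to R^2 (assumed example, not from the text).
    return jnp.array([x[0] * x[1], x[1] + x[2]])

x = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([1.0, -1.0])           # output-side sensitivities, v in R^2

y, vjp_fun = jax.vjp(f, x)           # y = f(x); vjp_fun(v) computes J_f(x)^T v
(vjp_val,) = vjp_fun(v)

# Compare against the explicit Jacobian-transpose product.
J = jax.jacobian(f)(x)               # shape (2, 3)
assert jnp.allclose(vjp_val, J.T @ v)
```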
Dimensions
If $f : \mathbb{R}^n \to \mathbb{R}^m$, then:
- $J_f(x) \in \mathbb{R}^{m \times n}$,
- $v \in \mathbb{R}^m$,
- $J_f(x)^\top v \in \mathbb{R}^n$.
So a VJP takes a vector from the output space $\mathbb{R}^m$ and maps it back to the input space $\mathbb{R}^n$.
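A short shape check makes the same point concrete; the function below is an assumed example mapping $\mathbb{R}^4 \to \mathbb{R}^2$.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Illustrative map from R^4 to R^2.
    return jnp.array([jnp.dot(x, x), jnp.sum(x)])

x = jnp.ones(4)                      # point in the input space R^4
v = jnp.array([1.0, 2.0])            # vector in the output space R^2

_, vjp_fun = jax.vjp(f, x)
(pullback,) = vjp_fun(v)

assert pullback.shape == (4,)        # the VJP lands back in R^4
```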
Chain Rule
Let
where is a scalar function. Then the chain rule gives:
This is exactly a vector-Jacobian product with seed vector .
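The following sketch evaluates this chain-rule identity in JAX; `f` and `g` are illustrative choices. Seeding the VJP with $\nabla g(f(x))$ reproduces the gradient of the composition.

```python
import jax
import jax.numpy as jnp

def f(x):                            # f : R^3 -> R^2 (illustrative)
    return jnp.array([x[0] * x[1], x[1] + x[2]])

def g(y):                            # g : R^2 -> R (illustrative scalar function)
    return jnp.sum(y ** 2)

x = jnp.array([1.0, 2.0, 3.0])

y, vjp_fun = jax.vjp(f, x)
seed = jax.grad(g)(y)                # v = grad g(f(x))
(grad_h,) = vjp_fun(seed)            # J_f(x)^T grad g(f(x))

# Must agree with differentiating the composition directly.
assert jnp.allclose(grad_h, jax.grad(lambda z: g(f(z)))(x))
```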
Backward pass
In backpropagation, one usually does not form the full Jacobian explicitly. Instead, one repeatedly applies VJPs, moving a gradient from the output of each layer back to its input.
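A minimal sketch of this layer-by-layer pattern, assuming two illustrative "layers": the forward pass keeps one VJP closure per layer, and the backward pass applies them in reverse order, never forming a Jacobian.

```python
import jax
import jax.numpy as jnp

def layer1(x):                       # illustrative first layer
    return jnp.tanh(x)

def layer2(h):                       # illustrative second layer, scalar output
    return jnp.sum(h ** 2)

x = jnp.array([0.5, -1.0, 2.0])

# Forward pass: store one VJP closure per layer.
h, vjp1 = jax.vjp(layer1, x)
loss, vjp2 = jax.vjp(layer2, h)

# Backward pass: seed with 1 at the scalar output and walk back layer by layer.
(g_h,) = vjp2(jnp.ones_like(loss))   # sensitivities at the output of layer1
(g_x,) = vjp1(g_h)                   # sensitivities at the input

assert jnp.allclose(g_x, jax.grad(lambda z: layer2(layer1(z)))(x))
```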
Special case
If $f : \mathbb{R}^n \to \mathbb{R}$ is scalar-valued, then $J_f(x)$ is a $1 \times n$ matrix. With seed $v = 1$, the VJP becomes:
$$J_f(x)^\top \cdot 1 = \nabla f(x).$$
So for scalar outputs, a VJP recovers the usual gradient.
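In JAX this special case looks as follows; the particular scalar-valued `f` is an assumed example.

```python
import jax
import jax.numpy as jnp

def f(x):                            # illustrative scalar-valued function, R^n -> R
    return jnp.sum(jnp.sin(x))

x = jnp.array([0.1, 0.2, 0.3])

y, vjp_fun = jax.vjp(f, x)
(vjp_val,) = vjp_fun(jnp.ones_like(y))   # seed v = 1

# The VJP with seed 1 is exactly the gradient.
assert jnp.allclose(vjp_val, jax.grad(f)(x))
```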
Notation
Convention
The name “vector-Jacobian product” comes from the row-vector notation $v^\top J_f(x)$. Many machine-learning texts and libraries use column gradients, so the same computation appears as $J_f(x)^\top v$. The underlying operation is the same.
Comparison
VJP versus JVP
A Jacobian-vector product (JVP) uses
$$J_f(x)\, u, \qquad u \in \mathbb{R}^n,$$
so it pushes an input direction forward through $f$.
A vector-Jacobian product uses
$$J_f(x)^\top v, \qquad v \in \mathbb{R}^m,$$
so it pulls output sensitivities backward to the input side.
Forward-mode automatic differentiation is based on JVPs, while reverse-mode automatic differentiation is based on VJPs.
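The contrast is easy to see with JAX's `jax.jvp` and `jax.vjp`; the function and the vectors `u`, `v` below are illustrative.

```python
import jax
import jax.numpy as jnp

def f(x):                            # illustrative map R^3 -> R^2
    return jnp.array([x[0] * x[1], jnp.exp(x[2])])

x = jnp.array([1.0, 2.0, 0.5])
u = jnp.array([1.0, 0.0, 0.0])       # input direction for the JVP
v = jnp.array([0.0, 1.0])            # output sensitivities for the VJP

# Forward mode: push u through f, obtaining J_f(x) u.
_, jvp_val = jax.jvp(f, (x,), (u,))

# Reverse mode: pull v back through f, obtaining J_f(x)^T v.
_, vjp_fun = jax.vjp(f, x)
(vjp_val,) = vjp_fun(v)

J = jax.jacobian(f)(x)
assert jnp.allclose(jvp_val, J @ u)
assert jnp.allclose(vjp_val, J.T @ v)
```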
Examples
Two-output function
Let $f : \mathbb{R}^n \to \mathbb{R}^2$ have components $f(x) = (f_1(x), f_2(x))$.
Then the Jacobian is
$$J_f(x) = \begin{pmatrix} \nabla f_1(x)^\top \\ \nabla f_2(x)^\top \end{pmatrix} \in \mathbb{R}^{2 \times n}.$$
For $v = (v_1, v_2)^\top$, the VJP is
$$J_f(x)^\top v = v_1 \nabla f_1(x) + v_2 \nabla f_2(x).$$
Each input coordinate receives contributions from every output coordinate, weighted by the entries $v_1$ and $v_2$.
If we now define a scalar loss
$$L(x) = g(f(x)), \qquad g : \mathbb{R}^2 \to \mathbb{R},$$
then
$$v = \nabla g(f(x)) = \left( \frac{\partial g}{\partial f_1},\; \frac{\partial g}{\partial f_2} \right)^{\!\top},$$
and therefore
$$\nabla L(x) = \frac{\partial g}{\partial f_1}\, \nabla f_1(x) + \frac{\partial g}{\partial f_2}\, \nabla f_2(x).$$
This is the reverse-mode step carried out by a VJP.
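A numerical check of this example in JAX, with assumed choices of $f_1$, $f_2$ and $g$ (none of these specific functions come from the text): the VJP with seed $\nabla g(f(x))$ equals both the weighted sum of per-output gradients and the gradient of the composed loss.

```python
import jax
import jax.numpy as jnp

def f(x):                            # two-output function, components f_1 and f_2
    return jnp.array([x[0] * x[1], x[1] + x[2]])

def g(y):                            # scalar loss on the two outputs
    return y[0] ** 2 + 3.0 * y[1]

x = jnp.array([1.0, 2.0, 3.0])

y, vjp_fun = jax.vjp(f, x)
v = jax.grad(g)(y)                   # seed: (dg/df_1, dg/df_2) at f(x)
(grad_L,) = vjp_fun(v)               # v_1 * grad f_1(x) + v_2 * grad f_2(x)

# Same result as summing the weighted per-output gradients explicitly.
J = jax.jacobian(f)(x)               # rows are grad f_1(x)^T and grad f_2(x)^T
assert jnp.allclose(grad_L, v[0] * J[0] + v[1] * J[1])
assert jnp.allclose(grad_L, jax.grad(lambda z: g(f(z)))(x))
```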