machine-learning calculus linear-algebra

Definition

Vector-Jacobian Product

Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a differentiable function, let $x \in \mathbb{R}^n$, and let $J_f(x) \in \mathbb{R}^{m \times n}$ be the Jacobian matrix of $f$ at $x$.

For a vector $v \in \mathbb{R}^m$, the vector-Jacobian product is

$$J_f(x)^\top v \in \mathbb{R}^n.$$

In row-vector notation, the same object is written as $v^\top J_f(x)$.

The vector $v$ represents sensitivities at the output of $f$. The VJP combines them with the Jacobian into the corresponding sensitivities $J_f(x)^\top v$ at the input of $f$.
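
As a minimal sketch of the definition, the snippet below uses `jax.vjp` with an arbitrary illustrative $f : \mathbb{R}^3 \to \mathbb{R}^2$ (my own choice, not part of the note) and checks that the matrix-free pullback agrees with the explicit product $J_f(x)^\top v$:

```python
import jax
import jax.numpy as jnp

# Illustrative f : R^3 -> R^2 (chosen only for demonstration).
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([0.5, -1.0])           # sensitivities at the output of f

# Explicit route: build J_f(x) and form J^T v.
J = jax.jacobian(f)(x)               # shape (2, 3)
explicit = J.T @ v                   # shape (3,)

# Matrix-free route: jax.vjp returns f(x) and a function v -> J^T v.
y, vjp_fn = jax.vjp(f, x)
(matrix_free,) = vjp_fn(v)

print(jnp.allclose(explicit, matrix_free))  # True
```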

Dimensions

If $f : \mathbb{R}^n \to \mathbb{R}^m$, then:

$$J_f(x) \in \mathbb{R}^{m \times n}, \qquad v \in \mathbb{R}^m, \qquad J_f(x)^\top v \in \mathbb{R}^n.$$

So a VJP takes a vector from the output space and maps it back to the input space.
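
A quick shape check of this bookkeeping, with arbitrary sizes $n = 3$ and $m = 2$ (sizes and the function are my own choices):

```python
import jax
import jax.numpy as jnp

# Illustrative f with n = 3 inputs and m = 2 outputs.
def f(x):
    return jnp.array([jnp.sum(x ** 2), x[0] - x[2]])

x = jnp.ones(3)                      # input space R^3
v = jnp.ones(2)                      # output space R^2

J = jax.jacobian(f)(x)
_, vjp_fn = jax.vjp(f, x)
(out,) = vjp_fn(v)

print(J.shape, v.shape, out.shape)   # (2, 3) (2,) (3,)
```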

Chain Rule

Let

$$h(x) = g(f(x)),$$

where $g : \mathbb{R}^m \to \mathbb{R}$ is a scalar function. Then the chain rule gives:

$$\nabla h(x) = J_f(x)^\top \, \nabla g(f(x)).$$

This is exactly a vector-Jacobian product with seed vector $v = \nabla g(f(x))$.
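
A numerical check of this identity is straightforward; the following sketch uses an illustrative $f$ and $g$ of my own choosing and compares $\nabla h(x)$ with the VJP of $f$ seeded by $\nabla g(f(x))$:

```python
import jax
import jax.numpy as jnp

# Illustrative f : R^3 -> R^2 and scalar g : R^2 -> R (both chosen for demonstration).
def f(x):
    return jnp.array([x[0] * x[1], x[1] + x[2] ** 2])

def g(y):
    return jnp.sum(y ** 2)

def h(x):
    return g(f(x))

x = jnp.array([1.0, -2.0, 0.5])

# Left-hand side: gradient of the composition.
lhs = jax.grad(h)(x)

# Right-hand side: VJP of f with seed v = grad g(f(x)).
seed = jax.grad(g)(f(x))
_, vjp_fn = jax.vjp(f, x)
(rhs,) = vjp_fn(seed)

print(jnp.allclose(lhs, rhs))  # True
```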

Backward pass

In backpropagation, one usually does not form the full Jacobian explicitly. Instead, one repeatedly applies VJPs, moving a gradient from the output of each layer back to its input.
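
A sketch of this layer-by-layer pattern, with two hypothetical layers and a toy loss chosen purely for illustration; each layer contributes only its VJP closure, and no Jacobian is ever materialized:

```python
import jax
import jax.numpy as jnp

# Two toy layers and a toy loss (all chosen only for illustration).
def layer1(x):
    return jnp.tanh(2.0 * x)

def layer2(z):
    return jnp.array([jnp.sum(z ** 2), jnp.prod(z)])

def loss(y):
    return jnp.sum(y)

x = jnp.array([0.1, 0.2, 0.3])

# Forward pass, keeping the VJP closure of each layer.
z, vjp1 = jax.vjp(layer1, x)
y, vjp2 = jax.vjp(layer2, z)

# Backward pass: start from the loss gradient and pull it back layer by layer.
g_y = jax.grad(loss)(y)      # sensitivities at the final output
(g_z,) = vjp2(g_y)           # sensitivities at the output of layer1
(g_x,) = vjp1(g_z)           # sensitivities at the input

# Reference: gradient of the whole composition computed at once.
ref = jax.grad(lambda x: loss(layer2(layer1(x))))(x)
print(jnp.allclose(g_x, ref))  # True
```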

Special case

If $f : \mathbb{R}^n \to \mathbb{R}$ is scalar-valued, then $J_f(x)$ is a $1 \times n$ matrix. With seed $v = 1$, the VJP becomes:

$$J_f(x)^\top \cdot 1 = \nabla f(x).$$

So for scalar outputs, a VJP recovers the usual gradient.
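
A short check of the scalar case, with an illustrative $f : \mathbb{R}^3 \to \mathbb{R}$ of my own choosing:

```python
import jax
import jax.numpy as jnp

# Scalar-valued example: f(x) = x . x (chosen for demonstration).
def f(x):
    return jnp.dot(x, x)

x = jnp.array([1.0, 2.0, 3.0])

# VJP with seed v = 1 ...
y, vjp_fn = jax.vjp(f, x)
(from_vjp,) = vjp_fn(jnp.ones_like(y))

# ... recovers the ordinary gradient.
print(jnp.allclose(from_vjp, jax.grad(f)(x)))  # True
```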

Notation

Convention

The name “vector-Jacobian product” comes from the row-vector notation $v^\top J_f(x)$. Many machine-learning texts and libraries use column gradients, so the same computation appears as $J_f(x)^\top v$. The underlying operation is the same.

Comparison

VJP versus JVP

A Jacobian-vector product uses

$$u \mapsto J_f(x)\, u \in \mathbb{R}^m, \qquad u \in \mathbb{R}^n,$$

so it pushes an input direction forward through $f$.

A vector-Jacobian product uses

$$v \mapsto J_f(x)^\top v \in \mathbb{R}^n, \qquad v \in \mathbb{R}^m,$$

so it pulls output sensitivities backward to the input side.

Forward-mode automatic differentiation is based on JVPs, while reverse-mode automatic differentiation is based on VJPs.
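
The contrast is easy to see side by side. The sketch below uses `jax.jvp` and `jax.vjp` on an illustrative $f : \mathbb{R}^3 \to \mathbb{R}^2$ (my own choice) and compares both against the explicit Jacobian:

```python
import jax
import jax.numpy as jnp

# Illustrative f : R^3 -> R^2 (chosen for demonstration).
def f(x):
    return jnp.array([x[0] + x[1] * x[2], jnp.exp(x[0])])

x = jnp.array([0.1, 0.2, 0.3])
u = jnp.array([1.0, 0.0, 0.0])   # input direction (forward mode)
v = jnp.array([1.0, -1.0])       # output sensitivities (reverse mode)

J = jax.jacobian(f)(x)

# JVP: push the input direction u forward, giving J u in the output space.
_, ju = jax.jvp(f, (x,), (u,))
print(jnp.allclose(ju, J @ u))      # True

# VJP: pull the output sensitivities v backward, giving J^T v in the input space.
_, vjp_fn = jax.vjp(f, x)
(jtv,) = vjp_fn(v)
print(jnp.allclose(jtv, J.T @ v))   # True
```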

Examples

Two-output function

Let

$$f(x) = \bigl(f_1(x),\, f_2(x)\bigr), \qquad f : \mathbb{R}^n \to \mathbb{R}^2.$$

Then the Jacobian is

$$J_f(x) = \begin{pmatrix} \nabla f_1(x)^\top \\ \nabla f_2(x)^\top \end{pmatrix} \in \mathbb{R}^{2 \times n}.$$

For $v = (v_1, v_2)^\top$, the VJP is

$$J_f(x)^\top v = v_1 \nabla f_1(x) + v_2 \nabla f_2(x) \in \mathbb{R}^n.$$

Each input coordinate receives contributions from every output coordinate, weighted by $v$.

If we now define a scalar loss

$$L(x) = g\bigl(f_1(x), f_2(x)\bigr),$$

then

$$v = \nabla g\bigl(f(x)\bigr) = \left( \frac{\partial g}{\partial f_1},\ \frac{\partial g}{\partial f_2} \right)^{\!\top},$$

and therefore

$$\nabla L(x) = \frac{\partial g}{\partial f_1}\, \nabla f_1(x) + \frac{\partial g}{\partial f_2}\, \nabla f_2(x) = J_f(x)^\top v.$$

This is the reverse-mode step carried out by a VJP.
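
To make the example concrete, the following sketch instantiates a particular two-output $f = (f_1, f_2)$ and scalar loss $g$ (both my own choices) and checks that $v_1 \nabla f_1 + v_2 \nabla f_2$, the VJP, and the gradient of the composed loss all agree:

```python
import jax
import jax.numpy as jnp

# Concrete two-output function and scalar loss (both chosen for illustration).
def f(x):
    return jnp.array([x[0] * x[1], x[1] + jnp.sin(x[2])])

def g(y):
    return y[0] ** 2 + 3.0 * y[1]

x = jnp.array([1.0, 2.0, 0.5])

# Seed: partial derivatives of g with respect to each output of f.
v = jax.grad(g)(f(x))

# By hand: v1 * grad f1 + v2 * grad f2.
grad_f1 = jax.grad(lambda x: f(x)[0])(x)
grad_f2 = jax.grad(lambda x: f(x)[1])(x)
by_hand = v[0] * grad_f1 + v[1] * grad_f2

# By VJP.
_, vjp_fn = jax.vjp(f, x)
(by_vjp,) = vjp_fn(v)

# Both agree with the gradient of the composed loss L(x) = g(f(x)).
ref = jax.grad(lambda x: g(f(x)))(x)
print(jnp.allclose(by_hand, by_vjp), jnp.allclose(by_vjp, ref))  # True True
```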