machine-learning calculus linear-algebra
Definition
Vector-Jacobian Product
Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a differentiable function, let $x \in \mathbb{R}^n$, and let $J_f(x) \in \mathbb{R}^{m \times n}$ be the Jacobian matrix of $f$ at $x$.
For a vector $v \in \mathbb{R}^m$, the vector-Jacobian product (VJP) is
$$J_f(x)^\top v \;\in\; \mathbb{R}^n.$$
In row-vector notation, the same object is written as $v^\top J_f(x)$.
The vector $v$ represents sensitivities at the output of $f$. The VJP combines them into the corresponding sensitivities at the input of $f$.
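As a quick sanity check, here is a minimal JAX sketch of this definition. The function `f` and the vectors below are illustrative choices, not taken from the text; `jax.vjp` returns a closure that computes $J_f(x)^\top v$ without materializing the Jacobian, and the result matches the explicit Jacobian-transpose product.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Illustrative map from R^3 to R^2 (assumed example, not from the text).
    return jnp.array([x[0] * x[1], x[1] + x[2]])

x = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([1.0, -1.0])           # output-side sensitivities, v in R^2

y, vjp_fun = jax.vjp(f, x)           # y = f(x); vjp_fun(v) computes J_f(x)^T v
(vjp_val,) = vjp_fun(v)

# Compare against the explicit Jacobian-transpose product.
J = jax.jacobian(f)(x)               # shape (2, 3)
assert jnp.allclose(vjp_val, J.T @ v)
```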
Dimensions
If $f : \mathbb{R}^n \to \mathbb{R}^m$, then:
- $J_f(x) \in \mathbb{R}^{m \times n}$,
- $v \in \mathbb{R}^m$,
- $J_f(x)^\top v \in \mathbb{R}^n$.
So a VJP takes a vector from the output space $\mathbb{R}^m$ and maps it back to the input space $\mathbb{R}^n$.
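A short shape check makes the same point concrete; the function below is an assumed example mapping $\mathbb{R}^4 \to \mathbb{R}^2$.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Illustrative map from R^4 to R^2.
    return jnp.array([jnp.dot(x, x), jnp.sum(x)])

x = jnp.ones(4)                      # point in the input space R^4
v = jnp.array([1.0, 2.0])            # vector in the output space R^2

_, vjp_fun = jax.vjp(f, x)
(pullback,) = vjp_fun(v)

assert pullback.shape == (4,)        # the VJP lands back in R^4
```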
Chain Rule
Let
where is a scalar function. Then the chain rule gives:
This is exactly a vector-Jacobian product with seed vector .
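The following sketch evaluates this chain-rule identity in JAX; `f` and `g` are illustrative choices. Seeding the VJP with $\nabla g(f(x))$ reproduces the gradient of the composition.

```python
import jax
import jax.numpy as jnp

def f(x):                            # f : R^3 -> R^2 (illustrative)
    return jnp.array([x[0] * x[1], x[1] + x[2]])

def g(y):                            # g : R^2 -> R (illustrative scalar function)
    return jnp.sum(y ** 2)

x = jnp.array([1.0, 2.0, 3.0])

y, vjp_fun = jax.vjp(f, x)
seed = jax.grad(g)(y)                # v = grad g(f(x))
(grad_h,) = vjp_fun(seed)            # J_f(x)^T grad g(f(x))

# Must agree with differentiating the composition directly.
assert jnp.allclose(grad_h, jax.grad(lambda z: g(f(z)))(x))
```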
Backward pass
In backpropagation, one usually does not form the full Jacobian explicitly. Instead, one repeatedly applies VJPs, moving a gradient from the output of each layer back to its input.
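A minimal sketch of this layer-by-layer pattern, assuming two illustrative "layers": the forward pass keeps one VJP closure per layer, and the backward pass applies them in reverse order, never forming a Jacobian.

```python
import jax
import jax.numpy as jnp

def layer1(x):                       # illustrative first layer
    return jnp.tanh(x)

def layer2(h):                       # illustrative second layer, scalar output
    return jnp.sum(h ** 2)

x = jnp.array([0.5, -1.0, 2.0])

# Forward pass: store one VJP closure per layer.
h, vjp1 = jax.vjp(layer1, x)
loss, vjp2 = jax.vjp(layer2, h)

# Backward pass: seed with 1 at the scalar output and walk back layer by layer.
(g_h,) = vjp2(jnp.ones_like(loss))   # sensitivities at the output of layer1
(g_x,) = vjp1(g_h)                   # sensitivities at the input

assert jnp.allclose(g_x, jax.grad(lambda z: layer2(layer1(z)))(x))
```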
Special case
If $f : \mathbb{R}^n \to \mathbb{R}$ is scalar-valued, then $J_f(x)$ is a $1 \times n$ matrix. With seed $v = 1$, the VJP becomes:
$$J_f(x)^\top \cdot 1 = \nabla f(x).$$
So for scalar outputs, a VJP recovers the usual gradient.
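In JAX this special case looks as follows; the particular scalar-valued `f` is an assumed example.

```python
import jax
import jax.numpy as jnp

def f(x):                            # illustrative scalar-valued function, R^n -> R
    return jnp.sum(jnp.sin(x))

x = jnp.array([0.1, 0.2, 0.3])

y, vjp_fun = jax.vjp(f, x)
(vjp_val,) = vjp_fun(jnp.ones_like(y))   # seed v = 1

# The VJP with seed 1 is exactly the gradient.
assert jnp.allclose(vjp_val, jax.grad(f)(x))
```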
Notation
Convention
The name “vector-Jacobian product” comes from the row-vector notation $v^\top J_f(x)$. Many machine-learning texts and libraries use column gradients, so the same computation appears as $J_f(x)^\top v$. The underlying operation is the same.
Comparison
VJP versus JVP
A Jacobian-vector product (JVP) uses
$$J_f(x)\, u, \qquad u \in \mathbb{R}^n,$$
so it pushes an input direction forward through $f$.
A vector-Jacobian product uses
$$J_f(x)^\top v, \qquad v \in \mathbb{R}^m,$$
so it pulls output sensitivities backward to the input side.
Forward-mode automatic differentiation is based on JVPs, while reverse-mode automatic differentiation is based on VJPs.
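The contrast is easy to see with JAX's `jax.jvp` and `jax.vjp`; the function and the vectors `u`, `v` below are illustrative.

```python
import jax
import jax.numpy as jnp

def f(x):                            # illustrative map R^3 -> R^2
    return jnp.array([x[0] * x[1], jnp.exp(x[2])])

x = jnp.array([1.0, 2.0, 0.5])
u = jnp.array([1.0, 0.0, 0.0])       # input direction for the JVP
v = jnp.array([0.0, 1.0])            # output sensitivities for the VJP

# Forward mode: push u through f, obtaining J_f(x) u.
_, jvp_val = jax.jvp(f, (x,), (u,))

# Reverse mode: pull v back through f, obtaining J_f(x)^T v.
_, vjp_fun = jax.vjp(f, x)
(vjp_val,) = vjp_fun(v)

J = jax.jacobian(f)(x)
assert jnp.allclose(jvp_val, J @ u)
assert jnp.allclose(vjp_val, J.T @ v)
```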
Examples
Two-output function
Let $f : \mathbb{R}^n \to \mathbb{R}^2$ have components $f(x) = (f_1(x), f_2(x))$.
Then the Jacobian is
$$J_f(x) = \begin{pmatrix} \nabla f_1(x)^\top \\ \nabla f_2(x)^\top \end{pmatrix} \in \mathbb{R}^{2 \times n}.$$
For $v = (v_1, v_2)^\top$, the VJP is
$$J_f(x)^\top v = v_1 \nabla f_1(x) + v_2 \nabla f_2(x).$$
Each input coordinate receives contributions from every output coordinate, weighted by the entries $v_1$ and $v_2$.
If we now define a scalar loss
$$L(x) = g(f(x)), \qquad g : \mathbb{R}^2 \to \mathbb{R},$$
then
$$v = \nabla g(f(x)) = \left( \frac{\partial g}{\partial f_1},\; \frac{\partial g}{\partial f_2} \right)^{\!\top},$$
and therefore
$$\nabla L(x) = \frac{\partial g}{\partial f_1}\, \nabla f_1(x) + \frac{\partial g}{\partial f_2}\, \nabla f_2(x).$$
This is the reverse-mode step carried out by a VJP.
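A numerical check of this example in JAX, with assumed choices of $f_1$, $f_2$ and $g$ (none of these specific functions come from the text): the VJP with seed $\nabla g(f(x))$ equals both the weighted sum of per-output gradients and the gradient of the composed loss.

```python
import jax
import jax.numpy as jnp

def f(x):                            # two-output function, components f_1 and f_2
    return jnp.array([x[0] * x[1], x[1] + x[2]])

def g(y):                            # scalar loss on the two outputs
    return y[0] ** 2 + 3.0 * y[1]

x = jnp.array([1.0, 2.0, 3.0])

y, vjp_fun = jax.vjp(f, x)
v = jax.grad(g)(y)                   # seed: (dg/df_1, dg/df_2) at f(x)
(grad_L,) = vjp_fun(v)               # v_1 * grad f_1(x) + v_2 * grad f_2(x)

# Same result as summing the weighted per-output gradients explicitly.
J = jax.jacobian(f)(x)               # rows are grad f_1(x)^T and grad f_2(x)^T
assert jnp.allclose(grad_L, v[0] * J[0] + v[1] * J[1])
assert jnp.allclose(grad_L, jax.grad(lambda z: g(f(z)))(x))
```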