machine-learning calculus linear-algebra

Definition

Jacobian-Vector Product

Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a differentiable function, let $x \in \mathbb{R}^n$, and let $J_f(x) \in \mathbb{R}^{m \times n}$ be the Jacobian matrix of $f$ at $x$.

For a vector $u \in \mathbb{R}^n$, the Jacobian-vector product is

$$J_f(x)\, u.$$

The vector $u$ represents a direction in the input space. The JVP gives the corresponding first-order change in the output of $f$.

Dimensions

If $f : \mathbb{R}^n \to \mathbb{R}^m$, then:

  • $x \in \mathbb{R}^n$ and $u \in \mathbb{R}^n$;
  • $J_f(x) \in \mathbb{R}^{m \times n}$;
  • $J_f(x)\, u \in \mathbb{R}^m$.

So a JVP takes a direction from the input space $\mathbb{R}^n$ and pushes it forward to the output space $\mathbb{R}^m$.
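The shape bookkeeping can be made concrete with a small sketch. The function below is a hypothetical $f : \mathbb{R}^3 \to \mathbb{R}^2$ invented purely to illustrate the dimensions; the JVP is computed naively here by forming the full Jacobian, which the implementation section avoids.

```python
# Hypothetical f : R^3 -> R^2, f(x) = (x1*x2, x2 + x3), used only
# to illustrate JVP shapes (m = 2, n = 3).
def f(x):
    return [x[0] * x[1], x[1] + x[2]]

def jacobian(x):
    # The 2 x 3 matrix of partial derivatives of f at x.
    return [
        [x[1], x[0], 0.0],  # row for f1 = x1 * x2
        [0.0, 1.0, 1.0],    # row for f2 = x2 + x3
    ]

def matvec(J, u):
    # Multiply an m x n matrix by a length-n vector.
    return [sum(row[j] * u[j] for j in range(len(u))) for row in J]

x = [2.0, 3.0, 1.0]
u = [1.0, 0.0, -1.0]          # direction in the input space R^3
jvp = matvec(jacobian(x), u)  # result lives in the output space R^2
print(len(u), len(jvp))       # 3 2
```

The direction `u` has length 3 (one entry per input) while the JVP has length 2 (one entry per output), matching the dimension table above.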

Directional derivative

Let

$$g(t) = f(x + t\, u).$$

Then, by the chain rule,

$$g'(0) = J_f(x)\, u.$$

So the JVP is the directional derivative of $f$ at $x$ in direction $u$.
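This identity can be sanity-checked numerically: a central finite difference of $g(t) = f(x + t u)$ at $t = 0$ should agree with the analytic JVP. The map $f$ and its Jacobian below are assumptions chosen just for this check.

```python
import math

# Hypothetical f : R^2 -> R^2 used only for this check.
def f(x):
    return [x[0] * x[1], math.sin(x[0])]

def jvp_analytic(x, u):
    # J_f(x) u from the hand-derived Jacobian [[x2, x1], [cos(x1), 0]].
    return [x[1] * u[0] + x[0] * u[1], math.cos(x[0]) * u[0]]

def directional_derivative(f, x, u, h=1e-6):
    # Central-difference approximation of d/dt f(x + t u) at t = 0.
    fp = f([xi + h * ui for xi, ui in zip(x, u)])
    fm = f([xi - h * ui for xi, ui in zip(x, u)])
    return [(p - m) / (2 * h) for p, m in zip(fp, fm)]

x, u = [0.5, 2.0], [1.0, -1.0]
exact = jvp_analytic(x, u)
approx = directional_derivative(f, x, u)
print(all(abs(a - b) < 1e-6 for a, b in zip(exact, approx)))  # True
```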

Special case

If $f : \mathbb{R}^n \to \mathbb{R}$ is scalar-valued, then $J_f(x)$ is a $1 \times n$ matrix, namely the row vector $\nabla f(x)^\top$. Therefore:

$$J_f(x)\, u = \nabla f(x) \cdot u.$$

So for scalar outputs, a JVP is the directional derivative obtained by taking the gradient and dotting it with the direction vector.
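As a minimal sketch of the scalar case, take the hypothetical $f(x_1, x_2) = x_1^2 + \sin(x_2)$, whose gradient is $(2 x_1, \cos(x_2))$; the JVP is then just a dot product.

```python
import math

# Hypothetical scalar-valued f : R^2 -> R, f(x) = x1^2 + sin(x2).
def grad_f(x):
    # gradient: (2 x1, cos(x2))
    return [2 * x[0], math.cos(x[1])]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, u = [3.0, 0.0], [1.0, 2.0]
# For scalar outputs the JVP reduces to grad f(x) . u.
jvp = dot(grad_f(x), u)
print(jvp)  # 8.0
```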

Comparison

JVP versus VJP

A Jacobian-vector product uses

$$J_f(x)\, u, \qquad u \in \mathbb{R}^n,$$

so it pushes an input direction forward through $f$.

A vector-Jacobian product uses

$$v^\top J_f(x), \qquad v \in \mathbb{R}^m,$$

so it pulls output sensitivities backward to the input side.

Forward-mode automatic differentiation is based on JVPs, while reverse-mode automatic differentiation is based on VJPs.
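The directional contrast can be shown with an explicit $m \times n$ Jacobian. The matrix entries below are made up; only the shapes matter: $J u$ maps the input side to the output side, while $v^\top J$ maps the output side back to the input side.

```python
# An explicit m x n Jacobian (m = 2, n = 3) with made-up entries,
# to contrast J u (forward) with v^T J (backward).
J = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

def jvp(J, u):
    # J u : input direction (length n) -> output tangent (length m)
    return [sum(row[j] * u[j] for j in range(len(u))) for row in J]

def vjp(J, v):
    # v^T J : output sensitivity (length m) -> input cotangent (length n)
    n = len(J[0])
    return [sum(v[i] * J[i][j] for i in range(len(J))) for j in range(n)]

u = [1.0, 0.0, -1.0]   # lives in R^3 (input side)
v = [1.0, -1.0]        # lives in R^2 (output side)
print(jvp(J, u))  # [-2.0, -2.0]      -- length m
print(vjp(J, v))  # [-3.0, -3.0, -3.0] -- length n
```

One JVP per input basis vector recovers the Jacobian column by column (forward mode); one VJP per output basis vector recovers it row by row (reverse mode), which is why reverse mode is preferred when outputs are few, as with scalar losses.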

Implementations

Forward-mode evaluation

A JVP can be computed without forming the full Jacobian explicitly. The usual method is to evaluate the function once while carrying, for each intermediate value, both:

  • its ordinary value, called the primal;
  • its first-order change in the chosen direction, called the tangent.

The input is seeded with the direction vector, and constants are seeded with zero.

struct Dual {
    primal
    tangent
}

function jvp(f, x, u)
    // x is the input point, u is the input direction
    seed each input variable x_i as Dual(x_i, u_i)
    seed each constant c as Dual(c, 0)
    evaluate f using the propagation rules below
    return the tangent part of the final output

Primitive operations propagate tangents by the ordinary differentiation rules:

add((a, da), (b, db)) = (a + b, da + db)
sub((a, da), (b, db)) = (a - b, da - db)
mul((a, da), (b, db)) = (a * b, da * b + a * db)
div((a, da), (b, db)) = (a / b, (da * b - a * db) / (b * b))
sin((a, da))         = (sin(a), cos(a) * da)

Python sketch:

from __future__ import annotations
 
from dataclasses import dataclass
from math import cos, sin
from typing import Callable
 
 
@dataclass(frozen=True)
class Dual:
    primal: float
    tangent: float
 
 
def add(x: Dual, y: Dual) -> Dual:
    return Dual(x.primal + y.primal, x.tangent + y.tangent)
 
 
def sub(x: Dual, y: Dual) -> Dual:
    return Dual(x.primal - y.primal, x.tangent - y.tangent)
 
 
def mul(x: Dual, y: Dual) -> Dual:
    return Dual(
        x.primal * y.primal,
        x.tangent * y.primal + x.primal * y.tangent,
    )
 
 
def div(x: Dual, y: Dual) -> Dual:
    return Dual(
        x.primal / y.primal,
        (x.tangent * y.primal - x.primal * y.tangent) / (y.primal * y.primal),
    )
 
 
def sin_dual(x: Dual) -> Dual:
    return Dual(sin(x.primal), cos(x.primal) * x.tangent)
 
 
def jvp(f: Callable[[Dual], Dual], x: float, u: float) -> float:
    seeded_x = Dual(x, u)
    return f(seeded_x).tangent

For tensor-valued code, the same idea is used with matching shapes: the tangent associated with a tensor has the same shape as the tensor itself. The final tangent is the Jacobian-vector product.
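As a usage check of the sketch above (repeated here in self-contained form, keeping only the primitives needed), differentiate $f(x) = x \sin(x)$, whose derivative is $\sin(x) + x \cos(x)$; seeding the input direction $u = 1$ yields exactly $f'(x)$.

```python
from dataclasses import dataclass
from math import cos, sin

@dataclass(frozen=True)
class Dual:
    primal: float
    tangent: float

def mul(x: Dual, y: Dual) -> Dual:
    # Product rule: (ab)' = a'b + ab'
    return Dual(x.primal * y.primal, x.tangent * y.primal + x.primal * y.tangent)

def sin_dual(x: Dual) -> Dual:
    return Dual(sin(x.primal), cos(x.primal) * x.tangent)

def jvp(f, x: float, u: float) -> float:
    return f(Dual(x, u)).tangent

# f(x) = x * sin(x), so f'(x) = sin(x) + x * cos(x).
f = lambda d: mul(d, sin_dual(d))

x = 2.0
got = jvp(f, x, 1.0)            # seed direction u = 1 gives f'(x)
want = sin(x) + x * cos(x)
print(abs(got - want) < 1e-12)  # True
```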

Examples

Two-output function

Let

$$f(x_1, x_2) = \begin{pmatrix} x_1 x_2 \\ \sin(x_1) \end{pmatrix}.$$

Then the Jacobian is

$$J_f(x) = \begin{pmatrix} x_2 & x_1 \\ \cos(x_1) & 0 \end{pmatrix}.$$

For $u = (u_1, u_2)^\top$, the JVP is

$$J_f(x)\, u = \begin{pmatrix} x_2 u_1 + x_1 u_2 \\ \cos(x_1)\, u_1 \end{pmatrix}.$$

This gives the first-order change of both output coordinates when the input is perturbed in direction $u$.

If we choose

$$u = e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$

then

$$J_f(x)\, e_1 = \begin{pmatrix} x_2 \\ \cos(x_1) \end{pmatrix} = \frac{\partial f}{\partial x_1}(x),$$

which is exactly the rate of change of $f$ when only $x_1$ is varied.
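A quick numerical sketch of a two-output JVP, using the concrete map $f(x_1, x_2) = (x_1 x_2, \sin(x_1))$ as an illustration: seeding the direction $e_1$ picks out the first column of the Jacobian.

```python
import math

# Illustrative two-output map f(x1, x2) = (x1*x2, sin(x1)) and its
# JVP from the hand-derived Jacobian [[x2, x1], [cos(x1), 0]].
def jvp(x, u):
    return [x[1] * u[0] + x[0] * u[1], math.cos(x[0]) * u[0]]

x = [1.0, 2.0]
e1 = [1.0, 0.0]
print(jvp(x, e1))  # [2.0, cos(1.0)] -- the first Jacobian column
```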