machine-learning calculus linear-algebra

Definition

Jacobian-Vector Product

Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a differentiable function, let $x \in \mathbb{R}^n$, and let $J_f(x) \in \mathbb{R}^{m \times n}$ be the Jacobian matrix of $f$ at $x$.

For a vector $u \in \mathbb{R}^n$, the Jacobian-vector product is

$$J_f(x)\, u.$$

The vector $u$ represents a direction in the input space. The JVP gives the corresponding first-order change in the output of $f$.

Dimensions

If $f : \mathbb{R}^n \to \mathbb{R}^m$, then:

  • $x \in \mathbb{R}^n$ and $u \in \mathbb{R}^n$;
  • $J_f(x) \in \mathbb{R}^{m \times n}$;
  • $J_f(x)\, u \in \mathbb{R}^m$.

So a JVP takes a direction from the input space $\mathbb{R}^n$ and pushes it forward to the output space $\mathbb{R}^m$.
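The shape bookkeeping can be made concrete with a small sketch. The function below is a hypothetical $f : \mathbb{R}^3 \to \mathbb{R}^2$ invented purely to illustrate the dimensions; the JVP is computed naively here by forming the full Jacobian, which the implementation section avoids.

```python
# Hypothetical f : R^3 -> R^2, f(x) = (x1*x2, x2 + x3), used only
# to illustrate JVP shapes (m = 2, n = 3).
def f(x):
    return [x[0] * x[1], x[1] + x[2]]

def jacobian(x):
    # The 2 x 3 matrix of partial derivatives of f at x.
    return [
        [x[1], x[0], 0.0],  # row for f1 = x1 * x2
        [0.0, 1.0, 1.0],    # row for f2 = x2 + x3
    ]

def matvec(J, u):
    # Multiply an m x n matrix by a length-n vector.
    return [sum(row[j] * u[j] for j in range(len(u))) for row in J]

x = [2.0, 3.0, 1.0]
u = [1.0, 0.0, -1.0]          # direction in the input space R^3
jvp = matvec(jacobian(x), u)  # result lives in the output space R^2
print(len(u), len(jvp))       # 3 2
```

The direction `u` has length 3 (one entry per input) while the JVP has length 2 (one entry per output), matching the dimension table above.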

Directional derivative

Let

$$g(t) = f(x + t\, u).$$

Then, by the chain rule,

$$g'(0) = J_f(x)\, u.$$

So the JVP is the directional derivative of $f$ at $x$ in direction $u$.
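This identity can be sanity-checked numerically: a central finite difference of $g(t) = f(x + t u)$ at $t = 0$ should agree with the analytic JVP. The map $f$ and its Jacobian below are assumptions chosen just for this check.

```python
import math

# Hypothetical f : R^2 -> R^2 used only for this check.
def f(x):
    return [x[0] * x[1], math.sin(x[0])]

def jvp_analytic(x, u):
    # J_f(x) u from the hand-derived Jacobian [[x2, x1], [cos(x1), 0]].
    return [x[1] * u[0] + x[0] * u[1], math.cos(x[0]) * u[0]]

def directional_derivative(f, x, u, h=1e-6):
    # Central-difference approximation of d/dt f(x + t u) at t = 0.
    fp = f([xi + h * ui for xi, ui in zip(x, u)])
    fm = f([xi - h * ui for xi, ui in zip(x, u)])
    return [(p - m) / (2 * h) for p, m in zip(fp, fm)]

x, u = [0.5, 2.0], [1.0, -1.0]
exact = jvp_analytic(x, u)
approx = directional_derivative(f, x, u)
print(all(abs(a - b) < 1e-6 for a, b in zip(exact, approx)))  # True
```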

Special case

If $f : \mathbb{R}^n \to \mathbb{R}$ is scalar-valued, then $J_f(x)$ is a $1 \times n$ matrix, namely the row vector $\nabla f(x)^\top$. Therefore:

$$J_f(x)\, u = \nabla f(x) \cdot u.$$

So for scalar outputs, a JVP is the directional derivative obtained by taking the gradient and dotting it with the direction vector.
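As a minimal sketch of the scalar case, take the hypothetical $f(x_1, x_2) = x_1^2 + \sin(x_2)$, whose gradient is $(2 x_1, \cos(x_2))$; the JVP is then just a dot product.

```python
import math

# Hypothetical scalar-valued f : R^2 -> R, f(x) = x1^2 + sin(x2).
def grad_f(x):
    # gradient: (2 x1, cos(x2))
    return [2 * x[0], math.cos(x[1])]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, u = [3.0, 0.0], [1.0, 2.0]
# For scalar outputs the JVP reduces to grad f(x) . u.
jvp = dot(grad_f(x), u)
print(jvp)  # 8.0
```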

Comparison

JVP versus VJP

A Jacobian-vector product uses

$$J_f(x)\, u, \qquad u \in \mathbb{R}^n,$$

so it pushes an input direction forward through $f$.

A vector-Jacobian product uses

$$v^\top J_f(x), \qquad v \in \mathbb{R}^m,$$

so it pulls output sensitivities backward to the input side.

Forward-mode automatic differentiation is based on JVPs, while reverse-mode automatic differentiation is based on VJPs.
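The directional contrast can be shown with an explicit $m \times n$ Jacobian. The matrix entries below are made up; only the shapes matter: $J u$ maps the input side to the output side, while $v^\top J$ maps the output side back to the input side.

```python
# An explicit m x n Jacobian (m = 2, n = 3) with made-up entries,
# to contrast J u (forward) with v^T J (backward).
J = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

def jvp(J, u):
    # J u : input direction (length n) -> output tangent (length m)
    return [sum(row[j] * u[j] for j in range(len(u))) for row in J]

def vjp(J, v):
    # v^T J : output sensitivity (length m) -> input cotangent (length n)
    n = len(J[0])
    return [sum(v[i] * J[i][j] for i in range(len(J))) for j in range(n)]

u = [1.0, 0.0, -1.0]   # lives in R^3 (input side)
v = [1.0, -1.0]        # lives in R^2 (output side)
print(jvp(J, u))  # [-2.0, -2.0]      -- length m
print(vjp(J, v))  # [-3.0, -3.0, -3.0] -- length n
```

One JVP per input basis vector recovers the Jacobian column by column (forward mode); one VJP per output basis vector recovers it row by row (reverse mode), which is why reverse mode is preferred when outputs are few, as with scalar losses.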

Implementations

Forward-mode evaluation

A JVP can be computed without forming the full Jacobian explicitly. The usual method is to evaluate the function once while carrying, for each intermediate value, both:

  • its ordinary value, called the primal;
  • its first-order change in the chosen direction, called the tangent.

The input is seeded with the direction vector, and constants are seeded with zero.

struct Dual {
    primal
    tangent
}

function jvp(f, x, u)
    // x is the input point, u is the input direction
    seed each input variable x_i as Dual(x_i, u_i)
    seed each constant c as Dual(c, 0)
    evaluate f using the propagation rules below
    return the tangent part of the final output

Primitive operations propagate tangents by the ordinary differentiation rules:

add((a, da), (b, db)) = (a + b, da + db)
sub((a, da), (b, db)) = (a - b, da - db)
mul((a, da), (b, db)) = (a * b, da * b + a * db)
div((a, da), (b, db)) = (a / b, (da * b - a * db) / (b * b))
sin((a, da))         = (sin(a), cos(a) * da)

Python sketch:

from __future__ import annotations
 
from dataclasses import dataclass
from math import cos, sin
from typing import Callable
 
 
@dataclass(frozen=True)
class Dual:
    primal: float
    tangent: float
 
 
def add(x: Dual, y: Dual) -> Dual:
    return Dual(x.primal + y.primal, x.tangent + y.tangent)
 
 
def sub(x: Dual, y: Dual) -> Dual:
    return Dual(x.primal - y.primal, x.tangent - y.tangent)
 
 
def mul(x: Dual, y: Dual) -> Dual:
    return Dual(
        x.primal * y.primal,
        x.tangent * y.primal + x.primal * y.tangent,
    )
 
 
def div(x: Dual, y: Dual) -> Dual:
    return Dual(
        x.primal / y.primal,
        (x.tangent * y.primal - x.primal * y.tangent) / (y.primal * y.primal),
    )
 
 
def sin_dual(x: Dual) -> Dual:
    return Dual(sin(x.primal), cos(x.primal) * x.tangent)
 
 
def jvp(f: Callable[[Dual], Dual], x: float, u: float) -> float:
    seeded_x = Dual(x, u)
    return f(seeded_x).tangent

For tensor-valued code, the same idea is used with matching shapes: the tangent associated with a tensor has the same shape as the tensor itself. The final tangent is the Jacobian-vector product.
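As a usage check of the sketch above (repeated here in self-contained form, keeping only the primitives needed), differentiate $f(x) = x \sin(x)$, whose derivative is $\sin(x) + x \cos(x)$; seeding the input direction $u = 1$ yields exactly $f'(x)$.

```python
from dataclasses import dataclass
from math import cos, sin

@dataclass(frozen=True)
class Dual:
    primal: float
    tangent: float

def mul(x: Dual, y: Dual) -> Dual:
    # Product rule: (ab)' = a'b + ab'
    return Dual(x.primal * y.primal, x.tangent * y.primal + x.primal * y.tangent)

def sin_dual(x: Dual) -> Dual:
    return Dual(sin(x.primal), cos(x.primal) * x.tangent)

def jvp(f, x: float, u: float) -> float:
    return f(Dual(x, u)).tangent

# f(x) = x * sin(x), so f'(x) = sin(x) + x * cos(x).
f = lambda d: mul(d, sin_dual(d))

x = 2.0
got = jvp(f, x, 1.0)            # seed direction u = 1 gives f'(x)
want = sin(x) + x * cos(x)
print(abs(got - want) < 1e-12)  # True
```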

Examples

Two-output function

Let

$$f(x_1, x_2) = \begin{pmatrix} x_1 x_2 \\ \sin(x_1) \end{pmatrix}.$$

Then the Jacobian is

$$J_f(x) = \begin{pmatrix} x_2 & x_1 \\ \cos(x_1) & 0 \end{pmatrix}.$$

For $u = (u_1, u_2)^\top$, the JVP is

$$J_f(x)\, u = \begin{pmatrix} x_2 u_1 + x_1 u_2 \\ \cos(x_1)\, u_1 \end{pmatrix}.$$

This gives the first-order change of both output coordinates when the input is perturbed in direction $u$.

If we choose

$$u = e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$

then

$$J_f(x)\, e_1 = \begin{pmatrix} x_2 \\ \cos(x_1) \end{pmatrix} = \frac{\partial f}{\partial x_1}(x),$$

which is exactly the rate of change of $f$ when only $x_1$ is varied.
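A quick numerical sketch of a two-output JVP, using the concrete map $f(x_1, x_2) = (x_1 x_2, \sin(x_1))$ as an illustration: seeding the direction $e_1$ picks out the first column of the Jacobian.

```python
import math

# Illustrative two-output map f(x1, x2) = (x1*x2, sin(x1)) and its
# JVP from the hand-derived Jacobian [[x2, x1], [cos(x1), 0]].
def jvp(x, u):
    return [x[1] * u[0] + x[0] * u[1], math.cos(x[0]) * u[0]]

x = [1.0, 2.0]
e1 = [1.0, 0.0]
print(jvp(x, e1))  # [2.0, cos(1.0)] -- the first Jacobian column
```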