so it pulls output sensitivities v backward to the input side.
Forward-mode automatic differentiation is based on JVPs, while reverse-mode automatic differentiation is based on VJPs.
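To make the distinction concrete: for a linear map f(x) = Ax the Jacobian is A itself, so the JVP of a direction u is Au and the VJP of a sensitivity v is Aᵀv. A minimal NumPy sketch (the matrix and vectors here are illustrative, not from any particular library):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])      # f(x) = A @ x maps R^2 to R^3, so J_f = A

u = np.array([1.0, -1.0])       # input-side direction
v = np.array([1.0, 0.0, 2.0])   # output-side sensitivity

jvp = A @ u      # forward mode pushes u to the output side: J u, shape (3,)
vjp = A.T @ v    # reverse mode pulls v to the input side: v^T J, shape (2,)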
Implementations
Forward-mode evaluation
A JVP can be computed without forming the full Jacobian explicitly. The usual method is to evaluate the function once while carrying, for each intermediate value, both:
its ordinary value, called the primal;
its first-order change in the chosen direction, called the tangent.
The input is seeded with the direction vector, and constants are seeded with zero.
struct Dual {
    primal    // the ordinary value of the intermediate
    tangent   // its first-order change in the chosen direction
}
function jvp(f, x, u)
// x is the input point, u is the input direction
seed each input variable x_i as Dual(x_i, u_i)
seed each constant c as Dual(c, 0)
evaluate f using the propagation rules below
return the tangent part of the final output
Primitive operations propagate tangents by the ordinary differentiation rules:
add((a, da), (b, db)) = (a + b, da + db)
sub((a, da), (b, db)) = (a - b, da - db)
mul((a, da), (b, db)) = (a * b, da * b + a * db)
div((a, da), (b, db)) = (a / b, (da * b - a * db) / (b * b))
sin((a, da)) = (sin(a), cos(a) * da)
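Putting the pieces together, here is a minimal runnable Python sketch of this scheme. The names Dual, jvp, and the scalar sin helper mirror the pseudocode above but are choices of this illustration, not part of any library; operator overloading stands in for the propagation rules.

import math

class Dual:
    """A primal value paired with its tangent in the chosen direction."""
    def __init__(self, primal, tangent=0.0):
        self.primal = primal
        self.tangent = tangent   # constants default to tangent 0, as in the seeding rule

    def _lift(self, other):
        # Wrap plain constants as Dual(c, 0) so the rules below apply uniformly.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __sub__(self, other):
        other = self._lift(other)
        return Dual(self.primal - other.primal, self.tangent - other.tangent)

    def __mul__(self, other):
        other = self._lift(other)
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)

    def __truediv__(self, other):
        other = self._lift(other)
        return Dual(self.primal / other.primal,
                    (self.tangent * other.primal - self.primal * other.tangent)
                    / (other.primal * other.primal))

def sin(d):
    return Dual(math.sin(d.primal), math.cos(d.primal) * d.tangent)

def jvp(f, xs, us):
    # Seed each input x_i with its direction component u_i, evaluate f once,
    # and read off the tangent of the result.
    duals = [Dual(x, u) for x, u in zip(xs, us)]
    return f(*duals).tangent

# Example: for f(x, y) = x*y + sin(x), seeding the direction (1, 0) yields
# the partial derivative with respect to x, namely y + cos(x).
g = lambda x, y: x * y + sin(x)
print(jvp(g, xs=[2.0, 3.0], us=[1.0, 0.0]))   # 3 + cos(2)

Note that each call to jvp evaluates f once, so the cost of one JVP is roughly one function evaluation regardless of the number of inputs.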
For tensor-valued code, the same idea carries over with matching shapes: the tangent carried alongside a tensor has the same shape as the tensor itself. The final tangent of the output is the Jacobian-vector product.
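Frameworks expose this operation directly; for example, jax.jvp evaluates a function once and returns both the primal output and the output tangent, without materializing the Jacobian. A small sketch, assuming JAX is installed and using an illustrative function f and shapes:

import jax
import jax.numpy as jnp

def f(x):
    # Maps a (3, 2) tensor to a length-3 vector.
    return jnp.sin(x).sum(axis=1)

x = jnp.ones((3, 2))          # primal input
u = jnp.full((3, 2), 0.1)     # tangent: same shape as x

y, tangent_out = jax.jvp(f, (x,), (u,))
# tangent_out = J_f(x) u and has the shape of the output y, here (3,)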