Lukas' Notes

On-Policy Value Function

Feb 25, 2026 · 1 min read

reinforcement-learning

Definition

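The on-policy value function $V^{\pi}(s)$ gives the expected return if you start in state $s$ and always act according to policy $\pi$:

$$
V^{\pi}(s) = \mathbb{E}_{\tau \sim \pi}\left[ R(\tau) \mid s_0 = s \right]
$$

Here $\tau = (s_0, a_0, s_1, a_1, \dots)$ is a trajectory generated by following $\pi$ from the start state $s_0 = s$, and $R(\tau)$ is the return of that trajectory.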
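Since the definition is an expectation over trajectories, one way to make it concrete is a Monte Carlo estimate: sample many trajectories from s under π and average their returns. The sketch below is only an illustration under stated assumptions. The three-state chain (`chain_step`), the two policies, and the helper names are all hypothetical, and the return is assumed to be the discounted sum of rewards with `gamma=0.9`, which the note itself does not specify.

```python
import random


def rollout_return(step, policy, state, rng, gamma=0.9, horizon=50):
    """Sample one trajectory from `state` under `policy` and return R(tau).

    Assumption: R(tau) is the discounted sum of rewards, truncated at `horizon`.
    """
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state, rng)
        state, reward, done = step(state, action)
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total


def mc_value(step, policy, state, episodes=5000, seed=0, gamma=0.9):
    """Estimate V^pi(state) as the average return over sampled trajectories."""
    rng = random.Random(seed)
    returns = [
        rollout_return(step, policy, state, rng, gamma=gamma)
        for _ in range(episodes)
    ]
    return sum(returns) / len(returns)


# Hypothetical toy environment: states 0 -> 1 -> 2, where state 2 is terminal.
# Action 1 moves one state to the right, action 0 stays put.
# Entering state 2 yields reward 1; every other step yields reward 0.
def chain_step(state, action):
    next_state = state + 1 if action == 1 else state
    done = next_state == 2
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def always_right(state, rng):
    """Deterministic policy: always move right."""
    return 1


def random_policy(state, rng):
    """Stochastic policy: stay or move right with equal probability."""
    return rng.choice([0, 1])


if __name__ == "__main__":
    print(mc_value(chain_step, always_right, state=0))
    print(mc_value(chain_step, random_policy, state=0))
```

The same start state gets a different value under each policy, which is the sense in which the value function is "on-policy": the trajectories being averaged are the ones π itself generates.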
