Lukas' Notes

On-Policy Action-Value Function

Feb 25, 2026 · 1 min read

reinforcement-learning

Definition

On-Policy Action-Value Function

The on-policy action-value function $Q^{\pi}(s,a)$ gives the expected return if you start in state $s$, take an arbitrary action $a$ (not necessarily one the policy would choose), and then act according to policy $\pi$ forever after:

$$Q^{\pi}(s,a) = \mathbb{E}_{\tau \sim \pi}\left[ R(\tau) \mid s_0 = s,\ a_0 = a \right]$$
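The expectation can be approximated directly from the definition: fix $s_0 = s$ and $a_0 = a$, draw every later action from $\pi$, and average the returns. Below is a minimal Monte Carlo sketch. It assumes an episodic task with discounted return $R(\tau) = \sum_t \gamma^t r_t$ (the definition above does not fix the form of $R(\tau)$). The toy MDP, the policy, and the names `STEP`, `PI`, `rollout_return` and `mc_q` are all hypothetical, made up for illustration.

```python
import random

# Hypothetical toy MDP (an assumption for illustration, not from the notes).
# STEP[(state, action)] = (reward, next_state); next_state None ends the episode.
STEP = {
    (0, 0): (0.0, 1),
    (0, 1): (1.0, None),
    (1, 0): (2.0, None),
    (1, 1): (0.0, 0),
}

# Hypothetical stochastic policy: PI[state] = list of (action, probability).
# In state 1 the policy never picks action 1.
PI = {
    0: [(0, 0.5), (1, 0.5)],
    1: [(0, 1.0)],
}


def sample_action(policy, state, rng):
    """Draw an action a ~ pi(. | state)."""
    actions, probs = zip(*policy[state])
    return rng.choices(actions, weights=probs, k=1)[0]


def rollout_return(s, a, policy, gamma, rng, max_steps=1000):
    """Discounted return of one trajectory with s0 = s, a0 = a, then a_t ~ pi for t >= 1."""
    total, discount = 0.0, 1.0
    state, action = s, a  # the first action is fixed, not sampled from pi
    for _ in range(max_steps):  # safety cap on episode length (an assumption)
        reward, next_state = STEP[(state, action)]
        total += discount * reward
        if next_state is None:
            break
        discount *= gamma
        state = next_state
        action = sample_action(policy, state, rng)  # every later action comes from pi
    return total


def mc_q(s, a, policy, gamma=0.9, episodes=20_000, seed=0):
    """Monte Carlo estimate of Q^pi(s, a): the mean return over sampled trajectories."""
    rng = random.Random(seed)
    total = sum(rollout_return(s, a, policy, gamma, rng) for _ in range(episodes))
    return total / episodes


if __name__ == "__main__":
    # Action 1 in state 1 is off-policy, yet Q^pi(1, 1) is still well defined.
    print(mc_q(1, 1, PI))
```

For this toy MDP with $\gamma = 0.9$, the exact value is $Q^{\pi}(1,1) = 0.9 \cdot \left(0.5 \cdot 0.9 \cdot 2 + 0.5 \cdot 1\right) = 1.26$. The estimate printed above should land close to that. Only the first action escapes the policy; everything afterwards is sampled from $\pi$, which is exactly what "on-policy" refers to here.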
