Lukas' Notes

On-Policy Action-Value Function

May 27, 2026 · 1 min read

reinforcement-learning

Definition

On-Policy Action-Value Function

The on-policy action-value function $Q^{π}(s, a)$ gives the expected return if you start in state $s$, take an arbitrary action $a$ (which need not come from the policy), and then act according to policy $π$ forever after:
$Q^{π}(s, a) = \mathbb{E}_{τ \sim π} [R(τ) \mid s_{0} = s, a_{0} = a]$
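One way to make the expectation concrete is a Monte Carlo estimate: fix $s_0 = s$ and $a_0 = a$, follow $π$ for every later step, and average the returns over many rollouts. The sketch below is only an illustration under stated assumptions. The toy MDP (`TRANSITIONS`, `TERMINAL`), the uniform-random `policy`, and the discount factor `GAMMA` are all hypothetical, and $R(τ)$ is assumed to be the discounted return.

```python
import random

GAMMA = 0.9    # assumed discount factor (R(tau) taken as the discounted return)
TERMINAL = 2   # hypothetical terminal state

# Hypothetical deterministic toy MDP: (state, action) -> (next_state, reward)
TRANSITIONS = {
    (0, 0): (TERMINAL, 1.0),
    (0, 1): (1, 0.0),
    (1, 0): (TERMINAL, 2.0),
    (1, 1): (TERMINAL, 0.0),
}


def policy(state, rng):
    """Hypothetical policy pi: pick action 0 or 1 uniformly at random."""
    return rng.choice([0, 1])


def sample_return(state, action, rng):
    """One rollout with s_0 = state and a_0 = action, following pi afterwards."""
    total, discount = 0.0, 1.0
    while True:
        state, reward = TRANSITIONS[(state, action)]
        total += discount * reward
        if state == TERMINAL:
            return total
        discount *= GAMMA
        action = policy(state, rng)  # every later action comes from pi


def estimate_q(state, action, episodes=20_000, seed=0):
    """Monte Carlo estimate of Q^pi(state, action): the mean sampled return."""
    rng = random.Random(seed)
    returns = [sample_return(state, action, rng) for _ in range(episodes)]
    return sum(returns) / episodes


if __name__ == "__main__":
    for s, a in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(f"Q(s={s}, a={a}) ~ {estimate_q(s, a):.3f}")
```

In this toy setup the exact value of $Q^{π}(0, 1)$ is $γ \cdot (0.5 \cdot 2 + 0.5 \cdot 0) = 0.9$, so the estimate should land close to that. Note that only the first action is forced; the policy is consulted from the second step onwards, which is what distinguishes $Q^{π}(s, a)$ from the value of simply following $π$ from $s$.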

Backlinks

Advantage Function
Policy Gradient Theorem
Proximal Policy Optimisation
