Lukas' Notes

Expected Return

May 27, 2026 · 1 min read

reinforcement-learning

Definition

Expected Return

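Let $\pi_{\theta}$ be a parameterised policy and let $\rho_{0}$ be the start-state distribution. The expected return is the objective maximised in reinforcement learning:
$J(\theta) = \mathbb{E}_{s_{0} \sim \rho_{0},\, \tau \sim \pi_{\theta}} [G_{0}]$
where $\tau = (s_{0}, a_{0}, s_{1}, a_{1}, \dots)$ is a trajectory generated by following $\pi_{\theta}$ from $s_{0}$, $G_{0} = \sum_{k = 0}^{\infty} \gamma^{k} R(s_{k}, a_{k})$ is the return from the initial state and $\gamma \in [0, 1]$ is the discount factor. With bounded rewards the sum converges for any $\gamma < 1$; for $\gamma = 1$ it need not converge unless episodes terminate.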
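
Since the expectation is over sampled trajectories, the expected return can be approximated by averaging discounted returns over many rollouts. The following is a minimal sketch of that Monte Carlo estimate, not a reference implementation. The names `discounted_return`, `estimate_expected_return` and `toy_episode` are hypothetical, and `toy_episode` is an assumed stand-in for "sample $s_{0}$ from the start-state distribution and roll out the policy": it pays reward 1 at every step and continues with probability `p_continue`, so its true expected return is $1 / (1 - \gamma p)$.

```python
import random


def discounted_return(rewards, gamma):
    """Return G_0 = sum_k gamma^k * r_k for a finite list of rewards."""
    g = 0.0
    # Accumulate backwards: G_k = r_k + gamma * G_{k+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def toy_episode(rng, p_continue=0.5):
    """Hypothetical rollout: reward 1 per step, continue with prob p_continue."""
    rewards = [1.0]
    while rng.random() < p_continue:
        rewards.append(1.0)
    return rewards


def estimate_expected_return(sample_rewards, gamma, n_episodes, seed=0):
    """Monte Carlo estimate of J: the mean of G_0 over sampled episodes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        total += discounted_return(sample_rewards(rng), gamma)
    return total / n_episodes


if __name__ == "__main__":
    gamma = 0.9
    estimate = estimate_expected_return(toy_episode, gamma, n_episodes=20_000)
    exact = 1.0 / (1.0 - gamma * 0.5)  # closed form for this toy process only
    print(f"estimate={estimate:.3f}  exact={exact:.3f}")
```

In a real setting `sample_rewards` would wrap an environment and a policy; the averaging step stays the same, and the estimate's accuracy improves with the number of episodes.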

