Lukas' Notes

Return

May 27, 2026 · 1 min read

reinforcement-learning

Definition

Return

Let $(s_{t}, a_{t}, r_{t})_{t \geq 0}$ be a trajectory sampled under policy $π$, with rewards $r_{t} = R(s_{t}, a_{t})$. The return from timestep $t$ is the discounted sum of future rewards:
$G_{t} = \sum_{k=0}^{\infty} γ^{k} R(s_{t+k}, a_{t+k})$
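where $γ \in [0, 1]$ is the discount factor. For $γ < 1$ and bounded rewards the sum converges; for $γ = 1$ it is finite only if the episode terminates or the rewards are otherwise summable.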
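
As a rough illustration, the return can be computed for a finite list of rewards by truncating the sum at the end of the episode and using the recursion $G_{t} = r_{t} + γ G_{t+1}$, which follows directly from the definition. The sketch below assumes the episode ends after the last reward (so the return beyond it is zero); the function name `discounted_returns` and the example rewards are hypothetical, chosen only for this note.

```python
from typing import Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> list[float]:
    """Return [G_0, G_1, ..., G_{T-1}] for a finite episode of rewards.

    Assumes the episode terminates after the last reward, so G_T = 0.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")

    returns = [0.0] * len(rewards)
    running = 0.0  # holds G_{t+1} while processing timestep t
    # Walk backwards so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Rewards r_0 = 1, r_1 = 2, r_2 = 3 with gamma = 0.5
    print(discounted_returns([1.0, 2.0, 3.0], 0.5))  # prints [2.75, 3.5, 3.0]
```

Working backwards costs one pass over the episode, whereas evaluating the sum separately for every $t$ would be quadratic in the episode length.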

Backlinks

Advantage Function
Discount Factor
Expected Return
On-Policy Action-Value Function
Policy Gradient Theorem
Proximal Policy Optimisation
Value Function
