Definition
Return
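Let $\tau = (s_0, a_0, r_0, s_1, a_1, r_1, \dots)$ be a trajectory sampled under policy $\pi$. The return from timestep $t$ is the discounted sum of future rewards:
$$G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k},$$
where $\gamma \in [0, 1]$ is the discount factor.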
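To make the definition concrete, here is a minimal Python sketch that computes the return at every timestep of a finite episode. It rests on assumptions not stated in the definition: the episode ends after finitely many steps (so the sum is truncated), the reward at step t is written r_t, the return is written G_t, the discount factor is written gamma, and the function name `discounted_returns` is hypothetical. It uses the backward recursion G_t = r_t + gamma * G_{t+1}, which follows from the discounted sum.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t for every timestep t of a finite reward sequence.

    Assumes rewards[t] is the reward received at timestep t and that
    the episode terminates after the last reward (illustrative only).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")

    returns = [0.0] * len(rewards)
    running = 0.0  # return from the step after the end of the episode

    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    print(discounted_returns([1.0, 2.0, 3.0], gamma=0.5))  # [2.75, 3.5, 3.0]
```

The backward pass costs one multiply and one add per timestep, whereas evaluating the sum separately for each t would be quadratic in the episode length.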