The discount factor γ∈[0,1] weights future rewards in the return:
$$G_t = \sum_{k=0}^{\infty} \gamma^{k} R(s_{t+k}, a_{t+k}).$$
- γ=0: the agent is myopic, maximising only the immediate reward.
- γ→1: the agent is farsighted, valuing distant rewards almost as much as immediate ones.
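- γ<1 also keeps G_t finite in continuing (unbounded-horizon) tasks, provided the rewards themselves are bounded: if |R| ≤ R_max, then |G_t| ≤ R_max/(1−γ). It acts as a soft horizon of roughly 1/(1−γ) steps.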
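
The three cases above can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper `discounted_return` and the constant reward stream are hypothetical, chosen only for illustration, and the infinite sum is truncated to a finite list of rewards.

```python
def discounted_return(rewards, gamma):
    """Compute sum_k gamma**k * rewards[k] for a finite reward sequence."""
    g = 0.0
    # Accumulate from the last reward backwards: G_k = r_k + gamma * G_{k+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward stream: a constant reward of 1, truncated to 1000 steps.
rewards = [1.0] * 1000

# gamma = 0: only the immediate reward contributes.
myopic = discounted_return(rewards, 0.0)

# gamma close to 1: distant rewards contribute almost fully, but the
# return stays below the geometric-series bound R_max / (1 - gamma).
gamma = 0.99
farsighted = discounted_return(rewards, gamma)
bound = 1.0 / (1.0 - gamma)

print(myopic, farsighted, bound)
```

With γ=0.99 the bound is 1/(1−γ) = 100, which matches the reading of γ as a soft horizon of about 100 steps: rewards much further out than that add very little to the return.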