Advantage Estimate

Definition

Advantage Estimate

An advantage estimate $\hat{A}_{t}$ is an empirical approximation of the advantage function $A^{π} (s_{t}, a_{t})$ , computed from sampled trajectories.

Common estimators:

TD residual: $\hat{A}_{t} = δ_{t} = r_{t} + γV (s_{t + 1}) - V (s_{t})$ .

GAE: $\hat{A}_{t} = \sum_{l = 0}^{\infty} (γλ)^{l} δ_{t + l}$ , interpolating between low-variance ( $λ = 0$ , TD) and low-bias ( $λ = 1$ ).

The estimate enters the policy gradient as the weight multiplying $\nabla_{θ} lo g π_{θ} (a_{t} ∣ s_{t})$ .

Lukas' Notes

Advantage Estimate

Definition

Backlinks