Advantage Function

Definition

Advantage Function

Let $π$ be a policy, let $V^{π} (s)$ be the on-policy value function, and let $Q^{π} (s, a)$ be the on-policy action-value function.

The advantage function $A^{π} (s, a)$ quantifies how much better it is to take action $a$ in state $s$ and follow $π$ afterwards than to draw the action from $π (\cdot ∣ s)$ in the first place:
$A^{π} (s, a) = Q^{π} (s, a) - V^{π} (s) .$
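In policy gradient methods, using the advantage instead of the raw return as the learning signal typically reduces variance without introducing bias: $V^{π} (s)$ depends only on the state, so subtracting it as a baseline leaves the expected gradient unchanged. The advantage itself has zero mean under the policy, $E_{a \sim π (\cdot ∣ s)} [A^{π} (s, a)] = 0$, because $V^{π} (s) = E_{a \sim π (\cdot ∣ s)} [Q^{π} (s, a)]$.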
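
The zero-mean property can be checked numerically on a small tabular example. The sketch below is only an illustration: the function name `advantage`, the array shapes, and the numbers in `q` and `pi` are assumptions made up for this note, and `q` is simply treated as if it were $Q^{π}$ for the given policy, with $V^{π}$ computed from it as the policy-weighted average.

```python
import numpy as np


def advantage(q, pi):
    """Return A(s, a) = Q(s, a) - V(s) for tabular inputs.

    q  : array of shape (num_states, num_actions), assumed to hold Q^pi(s, a)
    pi : array of the same shape, each row a distribution pi(. | s)
    """
    q = np.asarray(q, dtype=float)
    pi = np.asarray(pi, dtype=float)
    # V(s) = sum_a pi(a | s) * Q(s, a); keepdims lets it broadcast over actions
    v = (pi * q).sum(axis=1, keepdims=True)
    return q - v


# Hypothetical values: 2 states, 3 actions (not taken from any real task)
q = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 4.0]])
pi = np.array([[0.50, 0.25, 0.25],
               [0.25, 0.25, 0.50]])

adv = advantage(q, pi)

# Expected advantage under the policy, one value per state
mean_adv = (pi * adv).sum(axis=1)
print(adv)
print(mean_adv)
```

In each state the advantages are positive for actions that beat the policy's average and negative for those that fall short, and their policy-weighted mean is zero.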
