Lukas' Notes

reinforcement-learning

Definition

Advantage Function

Let $\pi$ be a policy, let $V^\pi$ be the on-policy value function, and let $Q^\pi$ be the on-policy action-value function.

The advantage function $A^\pi(s, a)$ quantifies how much better taking action $a$ in state $s$ is compared to following $\pi$ on average:

$$A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s)$$
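As a rough illustration, the sketch below computes advantages for a tiny tabular problem, assuming the standard definitions $A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s)$ and $V^\pi(s) = \sum_a \pi(a \mid s)\, Q^\pi(s, a)$. The table values and the names `state_values`, `advantages`, `Q`, and `PI` are made up for this example and are not part of the original note.

```python
def state_values(q, pi):
    """V(s) = sum_a pi(a|s) * Q(s, a), one entry per state."""
    return [sum(p * qa for p, qa in zip(pi_s, q_s)) for q_s, pi_s in zip(q, pi)]


def advantages(q, pi):
    """A(s, a) = Q(s, a) - V(s), one row per state."""
    v = state_values(q, pi)
    return [[qa - v_s for qa in q_s] for q_s, v_s in zip(q, v)]


# Hypothetical 2-state, 2-action tables (rows are states, columns are actions).
Q = [[1.0, 3.0], [0.0, 2.0]]
PI = [[0.5, 0.5], [0.25, 0.75]]

V = state_values(Q, PI)
A = advantages(Q, PI)

# Policy-weighted mean advantage in each state.
EXPECTED_A = [sum(p * a for p, a in zip(pi_s, a_s)) for pi_s, a_s in zip(PI, A)]

print(V)           # [2.0, 1.5]
print(A)           # [[-1.0, 1.0], [-1.5, 0.5]]
print(EXPECTED_A)  # [0.0, 0.0]
```

Actions with positive advantage are better than the policy's average behaviour in that state, and those with negative advantage are worse. The policy-weighted mean of each row is zero.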

Using the advantage instead of the raw return as a learning signal reduces variance without introducing bias, since $\mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[A^\pi(s, a)\right] = 0$.
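To make the variance claim concrete, here is a minimal sketch under assumptions that go beyond the note: a single state, a softmax policy over two actions, deterministic rewards (so the raw return for action $a$ equals $Q^\pi(s, a)$), and the score-function gradient estimator $\nabla_\theta \log \pi(a \mid s)$ times the learning signal. It computes the exact mean and variance of one gradient component rather than sampling. All names and numbers are hypothetical.

```python
def score(pi, a, k):
    """d/d theta_k of log softmax(theta)[a], i.e. indicator(a == k) - pi[k]."""
    return (1.0 if a == k else 0.0) - pi[k]


def estimator_moments(pi, signal, k):
    """Exact mean and variance over a ~ pi of score(pi, a, k) * signal[a]."""
    values = [score(pi, a, k) * signal[a] for a in range(len(pi))]
    mean = sum(p * g for p, g in zip(pi, values))
    second_moment = sum(p * g * g for p, g in zip(pi, values))
    return mean, second_moment - mean * mean


# Hypothetical single-state problem with two actions.
PI_S = [0.5, 0.5]
Q_S = [10.0, 12.0]
V_S = sum(p * q for p, q in zip(PI_S, Q_S))   # 11.0
A_S = [q - V_S for q in Q_S]                  # [-1.0, 1.0]

mean_q, var_q = estimator_moments(PI_S, Q_S, k=0)
mean_a, var_a = estimator_moments(PI_S, A_S, k=0)

print(mean_q, var_q)  # -0.5 30.25
print(mean_a, var_a)  # -0.5 0.0
```

Both signals give the same expected gradient component, so subtracting $V^\pi(s)$ adds no bias, while in this example the variance drops from 30.25 to 0. The size of the reduction depends on the problem; this is one hand-picked case, not a general bound.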