Definition
Policy Gradient
Given a parameterised stochastic policy , the policy gradient is the gradient of the expected return
with respect to the policy parameters:
By the policy gradient theorem, it can be estimated as
where is an advantage estimate. Therefore, actions with positive advantage are made more likely, while actions with negative advantage are made less likely.