reinforcement-learning

Definition

Upper Confidence Bound

The upper confidence bound (UCB) algorithm is used for the multi-armed bandit problem. It balances exploration and exploitation using a confidence bound that the algorithm assigns to each machine on each round. This bound shrinks as a machine is played more often relative to the other machines.
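
As a rough illustration, the selection rule can be sketched in Python. The text above does not say which formula is used, so this sketch assumes the common UCB1 variant, where each machine's score is its average reward plus sqrt(2 ln t / n). The names ucb1_select, confidence_width, counts, totals and t are hypothetical and chosen only for this example.

```python
import math


def confidence_width(n, t):
    """Width of the confidence bound for a machine played n times by round t.

    Assumes the UCB1 bonus sqrt(2 * ln(t) / n); other UCB variants differ.
    """
    return math.sqrt(2.0 * math.log(t) / n)


def ucb1_select(counts, totals, t):
    """Return the index of the machine to play on round t.

    counts[i] is how many times machine i has been played so far,
    totals[i] is the sum of rewards it has returned,
    t is the current round number (1-based).
    """
    # Play every machine once before comparing bounds, so that
    # no division by zero occurs below.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    # Score = observed average reward + confidence width.
    scores = [
        totals[arm] / counts[arm] + confidence_width(counts[arm], t)
        for arm in range(len(counts))
    ]
    return max(range(len(scores)), key=lambda arm: scores[arm])
```

With these assumptions, a machine that has been played only a few times keeps a wide bound and can still be selected even when its average reward is lower, which is how the bound drives exploration.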

Exploitation vs. Exploration

Exploitation

Exploitation involves the agent deciding its next step based on its prior experience, that is, choosing the option that has given the best results so far.

Exploration

Exploration involves choosing a state that the agent has not chosen before, or has chosen fewer times than the others so far.
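
The two behaviours can be contrasted with a minimal Python sketch. It is only an illustration under assumed names (exploit, explore, counts, totals are hypothetical), with "best" taken to mean the highest average reward observed so far.

```python
def explore(counts):
    """Pick the option that has been chosen the fewest times so far."""
    return min(range(len(counts)), key=lambda i: counts[i])


def exploit(counts, totals):
    """Pick the option with the highest average reward observed so far.

    Only options that have been tried at least once are compared.
    If nothing has been tried yet, fall back to exploring.
    """
    tried = [i for i in range(len(counts)) if counts[i] > 0]
    if not tried:
        return explore(counts)
    return max(tried, key=lambda i: totals[i] / counts[i])


# Example: option 0 has the best average, option 2 has never been tried.
counts = [5, 1, 0]
totals = [4.0, 0.2, 0.0]
best_known = exploit(counts, totals)   # index of best observed average
least_tried = explore(counts)          # index of least-chosen option
```

A purely exploiting agent would keep returning to the first option here, while a purely exploring one would move to the untried option; a method such as the upper confidence bound combines both signals in one score.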