Definition
Upper Confidence Bound
The upper confidence bound (UCB) algorithm is used for the multi-armed bandit problem. It balances exploration and exploitation using a confidence bound that the algorithm assigns to each machine on each round. A machine's bound shrinks as that machine is used more often in comparison to the other machines.
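As a rough illustration of how such a bound can drive the choice of machine, here is a minimal sketch that assumes the common UCB1 rule: a machine's score is its average reward plus sqrt(2 ln t / n), where t is the current round and n is how often that machine has been used. The text above does not name a specific formula, so the rule, the function names `ucb1_select` and `run_ucb1`, and the simulated win/lose rewards are all assumptions made for the example.

```python
import math
import random


def ucb1_select(counts, sums, t, c=2.0):
    """Return the index of the machine with the highest upper confidence bound.

    counts[i] is how often machine i was used, sums[i] is its total reward,
    and t is the current round (1-based). c=2.0 is the usual UCB1 constant.
    """
    # Use every machine once first, so no bound divides by zero.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        # average reward + exploration bonus that shrinks as counts[arm] grows
        sums[arm] / counts[arm] + math.sqrt(c * math.log(t) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(probs, rounds, seed=0):
    """Simulate machines that pay 1 with probability probs[i], else 0 (assumed setup)."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    sums = [0.0] * len(probs)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


if __name__ == "__main__":
    counts, sums = run_ucb1([0.1, 0.9], rounds=2000)
    print("times each machine was used:", counts)
```

In this sketch a machine that has been used only a few times keeps a large bonus, so it is still selected now and then even when its average reward looks poor.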
Exploitation vs. Exploration
Exploitation
Exploitation means the agent decides its next step from its prior experience, choosing the option that has worked best so far.
Exploration
Exploration means choosing an option that the agent has not chosen before, or has chosen fewer times so far.
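The two kinds of choice described above can be sketched as two small selection functions. This is only an illustration: the function names `exploit` and `explore`, and the use of average reward as the measure of "prior experience", are assumptions for the example and not part of any particular library.

```python
def exploit(counts, sums):
    """Pick the option with the best average reward observed so far."""
    # Options never tried have no experience to draw on, so they score 0 here.
    means = [s / n if n > 0 else 0.0 for s, n in zip(sums, counts)]
    return max(range(len(means)), key=means.__getitem__)


def explore(counts):
    """Pick the option that has been tried the fewest times so far."""
    return min(range(len(counts)), key=counts.__getitem__)


# Hypothetical history: option 0 tried 10 times, option 1 twice, option 2 never.
counts = [10, 2, 0]
sums = [6.0, 1.5, 0.0]

print(exploit(counts, sums))  # 1, since its average 0.75 beats 0.6 and 0.0
print(explore(counts))        # 2, since it has never been tried
```

A pure exploiter would keep returning to option 1 and never learn about option 2, while a pure explorer would ignore the rewards it has already seen.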