Lukas' Notes

Upper Confidence Bound

Jul 25, 2025

reinforcement-learning

Definition

Upper Confidence Bound

The upper confidence bound (UCB) algorithm is a strategy for the multi-armed bandit problem. It balances exploration and exploitation by assigning each machine a confidence bound in every round and playing the machine whose upper bound is highest. A machine's bound shrinks as that machine is played more often relative to the other machines.
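The note does not say which variant of the algorithm is meant, so the following is only a minimal sketch of one common form, UCB1, where the bound for a machine is its average reward plus sqrt(2 ln t / n). The function names `ucb1_select` and `run_ucb1`, the Bernoulli (win/lose) rewards, and the payout probabilities are assumptions made for illustration.

```python
import math
import random


def ucb1_select(counts, values, t):
    """Return the index of the machine with the highest upper confidence bound.

    counts[i] is how often machine i was played, values[i] is its average
    reward so far, and t is the current round (starting at 1).
    """
    # Play every machine once first so no count is zero in the formula below.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # UCB1 score: average reward plus an exploration bonus that shrinks
    # as the machine's own play count grows.
    scores = [
        values[arm] + math.sqrt(2.0 * math.log(t) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(probs, rounds, seed=0):
    """Simulate UCB1 on hypothetical machines that pay 1 with probability probs[i]."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    values = [0.0] * len(probs)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, values, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average reward.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


if __name__ == "__main__":
    counts, values = run_ucb1([0.2, 0.5, 0.8], rounds=1000)
    print("plays per machine:", counts)
    print("estimated rewards:", [round(v, 2) for v in values])
```

With equal average rewards, the machine that has been played less gets the larger bonus and is chosen next, which is the shrinking-bound behaviour described above.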

Exploitation vs. Exploration

Exploitation

Exploitation means that the agent decides on its next step based on its prior experience, typically by picking the option that has performed best so far.

Exploration

Exploration means choosing a new state, one that the agent has not chosen before or has chosen fewer times than the others so far.
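The two definitions can be contrasted in a small sketch. The helper names `exploit` and `explore` and the sample numbers are hypothetical, and "best so far" is taken here to mean the highest average reward, which is an assumption rather than something the note specifies.

```python
def exploit(values):
    """Exploitation: pick the option with the best average reward so far."""
    return max(range(len(values)), key=values.__getitem__)


def explore(counts):
    """Exploration: pick the option that has been chosen the fewest times."""
    return min(range(len(counts)), key=counts.__getitem__)


# Hypothetical history for three options.
counts = [12, 3, 0]        # how often each option was chosen
values = [0.7, 0.4, 0.0]   # average reward observed for each option

print("exploit picks option", exploit(values))  # option 0, the best so far
print("explore picks option", explore(counts))  # option 2, never tried
```

Pure exploitation would keep choosing option 0 and never learn whether option 2 is better, while pure exploration ignores what has already been learned. The upper confidence bound combines both signals in one score.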
