reinforcement-learning gymnasium
Definition
CartPole-v1
CartPole-v1 is a reinforcement learning benchmark environment from the Gymnasium classic-control suite. A pole is attached by an unactuated hinge to a cart that moves along a frictionless track; the agent balances the pole upright by pushing the cart left or right.
It is a fully observable, known, static environment with a continuous state and a discrete action space, introduced by Barto, Sutton, and Anderson (1983).
Observation space
The observation is a vector of four real values describing the full state:
| Index | Component | Range |
|---|---|---|
| 0 | cart position | ±4.8 |
| 1 | cart velocity | (-∞, ∞) |
| 2 | pole angle | about ±0.418 rad (±24°) |
| 3 | pole angular velocity | (-∞, ∞) |
These are observation bounds, not the region an episode can occupy: the episode terminates as soon as the pole or cart crosses its tighter termination threshold (below).
Action space
The action space is Discrete(2):
- 0: push cart left;
- 1: push cart right.
The applied force is fixed in magnitude; only its direction is chosen. The effect on the cart depends on the pole’s angle, since the pole’s centre of gravity shifts how much force is needed to move the cart beneath it.
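The angle dependence can be made concrete with a minimal sketch of one simulation step. The physical constants below (gravity, masses, pole half-length, force magnitude, time step) and the explicit-Euler update are assumptions taken to match Gymnasium's reference implementation; the function names `accelerations` and `step` are illustrative only.

```python
import math

# Assumed constants, intended to mirror Gymnasium's reference CartPole.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
HALF_POLE_LENGTH = 0.5  # half of the pole's length
FORCE_MAG = 10.0        # fixed magnitude of every push
TAU = 0.02              # seconds between state updates


def accelerations(theta, theta_dot, action):
    """Return (cart acceleration, pole angular acceleration) for one push."""
    # Only the sign of the force depends on the action.
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    total_mass = MASS_CART + MASS_POLE
    pole_mass_length = MASS_POLE * HALF_POLE_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    temp = (force + pole_mass_length * theta_dot**2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t**2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass
    return x_acc, theta_acc


def step(state, action):
    """Advance (x, x_dot, theta, theta_dot) by one explicit-Euler step."""
    x, x_dot, theta, theta_dot = state
    x_acc, theta_acc = accelerations(theta, theta_dot, action)
    return (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )


# The same push to the right, applied at two different pole angles.
upright_acc, _ = accelerations(theta=0.0, theta_dot=0.0, action=1)
tilted_acc, _ = accelerations(theta=0.2, theta_dot=0.0, action=1)
print(upright_acc, tilted_acc)
```

Under these assumptions the two printed cart accelerations differ even though the applied force is identical, which is the coupling described above.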
Reward
Each step earns a reward of +1, including the terminating step, so the return of an episode equals its length.
Episode end
The episode ends on any of:
- Termination: pole angle leaves the range ±0.2095 rad (±12°);
- Termination: cart position leaves the range ±2.4;
- Truncation: episode length reaches 500 steps.
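The three conditions above can be sketched as a small classifier. This is a hedged illustration rather than the library's code: `episode_status` is a hypothetical helper, and the threshold values (12° for the pole, 2.4 for the cart, 500 steps) are the ones Gymnasium documents for CartPole-v1.

```python
import math

THETA_LIMIT = 12 * 2 * math.pi / 360  # 12 degrees in radians, about 0.2095
X_LIMIT = 2.4
MAX_EPISODE_STEPS = 500


def episode_status(state, steps_elapsed):
    """Return (terminated, truncated) for the state reached after `steps_elapsed` steps."""
    x, _x_dot, theta, _theta_dot = state
    # Termination depends only on the state: pole too tilted or cart too far out.
    terminated = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    # Truncation depends only on elapsed time, not on the state.
    truncated = steps_elapsed >= MAX_EPISODE_STEPS
    return terminated, truncated


print(episode_status((0.0, 0.0, 0.3, 0.0), 10))   # pole past 12 degrees
print(episode_status((2.5, 0.0, 0.0, 0.0), 10))   # cart past 2.4
print(episode_status((0.0, 0.0, 0.0, 0.0), 500))  # time limit reached
```

A pole angle of 0.3 rad is still inside the ±0.418 rad observation bound, yet it already counts as terminated, which is why the observation range is wider than the region an episode can occupy.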
The initial state is drawn uniformly in (-0.05, 0.05) for each component, so every episode starts near upright with a small random perturbation. The 500-step truncation caps the per-episode return at 500; Gymnasium registers a slightly lower reward threshold of 475 for v1 (195 for v0, which truncates at 200 steps).
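As a rough illustration of the reset distribution and the return cap, the sketch below samples a start state and converts an episode length into a return. It uses Python's `random` module as a stand-in for the environment's own random generator, and the helpers `sample_initial_state` and `episode_return` are hypothetical names, not part of Gymnasium.

```python
import random

INIT_BOUND = 0.05        # each component starts in (-0.05, 0.05)
MAX_EPISODE_STEPS = 500  # v1 truncation limit


def sample_initial_state(rng):
    """Draw (x, x_dot, theta, theta_dot) independently and uniformly."""
    return tuple(rng.uniform(-INIT_BOUND, INIT_BOUND) for _ in range(4))


def episode_return(length):
    """Return of an episode lasting `length` steps, at +1 reward per step."""
    if not 0 <= length <= MAX_EPISODE_STEPS:
        raise ValueError("episode length must lie in [0, MAX_EPISODE_STEPS]")
    return float(length)


rng = random.Random(0)  # seeded only to make the sketch repeatable
state = sample_initial_state(rng)
print(state)
print(episode_return(MAX_EPISODE_STEPS))  # the largest achievable return
```

Because every step is worth +1 and no episode can exceed the step limit, the largest return in this sketch is exactly the truncation length.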