algorithms.util.action_selector
This module defines several classes to select actions (ActionSelectors).
An ActionSelector takes a list of interests (e.g., Q-values) and the current time step, and returns a single identifier, which is the selected action. ActionSelectors address the exploration-exploitation dilemma.
We consider two selectors:
the Epsilon-Greedy selector selects the action with the maximum interest with probability (1-ε), e.g., 95% for ε = 0.05. Otherwise, it selects a random action.
the Boltzmann selector applies a Boltzmann distribution over the interests. Higher interests yield higher probabilities, and interests that are close to each other yield similar probabilities. The distribution is controlled by a Boltzmann temperature: the higher the temperature, the flatter the distribution, so that actions with low interests can still be selected with significant probability.
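The two selection rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the module's actual implementation: the function names (`epsilon_greedy`, `boltzmann_probabilities`, `boltzmann`), their parameters, and the default values are hypothetical, and the time step argument, which the module's selectors receive but whose use is not detailed here, is left out.

```python
import math
import random


def epsilon_greedy(interests, epsilon=0.05, rng=random):
    """Hypothetical helper: return the index of the selected action."""
    if rng.random() < epsilon:
        # Explore: with probability epsilon, pick a random action.
        return rng.randrange(len(interests))
    # Exploit: otherwise, pick the action with the maximum interest.
    return max(range(len(interests)), key=lambda i: interests[i])


def boltzmann_probabilities(interests, temperature=1.0):
    """Hypothetical helper: Boltzmann (softmax) probabilities of the interests."""
    scaled = [q / temperature for q in interests]
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    highest = max(scaled)
    weights = [math.exp(s - highest) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(interests, temperature=1.0, rng=random):
    """Hypothetical helper: sample an action index from the Boltzmann distribution."""
    probabilities = boltzmann_probabilities(interests, temperature)
    return rng.choices(range(len(interests)), weights=probabilities, k=1)[0]
```

In this sketch, calling `boltzmann_probabilities` on the same interests with a larger `temperature` gives probabilities that are closer to each other, which is how actions with low interests keep a significant chance of being selected.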
Classes
| | Implements the Boltzmann policy. |
| | Implements the ε-greedy policy. |