algorithms.util.action_selector

This module defines several classes to select actions (ActionSelectors).

An ActionSelector takes a list of interests (e.g., Q-values) and the current time step, and returns a single identifier: the selected action. Selectors address the exploration-exploitation dilemma.

We consider two selectors:

  • the Epsilon-Greedy selector selects the maximum-interest action with probability 1-ε (e.g., 95% for ε = 0.05). Otherwise, it selects a random action.

  • the Boltzmann selector draws the action from a Boltzmann distribution over the interests. Interests that are close to each other receive similar probabilities, and higher interests receive higher probabilities. The distribution is controlled by a Boltzmann temperature: a higher temperature flattens it, so that actions with low interests can still have a significant probability of being selected.
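
The two policies above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code of the classes listed below: the function names (`epsilon_greedy`, `boltzmann_probabilities`, `boltzmann`), their parameters, and the default values are assumptions made for this example. The sketch also ignores the time step, so the temperature `tau` stays fixed.

```python
import math
import random


def epsilon_greedy(interests, epsilon=0.05, rng=random):
    """Return the index of the selected action (hypothetical helper)."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(interests))
    # Exploit: pick the action with the highest interest
    # (the first one in case of ties).
    return max(range(len(interests)), key=lambda i: interests[i])


def boltzmann_probabilities(interests, tau):
    """Return exp(q / tau) normalized over all interests q (hypothetical helper)."""
    # Subtracting the maximum leaves the result unchanged
    # but avoids overflow in math.exp for large interests.
    highest = max(interests)
    weights = [math.exp((q - highest) / tau) for q in interests]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(interests, tau=1.0, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probabilities = boltzmann_probabilities(interests, tau)
    return rng.choices(range(len(interests)), weights=probabilities, k=1)[0]


if __name__ == "__main__":
    q_values = [1.0, 2.0, 2.1]
    print(epsilon_greedy(q_values, epsilon=0.05))
    print(boltzmann_probabilities(q_values, tau=0.5))
    print(boltzmann(q_values, tau=0.5))
```

In this sketch, `tau` must be strictly positive. Very large values make the probabilities nearly uniform, whereas values close to zero make the choice nearly greedy.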

Classes

ActionSelector()

BoltzmannActionSelector(initial_tau, ...)

Implements the Boltzmann policy.

EpsilonGreedyActionSelector([epsilon])

Implements the ε-greedy policy.