algorithms.util.action_selector

This module defines several classes to select actions (ActionSelectors).

An ActionSelector takes a list of interests (e.g., Q-values) and the current time step, and returns a single identifier, which is the selected action. ActionSelectors address the exploration-exploitation dilemma.
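
To make this contract concrete, here is a minimal sketch, not the module's actual code. The names `SelectorSketch`, `GreedySketch`, and `select` are hypothetical and chosen only for illustration; the real method names and signatures may differ. It shows a selector that receives interests and a time step and returns the index of one action.

```python
from typing import Protocol, Sequence


class SelectorSketch(Protocol):
    """Hypothetical interface: interests and time step in, action index out."""

    def select(self, interests: Sequence[float], time_step: int) -> int:
        ...


class GreedySketch:
    """Purely exploiting selector: always picks the highest interest."""

    def select(self, interests: Sequence[float], time_step: int) -> int:
        # time_step is unused here; a selector could use it, for example,
        # to reduce exploration over time (an assumption, not from the source).
        return max(range(len(interests)), key=lambda i: interests[i])


chosen = GreedySketch().select([0.1, 0.7, 0.2], time_step=0)
```

A purely greedy selector never explores, which is the limitation that the selectors of this module are meant to address.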

We consider two selectors:

The Epsilon-Greedy selector selects the action with the highest interest with probability 1-ε (e.g., 95% for ε = 0.05). Otherwise, it selects a random action.
The Boltzmann selector applies a Boltzmann distribution over the interests. Interests that are close to each other yield similar probabilities, and higher interests yield higher probabilities. The distribution is controlled by a Boltzmann temperature: a higher temperature flattens the distribution, so that actions with low interests can still receive significant probabilities.
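
The two policies above can be sketched as standalone functions. This is an illustrative sketch under assumptions, not the implementation of `EpsilonGreedyActionSelector` or `BoltzmannActionSelector`: the function names, the `rng` parameter, the default values, and the fixed temperature `tau` are all hypothetical, and the time step is left out for brevity.

```python
import math
import random


def epsilon_greedy(interests, epsilon=0.05, rng=random):
    """Return an action index using an epsilon-greedy rule (hypothetical helper)."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(interests))
    # Exploit: index of the highest interest (the first one on ties).
    return max(range(len(interests)), key=lambda i: interests[i])


def boltzmann_probabilities(interests, tau=1.0):
    """Return probabilities proportional to exp(interest / tau)."""
    # Subtracting the maximum leaves the result unchanged
    # but avoids overflow in exp for large interests.
    highest = max(interests)
    weights = [math.exp((q - highest) / tau) for q in interests]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(interests, tau=1.0, rng=random):
    """Sample an action index from the Boltzmann distribution."""
    probs = boltzmann_probabilities(interests, tau)
    return rng.choices(range(len(interests)), weights=probs, k=1)[0]


interests = [1.0, 2.0, 3.0]
cold = boltzmann_probabilities(interests, tau=1.0)
hot = boltzmann_probabilities(interests, tau=10.0)
```

In this sketch, `hot` is flatter than `cold`: raising the temperature gives the lowest interest a larger share of the probability, while the ordering of the actions is preserved.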

Classes

`ActionSelector`()
`BoltzmannActionSelector`(initial_tau, ...)	Implements the Boltzmann policy.
`EpsilonGreedyActionSelector`([epsilon])	Implements the ε-greedy policy.