algorithms.qsom.qsom_agent.QsomAgent

class algorithms.qsom.qsom_agent.QsomAgent(observation_space: Box, action_space: Box, state_som: SOM, action_som: SOM, action_selector: ActionSelector, action_perturbator: ActionPerturbator, q_learning_rate=0.7, q_discount_factor=0.9, update_all=True, use_neighborhood=True)[source]

Bases: object

__init__(observation_space: Box, action_space: Box, state_som: SOM, action_som: SOM, action_selector: ActionSelector, action_perturbator: ActionPerturbator, q_learning_rate=0.7, q_discount_factor=0.9, update_all=True, use_neighborhood=True)[source]

Initialize an Agent using the Q-SOM learning and decision algorithm.
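The q_learning_rate and q_discount_factor arguments read most naturally as the usual Q-learning hyper-parameters. Assuming the standard tabular update over the discrete states and actions represented by the SOM units (an assumption; this page does not spell the update out), they correspond to \alpha and \gamma in:

    Q(s, a) \leftarrow Q(s, a) + \alpha \bigl( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \bigr)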

Methods

__init__(observation_space, action_space, ...)

Initialize an Agent using the Q-SOM learning and decision algorithm.

backward(new_perception, reward)

forward(observations)

get_optimal_action(observations)

Return the action that is considered optimal.
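Taken together, these methods support the usual act/learn loop. A minimal usage sketch follows; it assumes a Gymnasium-style environment and an already constructed agent (env and agent are illustrative names, and the exact return signature of env.step() depends on the Gym/Gymnasium version used):

    # Hypothetical interaction loop: forward() picks an (exploratory) action
    # from the current observations, and backward() lets the agent learn from
    # the new perception and the received reward.
    observations, _ = env.reset()
    for _ in range(1_000):
        action = agent.forward(observations)
        observations, reward, terminated, truncated, _ = env.step(action)
        agent.backward(observations, reward)
        if terminated or truncated:
            observations, _ = env.reset()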

_interpolate_action(action: ndarray)[source]

Interpolate an action from the [0,1]^n space back to its original (action) space.

As with observations, it is easier for SOMs to handle actions constrained to the [0,1]^n space. However, since actions are produced by the SOMs, the interpolation goes in the opposite direction, from [0,1]^n to the original (bounded) action space.
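A minimal sketch of this de-normalization for a Box action space, assuming a simple linear mapping (an illustration, not the actual implementation; interpolate_action, low and high are hypothetical names):

    import numpy as np

    def interpolate_action(action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
        # Map an action from [0,1]^n back to the bounded space [low, high].
        return low + action * (high - low)

    # With an action space of [0,100] x [0,200]:
    low, high = np.array([0.0, 0.0]), np.array([100.0, 200.0])
    interpolate_action(np.array([0.4, 0.75]), low, high)  # -> array([ 40., 150.])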

_interpolate_observations(observations: ndarray)[source]

Interpolate observations from their space to the [0,1]^n space.

It is easier for SOMs to handle values constrained to the [0,1]^n space, so observations are interpolated from their original (bounded) space into [0,1]^n. For example, if the original space is [0,100]x[0,200], the value [40, 150] interpolated into [0,1]^2 (i.e., [0,1]x[0,1]) is [0.4, 0.75]. The original observation space is known to this agent as the self.observation_space attribute.
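The worked example corresponds to a simple linear normalization, the inverse of the mapping sketched above for actions. A hedged illustration (not the actual implementation; interpolate_observations, low and high are hypothetical names):

    import numpy as np

    def interpolate_observations(obs: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
        # Map observations from the bounded space [low, high] into [0,1]^n.
        return (obs - low) / (high - low)

    # Example from the docstring: space [0,100] x [0,200], value [40, 150].
    low, high = np.array([0.0, 0.0]), np.array([100.0, 200.0])
    interpolate_observations(np.array([40.0, 150.0]), low, high)  # -> array([0.4 , 0.75])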

get_optimal_action(observations)[source]

Return the action that is considered optimal.

This method ignores exploration: the action with the maximal Q-value is selected (no Boltzmann selection), and it is not perturbed (no random noise).
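Assuming the Q-values are stored in a table indexed by (state unit, action unit), the greedy choice described here amounts to an argmax over the Q-values of the best-matching state unit, followed by decoding the corresponding action prototype. A hedged sketch (q_table, state_unit and action_prototypes are illustrative names, not actual attributes of this class):

    import numpy as np

    def greedy_action(q_table: np.ndarray, state_unit: int, action_prototypes: np.ndarray) -> np.ndarray:
        # Pick the action prototype with the highest Q-value for this state unit:
        # no Boltzmann sampling over Q-values, and no random perturbation of the action.
        best_action_unit = int(np.argmax(q_table[state_unit]))
        return action_prototypes[best_action_unit]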