algorithms.naive.random_model.RandomModel

class algorithms.naive.random_model.RandomModel(env, hyper_parameters: dict)[source]

Bases: Model

Model that returns purely random actions.

The actions are based on the action_space for each agent, using the Space.sample() method.
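
For illustration, a minimal, self-contained sketch of this behaviour using Gymnasium spaces (the agent names and space shapes below are invented; in the simulator, the spaces come from the environment itself):

    import gymnasium as gym

    # Toy per-agent action spaces (hypothetical shapes, for illustration only).
    action_spaces = {
        "agent_0": gym.spaces.Box(low=0.0, high=1.0, shape=(3,)),
        "agent_1": gym.spaces.Box(low=0.0, high=1.0, shape=(3,)),
    }

    # What RandomModel essentially does: sample each agent's action space.
    actions_per_agent = {name: space.sample() for name, space in action_spaces.items()}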

__init__(env, hyper_parameters: dict)[source]

Create a Model, i.e., an entrypoint for the learning algorithm.

Parameters:
  • env – The environment that the learning algorithm will interact with. This is useful for, e.g., accessing the agents’ observation and action spaces, knowing the number of agents, etc. Note that a Wrapper can also be used, such as a RewardAggregator.

  • hyper_parameters – An optional dictionary of hyper-parameters that control the creation of the learning agents. For example, the learning rate to use, etc. The hyper-parameters themselves are specific to the implemented Model.
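
A hedged usage sketch: it assumes `env` is an already-created simulation environment (e.g., a SmartGrid instance, possibly wrapped), and, since this model acts purely at random, an empty hyper-parameters dict is presumably sufficient:

    from algorithms.naive.random_model import RandomModel

    # `env` is assumed to be built beforehand (e.g. a SmartGrid, possibly
    # wrapped in a RewardAggregator); its construction is not shown here.
    model = RandomModel(env, hyper_parameters={})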

Methods

__init__(env, hyper_parameters)

Create a Model, i.e., an entrypoint for the learning algorithm.

backward(observations_per_agent, ...)

Learn (improve) the agents' policies, based on observations and rewards.

forward(observations_per_agent)

Decide which actions should be taken, based on observations.

get_optimal_actions(observations_per_agent)

Return the actions that are considered optimal, for each agent.

backward(observations_per_agent, reward_per_agent)[source]

Learn (improve) the agents’ policies, based on observations and rewards.

This method represents the learning step.

Parameters:
  • observations_per_agent – The observations per agent, similar to those in the forward() method. They describe the new situation that arises after the agents’ actions have been executed in the world.

  • reward_per_agent – The rewards per agent. They describe the degree to which agents’ actions were satisfying (interesting), with respect to the moral values encoded in the reward functions. If multiple reward functions are used, this is a dict of dicts; otherwise, it is a dict of floats, as illustrated after this list. See the SmartGrid._get_reward() method for details.
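
For illustration, the two possible shapes of reward_per_agent described above (agent and reward-function names are hypothetical, and the call continues the construction sketch shown for __init__):

    # Single reward function: a dict of floats.
    reward_per_agent = {"agent_0": 0.8, "agent_1": 0.5}

    # Multiple reward functions: a dict of dicts (function names are made up).
    reward_per_agent = {
        "agent_0": {"comfort": 0.8, "equity": 0.3},
        "agent_1": {"comfort": 0.5, "equity": 0.9},
    }

    model.backward(observations_per_agent, reward_per_agent)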

forward(observations_per_agent)[source]

Decide which actions should be taken, based on observations.

This method represents the decision step.

Parameters:

observations_per_agent – The observations per agent. See the SmartGrid._get_obs() method for details on its structure. These observations describe the current state of the simulator, and are the data used to take actions.

Returns:

A dict mapping each agent to its action, where an action is a list of action parameters. See the SmartGrid.action_space for details on the structure of action parameters.
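
Continuing the construction sketch above, a hedged view of the decision step and of the returned structure (all concrete values are invented for illustration):

    # `observations_per_agent` is assumed to come from the environment
    # (see SmartGrid._get_obs() for its actual structure).
    actions_per_agent = model.forward(observations_per_agent)
    # A dict keyed by agent, where each value is a list of action parameters,
    # e.g. (invented values):
    # {"agent_0": [0.3, 0.0, 1.2], "agent_1": [0.1, 0.7, 0.4]}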

get_optimal_actions(observations_per_agent: Dict[AgentID, ObsType]) → Dict[AgentID, ActionType]

Return the actions that are considered optimal, for each agent.

In other words, this method performs pure exploitation, whereas the forward() method may balance exploration and exploitation.

It can be useful after the training phase, for testing purposes.

Parameters:

observations_per_agent – A dictionary mapping agents’ names to their observations, exactly as in forward().

Returns:

A dict mapping each agent to its action. Actions have the same structure as in forward(), but they should be produced with only exploitation as a goal, i.e., selecting the action that should yield the best reward.

Warning

By default, to ensure that all models will have this method, it simply returns the same actions as forward(). Models that make a distinction between exploration and exploitation should override it.
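
A structural sketch of the intended override, using a toy standalone class rather than the real Model base class (the class name, the value table, and the epsilon-greedy logic are all invented for illustration):

    import random

    class ToyGreedyModel:
        """Toy example separating exploration (forward) from exploitation."""

        def __init__(self, n_actions: int, epsilon: float = 0.1):
            self.values = [0.0] * n_actions  # toy learned action values
            self.epsilon = epsilon

        def forward(self, observations_per_agent):
            # Exploration-exploitation: with probability epsilon, act randomly.
            greedy = self.get_optimal_actions(observations_per_agent)
            return {
                agent: (random.randrange(len(self.values))
                        if random.random() < self.epsilon
                        else action)
                for agent, action in greedy.items()
            }

        def get_optimal_actions(self, observations_per_agent):
            # Pure exploitation: always the action with the highest learned value.
            best = self.values.index(max(self.values))
            return {agent: best for agent in observations_per_agent}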