algorithms.model.Model
- class algorithms.model.Model(env: SmartGrid, hyper_parameters: dict)
Bases: ABC
A Model is a class that handles the decision-making.
It must produce decisions (actions) for all agents in the environment. Using a single Model for all agents simplifies its use: it suffices to write model = SomeModel(env, hyper_parameters) and actions = model.forward(obs), instead of looping over all agents. However, it is not strictly necessary to use a Model in the interaction loop: for more complex cases, several models can be combined, or functions can produce actions directly, etc.
The goal of this class is to provide a standard API that other learning algorithms can follow, or at least take inspiration from, and to simplify the use of learning algorithms.
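The single-Model interaction loop described above can be sketched as follows. To stay self-contained, the snippet re-declares a stand-in for the Model interface instead of importing algorithms.model.Model. It also uses a hypothetical two-agent ToyEnv whose reset() and step() return per-agent dicts. ToyEnv, ConstantModel, and the step() return shape are illustrative assumptions, not the SmartGrid API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

AgentID = str
Obs = List[float]
Action = List[float]


class Model(ABC):
    """Stand-in mirroring the documented Model API (not the library class)."""

    def __init__(self, env, hyper_parameters: dict):
        self.env = env
        self.hyper_parameters = hyper_parameters

    @abstractmethod
    def forward(self, observations_per_agent: Dict[AgentID, Obs]) -> Dict[AgentID, Action]:
        """Decision step: return one action per agent."""

    @abstractmethod
    def backward(self, observations_per_agent: Dict[AgentID, Obs],
                 reward_per_agent: Dict[AgentID, float]) -> None:
        """Learning step: improve the policies from new observations and rewards."""


class ToyEnv:
    """Hypothetical two-agent environment (assumption, not SmartGrid)."""

    agents = ["prosumer_0", "prosumer_1"]

    def reset(self) -> Dict[AgentID, Obs]:
        return {agent: [0.0] for agent in self.agents}

    def step(self, actions: Dict[AgentID, Action]):
        # Toy dynamics: the next observation echoes the first action parameter,
        # and the reward is highest when that parameter equals 0.5.
        obs = {agent: [actions[agent][0]] for agent in self.agents}
        rewards = {agent: 1.0 - abs(actions[agent][0] - 0.5) for agent in self.agents}
        return obs, rewards


class ConstantModel(Model):
    """Hypothetical Model that always outputs the same one-parameter action."""

    def forward(self, observations_per_agent):
        value = self.hyper_parameters.get("value", 0.5)
        return {agent: [value] for agent in observations_per_agent}

    def backward(self, observations_per_agent, reward_per_agent):
        # Nothing to learn for a constant policy; keep the rewards for inspection.
        self.last_rewards = reward_per_agent


env = ToyEnv()
model = ConstantModel(env, {"value": 0.5})
obs = env.reset()
for _ in range(3):
    actions = model.forward(obs)      # one call decides for every agent
    obs, rewards = env.step(actions)
    model.backward(obs, rewards)      # one call learns for every agent
```

Swapping ConstantModel for another subclass leaves the loop unchanged, which is what a shared API across learning algorithms buys.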
- __init__(env: SmartGrid, hyper_parameters: dict)
Create a Model, i.e., an entrypoint for the learning algorithm.
- Parameters:
env – The environment that the learning algorithm will interact with. This is useful, e.g., for accessing the agents’ observation and action spaces, knowing the number of agents, etc. Note that a Wrapper can also be used, such as a RewardAggregator.
hyper_parameters – An optional dictionary of hyper-parameters that control the creation of the learning agents, for example the learning rate to use. The hyper-parameters themselves are specific to the implemented Model.
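Because hyper-parameters are specific to each implemented Model, a subclass will often merge the user-supplied dictionary over its own defaults. The sketch below shows one way to do this. ExampleModel, its learning_rate and epsilon keys, treating a missing dictionary as empty, and the env.agents attribute are all assumptions for illustration; the class does not subclass Model so that it runs on its own.

```python
from types import SimpleNamespace
from typing import Optional


class ExampleModel:
    """Hypothetical Model; in the library it would subclass algorithms.model.Model."""

    # Defaults for the hyper-parameters this (hypothetical) Model understands.
    DEFAULTS = {"learning_rate": 0.1, "epsilon": 0.05}

    def __init__(self, env, hyper_parameters: Optional[dict] = None):
        self.env = env
        params = dict(self.DEFAULTS)
        for key, value in (hyper_parameters or {}).items():
            # Reject unknown keys early instead of silently ignoring typos.
            if key not in self.DEFAULTS:
                raise ValueError(f"Unknown hyper-parameter: {key!r}")
            params[key] = value
        self.hyper_parameters = params
        # Assumes the environment (or a Wrapper around it) exposes an `agents` list.
        self.n_agents = len(env.agents)


env = SimpleNamespace(agents=["prosumer_0", "prosumer_1"])  # stand-in environment
model = ExampleModel(env, {"learning_rate": 0.5})
```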
Methods
__init__(env, hyper_parameters): Create a Model, i.e., an entrypoint for the learning algorithm.
backward(observations_per_agent, ...): Learn (improve) the agents' policies, based on observations and rewards.
forward(observations_per_agent): Decide which actions should be taken, based on observations.
- abstract backward(observations_per_agent: Dict[AgentID, ObsType], reward_per_agent: Dict[AgentID, Dict[str, float]] | Dict[AgentID, float])
Learn (improve) the agents’ policies, based on observations and rewards.
This method represents the learning step.
- Parameters:
observations_per_agent – The observations per agent, similar to those in the forward() method. They describe the new situation that arose after the agents’ actions were executed in the world.
reward_per_agent – The rewards per agent. They describe the degree to which agents’ actions were satisfying (interesting), with respect to the moral values encoded in the reward functions. If multiple reward functions are used, this is a dict of dicts; otherwise, it is a dict of floats. See the SmartGrid._get_reward() method for details.
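Since reward_per_agent is either a dict of dicts (several reward functions) or a dict of floats (a single one), a backward() implementation that needs one scalar per agent has to handle both shapes. The helper below is a minimal sketch of that: scalar_rewards is a hypothetical name, and averaging the reward functions is an arbitrary choice made here, not the aggregation performed by the library.

```python
from statistics import mean
from typing import Dict, Union

AgentID = str
RewardPerAgent = Union[Dict[AgentID, Dict[str, float]], Dict[AgentID, float]]


def scalar_rewards(reward_per_agent: RewardPerAgent) -> Dict[AgentID, float]:
    """Return one float per agent, whichever reward shape was received."""
    scalars = {}
    for agent, reward in reward_per_agent.items():
        if isinstance(reward, dict):
            # Several reward functions: average them (arbitrary choice for this sketch).
            scalars[agent] = float(mean(reward.values()))
        else:
            # Single reward function: the reward is already a scalar.
            scalars[agent] = float(reward)
    return scalars
```

For example, with two hypothetical reward functions named equity and comfort, scalar_rewards({"a": {"equity": 0.25, "comfort": 0.75}}) returns {"a": 0.5}.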
- abstract forward(observations_per_agent: Dict[AgentID, ObsType]) → Dict[AgentID, ActionType]
Decide which actions should be taken, based on observations.
This method represents the decision step.
- Parameters:
observations_per_agent – The observations per agent. See the SmartGrid._get_obs() method for details on their structure. These observations describe the current state of the simulator, and are the data used to take actions.
- Returns:
A dict mapping each agent to its action, where an action is a list of action parameters. See SmartGrid.action_space for details on the structure of action parameters.
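A forward() implementation only has to return one action, i.e., one list of action parameters, per observed agent. The sketch below draws every action parameter uniformly in [0, 1) with a seeded generator. RandomModel and its n_action_params and seed hyper-parameters are hypothetical; a real implementation should take the number and bounds of action parameters from SmartGrid.action_space rather than from a hyper-parameter.

```python
import random
from typing import Dict, List

AgentID = str


class RandomModel:
    """Hypothetical Model; in the library it would subclass algorithms.model.Model."""

    def __init__(self, env, hyper_parameters: dict):
        self.env = env
        # Assumed hyper-parameters: action size and RNG seed (illustrative defaults).
        self.n_action_params = hyper_parameters.get("n_action_params", 2)
        self.rng = random.Random(hyper_parameters.get("seed", 0))

    def forward(self, observations_per_agent: Dict[AgentID, object]) -> Dict[AgentID, List[float]]:
        # One action (a list of parameters in [0, 1)) for each agent that sent an
        # observation; a random policy ignores the observation contents.
        return {
            agent: [self.rng.random() for _ in range(self.n_action_params)]
            for agent in observations_per_agent
        }

    def backward(self, observations_per_agent, reward_per_agent) -> None:
        # A random policy does not learn.
        pass


model = RandomModel(env=None, hyper_parameters={"n_action_params": 3, "seed": 42})
actions = model.forward({"prosumer_0": [0.1], "prosumer_1": [0.7]})
```

Seeding the generator through the hyper-parameters keeps runs reproducible, which helps when comparing this baseline against learning Models.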