smartgrid.environment.SmartGrid

class smartgrid.environment.SmartGrid(world: World, rewards, max_step=None, obs_manager: ObservationManager | None = None)[source]

Bases: Env

The SmartGrid environment is the main entrypoint.

It simulates a smart grid containing multiple agents (prosumers: producers and consumers) who must learn to distribute and exchange energy among themselves, so as to satisfy their comfort while taking various ethical considerations into account.

This class extends the standard gym.Env so that it can easily be used with different learning algorithms. However, a key feature of this environment is that multiple agents co-exist; hence, a few changes have been made to the standard Gym API. Notably: action_space and observation_space are lists of Space instead of a single Space, and the step() method returns lists and dicts instead of single elements.
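For instance, assuming an already-constructed env, the per-agent structure of the API can be seen as follows (a sketch, not part of the class itself):

    # One Space per agent, instead of a single Space for the whole environment.
    assert len(env.action_space) == env.n_agent
    assert len(env.observation_space) == env.n_agent
    # step() likewise expects one action per agent, and returns per-agent lists/dicts.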

__init__(world: World, rewards, max_step=None, obs_manager: ObservationManager | None = None)[source]

Create the SmartGrid environment.

This sets most attributes of the environment, including the action_space and observation_space.

Warning

Remember that the env is not usable until you call reset()!

Parameters:
  • world – The “physical” World of the Smart Grid in which the simulation happens. The world contains the agents, the energy generator, and handles the agents’ actions.

  • rewards – The list of reward functions that should be used. Usually, a list of a single element (for single-objective RL), but multiple reward functions can be used.

  • max_step – The maximum number of steps allowed in the environment. By default, the environment never terminates on its own: the interaction loop must be stopped from the outside. If this value is set, the step() method will return truncated=True once max_step steps have been performed. Subsequent calls will raise a warning.

  • obs_manager – (Optional) The ObservationManager that will be used to determine Observations at each time step. This parameter can be used to extend this process and generate different observations. It can (and, in most cases, should) be left at its default value.

Returns:

An instance of SmartGrid.
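For illustration, a minimal construction sketch; make_world() and MyRewardFunction below are hypothetical placeholders for whatever World factory and reward function(s) your scenario actually provides:

    from smartgrid.environment import SmartGrid

    # Hypothetical helpers: build the physical World and pick the reward function(s).
    world = make_world()                 # placeholder: returns a World instance
    rewards = [MyRewardFunction()]       # single-objective RL: a list with one function

    env = SmartGrid(world, rewards, max_step=10_000)
    obs = env.reset(seed=42)             # the env is not usable before reset()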

Methods

__init__(world, rewards[, max_step, obs_manager])

Create the SmartGrid environment.

close()

After the user has finished using the environment, close contains the code necessary to "clean up" the environment.

get_wrapper_attr(name)

Gets the attribute name from the environment.

render([mode])

Render the current state of the simulator to the screen.

reset([seed, options])

Reset the SmartGrid to its initial state.

step(action_n)

Advance the simulation to the next step.

Attributes

agents

The list of agents contained in the environment (world).

metadata

n_agent

Number of agents contained in the environment (world).

np_random

Returns the environment's internal _np_random generator, initialising it with a random seed if it is not already set.

observation_shape

The shape, i.e., number of dimensions, of the observation space.

render_mode

reward_range

spec

unwrapped

Returns the base non-wrapped environment.

action_space

The list of action spaces for all Agents.

observation_space

The list of observation spaces for all Agents.

observation_manager

The observation manager, responsible for creating observations each step.

max_step

The maximum number of steps allowed in the environment (or None by default).

reward_calculator

The RewardCollection, responsible for determining agents' rewards each step.

world

The simulated world in which the SmartGrid exists.

_get_info(reward_n)[source]

Return additional information on the world (for the current time step).

The information contains the rewards for each agent.

Parameters:

reward_n – The list of rewards, one for each agent.

Returns:

A dict, containing an element with key rewards. This element is itself a dict, indexed by the agents’ names, and whose value is their reward.
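For illustration, the returned dict has the following shape (agent names are hypothetical, and the values are shown as plain floats for simplicity; with multiple reward functions each value may itself be a dict):

    info = {
        "rewards": {
            "agent_0": 0.42,   # reward of the agent named "agent_0"
            "agent_1": 0.87,   # reward of the agent named "agent_1"
        }
    }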

_get_obs()[source]

Determine the observations for all agents.

Note

As a large part of the observations is shared (“global”), a dict is used instead of the traditional list (one observation per agent). It contains:

  • global: the global observations, shared by all agents;

  • local: a list of local observations, one item per agent.

Returns:

A dictionary containing global and local.
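Schematically, the returned value looks like this (the exact contents of the global and local observations depend on the ObservationManager):

    obs = {
        "global": ...,    # global observations, shared by all agents
        "local": [...],   # list of local observations, one item per agent, in agent order
    }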

_get_reward()[source]

Determine the reward for each agent.

Rewards describe the degree to which each agent’s action was appropriate, w.r.t. moral values. These moral values are encoded in the reward function(s); see smartgrid.rewards for more details on them.

Reward functions may comprise multiple objectives. In such cases, they can be aggregated so that the result is a single float (which most decision algorithms expect). This behaviour (whether to aggregate, and how) is controlled by the reward_calculator; see RewardCollection for details.

Returns:

A list of rewards, one element per agent. Each element is a dict containing at least one reward, indexed by the reward function’s name.
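For illustration, with two agents and a single reward function hypothetically named "comfort", the returned value would look like:

    reward_n = [
        {"comfort": 0.8},   # rewards of the first agent, indexed by reward name
        {"comfort": 0.3},   # rewards of the second agent
    ]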

action_space: List[Space]

The list of action spaces for all Agents.

property agents

The list of agents contained in the environment (world).

close()

After the user has finished using the environment, close contains the code necessary to “clean up” the environment.

This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won’t raise an error.

get_wrapper_attr(name: str) Any

Gets the attribute name from the environment.

max_step: int | None

The maximum number of steps allowed in the environment (or None by default).

As the environment is not episodic, it has no natural way to terminate (i.e., agents can neither “solve” their task nor “die”). The maximum number of steps is a way to limit the simulation and force the environment to terminate. In practice, it simply determines the truncated return value of step(). This return value, in turn, acts as a signal for the external interaction loop. By default, or when set to None, truncated will always be False, which means that the environment can run forever.
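A sketch of an external interaction loop that relies on truncated to stop; the random action sampling stands in for a real decision algorithm, and world / rewards are assumed to exist already:

    env = SmartGrid(world, rewards, max_step=10_000)
    obs = env.reset(seed=0)
    truncated = [False] * env.n_agent
    while not all(truncated):
        # Placeholder policy: sample one action per agent from its own action space.
        actions = [space.sample() for space in env.action_space]
        obs, reward_n, terminated, truncated, info = env.step(actions)
    # After 10_000 steps, every element of `truncated` is True and the loop stops.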

property n_agent

Number of agents contained in the environment (world).

property np_random: Generator

Returns the environment’s internal _np_random generator, initialising it with a random seed if it is not already set.

Returns:

Instances of np.random.Generator

observation_manager: ObservationManager

The observation manager, responsible for creating observations each step.

Can be configured (extended) to return different observations.

property observation_shape

The shape, i.e., number of dimensions, of the observation space.

observation_space: List[Space]

The list of observation spaces for all Agents.

Because the observations are in practice split between global and local parts, these spaces might not exactly correspond to the returned observations; please see _get_obs() for details.

render(mode='text')[source]

Render the current state of the simulator to the screen.

Note

No renderer has been configured for now. Metrics’ values can be observed directly through the objects returned by step().

Parameters:

mode – Not used

Returns:

None

reset(seed=None, options=None)[source]

Reset the SmartGrid to its initial state.

This method calls the reset method of the internal objects, e.g., the World, the Agents, etc. Despite its name, it must also be called first, before any interaction, as it provides the initial observations.

Parameters:
  • seed – An optional seed (int) to configure the random generators and ensure reproducibility. Note: this does not change the global generators (Python random and NumPy np.random). SmartGrid components must rely on gym.Env._np_random instead.

  • options – An optional dictionary of arguments to further configure the simulator. Currently unused.

Returns:

The first (initial) observations for each agent in the World.
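Sketch of a reproducible reset (the seed value is arbitrary, and env is assumed to exist):

    obs = env.reset(seed=123)
    # Only the environment's internal generator (np_random) is seeded;
    # Python's `random` and NumPy's global `np.random` are left untouched.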

reward_calculator: RewardCollection

The RewardCollection, responsible for determining agents’ rewards each step.

This environment has (partial) support for multi-objective use-cases, i.e., multiple reward functions can be used at the same time. The RewardCollection holds all these functions and computes the rewards for every function and every agent at each time step. It returns a list of dicts (multiple rewards for each agent), which can be scalarized to a list of floats (a single reward for each agent) by using a wrapper over this environment. See the reward_aggregator module for details.
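As a sketch, the per-agent reward dicts could also be scalarized manually; the equal-weight average below is purely illustrative, and the wrappers from the reward_aggregator module should be preferred in practice:

    # reward_n is a list of dicts, e.g. [{"comfort": 0.8, "equity": 0.4}, ...].
    scalar_reward_n = [
        sum(rewards.values()) / len(rewards)   # illustrative: average the objectives
        for rewards in reward_n
    ]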

step(action_n)[source]

Advance the simulation to the next step.

This method takes the actions decided by the agents (learning algorithms) and sends them to the World so it can update itself based on these actions. Then, the method computes the new observations and rewards, and returns them so that agents can decide on their next action.

Parameters:

action_n – The list of actions (vectors of parameters that must be consistent with each agent’s action space), one action for each agent.

Returns:

A tuple containing information about the next (new) state:

  • obs_n: A dict that contains the observations about the next state, please see _get_obs() for details about the dict contents.

  • reward_n: A list containing the rewards for each agent, please see _get_reward() for details about its content.

  • terminated_n: A list of boolean values indicating, for each agent, whether the agent is “terminated”, e.g., completed its task or failed. Currently always set to False: agents can neither complete nor fail (this is not an episodic environment).

  • truncated_n: A list of boolean values indicating, for each agent, whether the agent should stop acting, because, e.g., the environment has run out of time. See max_step for details.

  • info_n: A dict containing additional information about the next state, please see _get_info() for details about its content.
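For illustration, a single step and how to unpack its return value (the sampled actions stand in for a real decision algorithm, and env is assumed to exist):

    actions = [space.sample() for space in env.action_space]   # one action per agent
    obs_n, reward_n, terminated_n, truncated_n, info_n = env.step(actions)

    global_obs = obs_n["global"]          # part of the observations shared by all agents
    first_local_obs = obs_n["local"][0]   # local observations of the first agent
    first_rewards = reward_n[0]           # dict of rewards for the first agent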

property unwrapped: Env[ObsType, ActType]

Returns the base non-wrapped environment.

Returns:

Env: The base non-wrapped gymnasium.Env instance

world: World

The simulated world in which the SmartGrid exists.

The world is responsible for handling all agents and “physical” interactions between the smart grid elements.