smartgrid.wrappers.reward_aggregator.RewardAggregator

class smartgrid.wrappers.reward_aggregator.RewardAggregator(env: SmartGrid)

Bases: ABC, RewardWrapper

Wraps the multi-objective env into a single-objective one by aggregating rewards.

The smartgrid.environment.SmartGrid environment supports multiple reward functions; its SmartGrid.step() method returns a list of dictionaries, one dict for each agent, containing the rewards indexed by their reward function’s name. However, most Reinforcement Learning algorithms expect a scalar reward, or in this case, a list of scalar rewards, one for each agent.

Classes that extend the RewardAggregator bridge this gap by aggregating (scalarizing) the multiple rewards into a single one.
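As a minimal sketch of such a subclass, the aggregator below takes the unweighted mean of each agent's rewards. So that the sketch runs without smartgrid installed, it defines a hypothetical stand-in base class, `AggregatorBase`, that mirrors only the abstract `reward()` signature documented on this page. In real code you would subclass `RewardAggregator` and wrap an actual `SmartGrid` env instead. The name `MeanAggregator` is also illustrative.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class AggregatorBase(ABC):
    """Hypothetical stand-in for RewardAggregator: only the reward() contract."""

    @abstractmethod
    def reward(self, rewards: List[Dict[str, float]]) -> List[float]:
        """Scalarize one dict of named rewards per agent."""


class MeanAggregator(AggregatorBase):
    """Illustrative aggregator: the unweighted mean of each agent's rewards."""

    def reward(self, rewards: List[Dict[str, float]]) -> List[float]:
        # One scalar per agent, in the same order as the input list.
        # Assumes each dict holds at least one reward (no empty dicts).
        return [sum(agent.values()) / len(agent) for agent in rewards]


aggregator = MeanAggregator()
print(aggregator.reward([{"fct1": 0.8, "fct2": 0.4}, {"fct1": 0.2, "fct2": 0.6}]))
```

The base class cannot be instantiated because `reward()` is abstract. A subclass only has to implement that one method; the rest of the wrapper behavior is inherited.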

__init__(env: SmartGrid)

Methods

`__init__`(env)
`class_name`()	Returns the class name of the wrapper.
`close`()	Closes the wrapper and `env`.
`get_wrapper_attr`(name)	Gets an attribute from the wrapper and lower environments if name doesn't exist in this object.
`render`()	Calls `render()` on the wrapped `env`; can be overridden to change the returned data.
`reset`(*[, seed, options])	Calls `reset()` on the wrapped `env`; can be overridden to change the returned data.
`reward`(rewards)	Transform multi-objective rewards into single-objective rewards.
`step`(action)	Modifies the `env` `step()` reward using `self.reward()`.
`wrapper_spec`(**kwargs)	Generates a WrapperSpec for the wrappers.

Attributes

`action_space`	Returns the `Env` `action_space` unless the wrapper overrides it, in which case the wrapper's `action_space` is used.
`metadata`	Returns the `Env` `metadata`.
`np_random`	Returns the `Env` `np_random` attribute.
`observation_space`	Returns the `Env` `observation_space` unless the wrapper overrides it, in which case the wrapper's `observation_space` is used.
`render_mode`	Returns the `Env` `render_mode`.
`reward_range`	Returns the `Env` `reward_range` unless the wrapper overrides it, in which case the wrapper's `reward_range` is used.
`spec`	Returns the `Env` `spec` attribute with the WrapperSpec if the wrapper inherits from EzPickle.
`unwrapped`	Returns the base environment of the wrapper.

property _np_random

This code will never be run, because __getattr__ is called before it.

It seems that @property overwrites the variable (_np_random), meaning that __getattr__ gets called with the missing variable.

property action_space: Space[ActType] | Space[WrapperActType]

Returns the Env action_space unless the wrapper overrides it, in which case the wrapper's action_space is used.

classmethod class_name() → str

Returns the class name of the wrapper.

close()

Closes the wrapper and env.

get_wrapper_attr(name: str) → Any

Gets an attribute from the wrapper and lower environments if name doesn’t exist in this object.

Parameters: name – The name of the attribute to get.
Returns: The attribute called name, taken from the wrapper or from a lower environment.
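The lookup described above can be sketched as a recursive search: check the current object first, and otherwise ask the next layer down. The classes `BaseEnv`, `Wrapper` and `Tagged` and the attributes `grid_size` and `label` are hypothetical stand-ins chosen for illustration. This is a sketch of the documented behavior, not Gymnasium's actual source.

```python
from typing import Any


class BaseEnv:
    """Hypothetical innermost env; it answers lookups on itself only."""

    def __init__(self) -> None:
        self.grid_size = 10  # illustrative attribute, not a SmartGrid field

    def get_wrapper_attr(self, name: str) -> Any:
        # Bottom of the stack: raises AttributeError if the name is missing.
        return getattr(self, name)


class Wrapper:
    """Hypothetical wrapper that delegates unknown attributes inward."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def get_wrapper_attr(self, name: str) -> Any:
        # Found on this wrapper: return it without looking further down.
        if hasattr(self, name):
            return getattr(self, name)
        # Otherwise ask the next layer, which may itself be a wrapper.
        return self.env.get_wrapper_attr(name)


class Tagged(Wrapper):
    label = "outer"  # attribute that only the outermost wrapper defines


stack = Tagged(Wrapper(BaseEnv()))
print(stack.get_wrapper_attr("label"))      # outer
print(stack.get_wrapper_attr("grid_size"))  # 10
```

A name that no layer defines propagates the AttributeError raised by the innermost env.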

property metadata: dict[str, Any]

Returns the Env metadata.

property np_random: Generator

Returns the Env np_random attribute.

property observation_space: Space[ObsType] | Space[WrapperObsType]

Returns the Env observation_space unless the wrapper overrides it, in which case the wrapper's observation_space is used.

render() → RenderFrame | list[RenderFrame] | None

Calls render() on the wrapped env; can be overridden to change the returned data.

property render_mode: str | None

Returns the Env render_mode.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[WrapperObsType, dict[str, Any]]

Calls reset() on the wrapped env; can be overridden to change the returned data.

abstract reward(rewards: List[Dict[str, float]]) → List[float]

Transform multi-objective rewards into single-objective rewards.

Parameters: rewards – A list of dicts, one dict for each learning agent. Each dict contains one or several rewards, indexed by their reward function’s name, e.g., { 'fct1': 0.8, 'fct2': 0.4 }.
Returns: A list of scalar rewards, one for each agent, each obtained by scalarizing that agent's dict.
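As a worked example of this contract, the hypothetical helper `weighted_sum` below scalarizes each agent's dict as a weighted sum over reward-function names. The weights are assumed for illustration. It shows one possible scalarization and is not necessarily one that smartgrid ships.

```python
from typing import Dict, List


def weighted_sum(rewards: List[Dict[str, float]],
                 weights: Dict[str, float]) -> List[float]:
    """Scalarize each agent's dict as sum(weight * reward) over its keys."""
    scalars = []
    for agent_rewards in rewards:
        # A missing weight raises KeyError rather than silently dropping
        # a reward function from the aggregate.
        scalars.append(sum(weights[name] * value
                           for name, value in agent_rewards.items()))
    return scalars


rewards = [{"fct1": 0.8, "fct2": 0.4}, {"fct1": 0.0, "fct2": 1.0}]
print(weighted_sum(rewards, {"fct1": 0.75, "fct2": 0.25}))
```

For the first agent this gives 0.75 · 0.8 + 0.25 · 0.4 = 0.7 (up to floating-point rounding), and for the second 0.25. When the weights are non-negative and sum to 1, each scalar stays within the range of that agent's individual rewards.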

property reward_range: tuple[SupportsFloat, SupportsFloat]

Returns the Env reward_range unless the wrapper overrides it, in which case the wrapper's reward_range is used.
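The "unless the wrapper overrides it" wording on action_space, observation_space and reward_range describes a fallback. The wrapper reports the wrapped env's value until it stores its own. Here is a minimal sketch of that pattern for reward_range, using the hypothetical stand-ins `InnerEnv` and `RangeWrapper` and an assumed private slot `_reward_range`. Gymnasium's actual internals may differ between versions.

```python
from typing import Optional, SupportsFloat, Tuple

Range = Tuple[SupportsFloat, SupportsFloat]


class InnerEnv:
    """Hypothetical env exposing a default, unbounded reward range."""

    reward_range: Range = (float("-inf"), float("inf"))


class RangeWrapper:
    """Hypothetical wrapper illustrating the 'unless overridden' fallback."""

    def __init__(self, env: InnerEnv) -> None:
        self.env = env
        self._reward_range: Optional[Range] = None  # None means "not overridden"

    @property
    def reward_range(self) -> Range:
        # Fall back to the wrapped env until the wrapper sets its own value.
        if self._reward_range is None:
            return self.env.reward_range
        return self._reward_range

    @reward_range.setter
    def reward_range(self, value: Range) -> None:
        self._reward_range = value


wrapper = RangeWrapper(InnerEnv())
print(wrapper.reward_range)  # (-inf, inf)
wrapper.reward_range = (0.0, 1.0)  # e.g. an aggregator whose scalars lie in [0, 1]
print(wrapper.reward_range)  # (0.0, 1.0)
```

Setting the value on the wrapper leaves the inner env untouched, so other wrappers around the same env still see the original range.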

property spec: EnvSpec | None

Returns the Env spec attribute with the WrapperSpec if the wrapper inherits from EzPickle.

step(action: ActType) → tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]

Modifies the env step() reward using self.reward().
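In practice, step() forwards the action to the wrapped env and passes only the reward through reward(). The observation, termination flags and info dict are returned unchanged. The sketch below shows that flow with the hypothetical stand-ins `FakeGrid` (playing the role of SmartGrid) and `MinAggregator`. The "keep each agent's worst reward" rule is just one illustrative scalarization.

```python
from typing import Any, Dict, List, Tuple


class FakeGrid:
    """Hypothetical stand-in for SmartGrid: step() returns per-agent reward dicts."""

    def step(self, action: Any) -> Tuple[Any, List[Dict[str, float]], bool, bool, dict]:
        rewards = [{"fct1": 0.8, "fct2": 0.4}, {"fct1": 0.2, "fct2": 0.6}]
        return "obs", rewards, False, False, {}


class MinAggregator:
    """Hypothetical aggregator: step() forwards to the env, then calls reward()."""

    def __init__(self, env: FakeGrid) -> None:
        self.env = env

    def reward(self, rewards: List[Dict[str, float]]) -> List[float]:
        # Pessimistic scalarization: keep each agent's worst reward.
        return [min(agent.values()) for agent in rewards]

    def step(self, action: Any):
        obs, rewards, terminated, truncated, info = self.env.step(action)
        # Only the reward is transformed; everything else passes through.
        return obs, self.reward(rewards), terminated, truncated, info


obs, reward, terminated, truncated, info = MinAggregator(FakeGrid()).step(action=None)
print(reward)  # [0.4, 0.2]
```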

property unwrapped: Env[ObsType, ActType]

Returns the base environment of the wrapper.

This will be the bare gymnasium.Env environment, underneath all layers of wrappers.
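A bare env returns itself from unwrapped, and each wrapper delegates to the env it wraps. As a result, the recursion reaches the innermost env however many layers there are. The sketch below illustrates this with the hypothetical classes `Env` and `Wrapper`, which are not the Gymnasium classes themselves. Around a SmartGrid env, this means unwrapped gives direct access to the multi-objective environment, bypassing any aggregation.

```python
class Env:
    """Hypothetical bare environment: unwrapped is the env itself."""

    @property
    def unwrapped(self) -> "Env":
        return self


class Wrapper(Env):
    """Hypothetical wrapper: unwrapped recurses into the wrapped env."""

    def __init__(self, env: Env) -> None:
        self.env = env

    @property
    def unwrapped(self) -> Env:
        # Each layer asks the next one until a bare Env answers with itself.
        return self.env.unwrapped


base = Env()
stack = Wrapper(Wrapper(base))
print(stack.unwrapped is base)  # True
```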

classmethod wrapper_spec(**kwargs: Any) → WrapperSpec

Generates a WrapperSpec for the wrappers.