smartgrid.wrappers.reward_aggregator.RewardAggregator

class smartgrid.wrappers.reward_aggregator.RewardAggregator(env: SmartGrid)[source]

Bases: ABC, RewardWrapper

Wraps the multi-objective env into a single-objective one by aggregating rewards.

The smartgrid.environment.SmartGrid environment supports multiple reward functions; its SmartGrid.step() method returns a list of dictionaries, one dict per agent, containing the rewards indexed by their reward function’s name. However, most Reinforcement Learning algorithms expect a scalar reward, or, in this case, a list of scalar rewards, one per agent.

Classes that extend RewardAggregator bridge this gap by aggregating (scalarizing) the multiple rewards into a single one.
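For illustration, here is a minimal sketch of such a subclass; the name AverageRewardAggregator is hypothetical (not part of the library) and scalarizes by averaging each agent’s rewards:

    from typing import Dict, List

    from smartgrid.wrappers.reward_aggregator import RewardAggregator


    class AverageRewardAggregator(RewardAggregator):
        """Hypothetical aggregator: scalarize by averaging all reward functions."""

        def reward(self, rewards: List[Dict[str, float]]) -> List[float]:
            # Each agent's dict, e.g. {'fct1': 0.8, 'fct2': 0.4},
            # becomes a single scalar: here, the mean of its values (0.6).
            return [
                sum(agent_rewards.values()) / len(agent_rewards)
                for agent_rewards in rewards
            ]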

__init__(env: SmartGrid)[source]

Methods

__init__(env)

class_name()

Returns the class name of the wrapper.

close()

Closes the wrapper and env.

get_wrapper_attr(name)

Gets an attribute from the wrapper and lower environments if name doesn't exist in this object.

render()

Uses the render() of the env, which can be overridden to change the returned data.

reset(*[, seed, options])

Uses the reset() of the env, which can be overridden to change the returned data.

reward(rewards)

Transform multi-objective rewards into single-objective rewards.

step(action)

Modifies the env step() reward using self.reward().

wrapper_spec(**kwargs)

Generates a WrapperSpec for the wrappers.

Attributes

action_space

Return the Env action_space unless overwritten, in which case the wrapper action_space is used.

metadata

Returns the Env metadata.

np_random

Returns the Env np_random attribute.

observation_space

Return the Env observation_space unless overwritten, in which case the wrapper observation_space is used.

render_mode

Returns the Env render_mode.

reward_range

Return the Env reward_range unless overwritten, in which case the wrapper reward_range is used.

spec

Returns the Env spec attribute with the WrapperSpec if the wrapper inherits from EzPickle.

unwrapped

Returns the base environment of the wrapper.

property _np_random

This code is never run, because __getattr__ is called prior to it.

It seems that @property overwrites the variable (_np_random), meaning that __getattr__ gets called with the missing variable.

property action_space: Space[ActType] | Space[WrapperActType]

Return the Env action_space unless overwritten, in which case the wrapper action_space is used.

classmethod class_name() → str

Returns the class name of the wrapper.

close()

Closes the wrapper and env.

get_wrapper_attr(name: str) → Any

Gets an attribute from the wrapper and lower environments if name doesn’t exist in this object.

Parameters:

name – The variable name to get.

Returns:

The variable with name in the wrapper or lower environments.
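A brief usage sketch (the attribute name is illustrative):

    # Looks the attribute up on this wrapper first, then on the
    # wrapped environments underneath.
    rng = env.get_wrapper_attr('np_random')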

property metadata: dict[str, Any]

Returns the Env metadata.

property np_random: Generator

Returns the Env np_random attribute.

property observation_space: Space[ObsType] | Space[WrapperObsType]

Return the Env observation_space unless overwritten, in which case the wrapper observation_space is used.

render() → RenderFrame | list[RenderFrame] | None

Uses the render() of the env, which can be overridden to change the returned data.

property render_mode: str | None

Returns the Env render_mode.

reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[WrapperObsType, dict[str, Any]]

Uses the reset() of the env, which can be overridden to change the returned data.

abstract reward(rewards: List[Dict[str, float]]) → List[float][source]

Transform multi-objective rewards into single-objective rewards.

Parameters:

rewards – A list of dicts, one dict for each learning agent. Each dict contains one or several rewards, indexed by their reward function’s name, e.g., { 'fct1': 0.8, 'fct2': 0.4 }.

Returns:

A list of scalar rewards, one for each agent, scalarized from their respective dicts.
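As another hedged sketch, a weighted-sum implementation of this method; the weights parameter is an assumption for illustration, not part of this API:

    from typing import Dict, List

    from smartgrid.wrappers.reward_aggregator import RewardAggregator


    class WeightedSumAggregator(RewardAggregator):
        """Hypothetical aggregator: scalarize with fixed per-function weights."""

        def __init__(self, env, weights: Dict[str, float]):
            super().__init__(env)
            self.weights = weights  # e.g. {'fct1': 0.7, 'fct2': 0.3}

        def reward(self, rewards: List[Dict[str, float]]) -> List[float]:
            # Weighted sum over each agent's reward dict; assumes every
            # reward function name appears in `weights`.
            return [
                sum(self.weights[name] * value
                    for name, value in agent_rewards.items())
                for agent_rewards in rewards
            ]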

property reward_range: tuple[SupportsFloat, SupportsFloat]

Return the Env reward_range unless overwritten, in which case the wrapper reward_range is used.

property spec: EnvSpec | None

Returns the Env spec attribute with the WrapperSpec if the wrapper inherits from EzPickle.

step(action: ActType) → tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]

Modifies the env step() reward using self.reward().
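A usage sketch of the wrapped step(); the environment construction and action sampling are assumptions, as the actual SmartGrid setup and action format may differ:

    # `base_env` is assumed to be an already-constructed SmartGrid instance,
    # and AverageRewardAggregator is the hypothetical subclass sketched above.
    env = AverageRewardAggregator(base_env)

    obs, infos = env.reset(seed=42)
    terminated = truncated = False
    while not (terminated or truncated):
        actions = env.action_space.sample()  # placeholder policy (assumed samplable)
        obs, rewards, terminated, truncated, infos = env.step(actions)
        # `rewards` is now a list of scalars (one per agent), produced by
        # self.reward() from the original list of dicts.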

property unwrapped: Env[ObsType, ActType]

Returns the base environment of the wrapper.

This will be the bare gymnasium.Env environment, underneath all layers of wrappers.

classmethod wrapper_spec(**kwargs: Any) → WrapperSpec

Generates a WrapperSpec for the wrappers.