smartgrid.wrappers.reward_aggregator.WeightedSumRewardAggregator
- class smartgrid.wrappers.reward_aggregator.WeightedSumRewardAggregator(env: SmartGrid, coefficients: dict | None = None)
  Bases: RewardAggregator
  Scalarizes multiple rewards through a weighted sum. By default, all coefficients are equal to 1/n, where n is the number of rewards, i.e., this is equivalent to an average.
- __init__(env: SmartGrid, coefficients: dict | None = None)
Construct an instance of the Weighted Sum aggregator.
- Parameters:
  env – The instance of the Smart Grid environment.
  coefficients – A dictionary describing the coefficient to use for each reward function. The keys must correspond to the names of the reward functions in the env (see its SmartGrid.reward_calculator), and the values must be the weights (floats). Usually, the sum of the weights is set to 1.0 to obtain a weighted average, but this is not mandatory. By default, weights are set to 1 / n to obtain a simple average.
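A minimal usage sketch follows; the environment construction and the reward function names ('comfort' and 'equity') are assumptions for illustration, not guaranteed by this API:

    from smartgrid.wrappers.reward_aggregator import WeightedSumRewardAggregator

    # `env` is assumed to be an already-constructed SmartGrid instance whose
    # reward_calculator exposes reward functions named 'comfort' and 'equity'.
    # The weights sum to 1.0, so the scalarized reward is a weighted average.
    wrapped_env = WeightedSumRewardAggregator(
        env,
        coefficients={'comfort': 0.7, 'equity': 0.3},
    )

    # Omitting `coefficients` sets every weight to 1 / n, i.e., a simple average.
    averaged_env = WeightedSumRewardAggregator(env)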
Methods

__init__(env[, coefficients]) – Construct an instance of the Weighted Sum aggregator.
class_name() – Returns the class name of the wrapper.
close() – Closes the wrapper and env.
get_wrapper_attr(name) – Gets an attribute from the wrapper and lower environments if name doesn't exist in this object.
render() – Uses the render() of the env, which can be overwritten to change the returned data.
reset(*[, seed, options]) – Uses the reset() of the env, which can be overwritten to change the returned data.
reward(rewards) – Transform multi-objective rewards into single-objective rewards.
step(action) – Modifies the env step() reward using self.reward().
wrapper_spec(**kwargs) – Generates a WrapperSpec for the wrappers.
Attributes

action_space – Return the Env action_space, unless overwritten, in which case the wrapper's action_space is used.
metadata – Returns the Env metadata.
np_random – Returns the Env np_random attribute.
observation_space – Return the Env observation_space, unless overwritten, in which case the wrapper's observation_space is used.
render_mode – Returns the Env render_mode.
reward_range – Return the Env reward_range, unless overwritten, in which case the wrapper's reward_range is used.
spec – Returns the Env spec attribute, with the WrapperSpec if the wrapper inherits from EzPickle.
unwrapped – Returns the base environment of the wrapper.
- property _np_random
  This code will never be run, because __getattr__ is called prior to this property.
  It seems that @property overwrites the variable (_np_random), meaning that __getattr__ gets called for the missing variable.
- property action_space: Space[ActType] | Space[WrapperActType]
  Return the Env action_space, unless overwritten, in which case the wrapper's action_space is used.
- close()
  Closes the wrapper and env.
- get_wrapper_attr(name: str) → Any
  Gets an attribute from the wrapper and lower environments if name doesn't exist in this object.
  - Parameters:
    name – The variable name to get.
  - Returns:
    The variable with name in the wrapper or lower environments.
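A short illustrative call; the attribute name 'reward_calculator' is borrowed from the constructor documentation above, and looking it up through the wrapper is only an example:

    # Looks for 'reward_calculator' on the wrapper first; if it is absent there,
    # the lookup falls through to the wrapped (lower) environments.
    calculator = wrapped_env.get_wrapper_attr('reward_calculator')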
- property observation_space: Space[ObsType] | Space[WrapperObsType]
  Return the Env observation_space, unless overwritten, in which case the wrapper's observation_space is used.
- render() → RenderFrame | list[RenderFrame] | None
  Uses the render() of the env, which can be overwritten to change the returned data.
- property render_mode: str | None
  Returns the Env render_mode.
- reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[WrapperObsType, dict[str, Any]]
  Uses the reset() of the env, which can be overwritten to change the returned data.
- reward(rewards: List[Dict[str, float]]) → List[float]
  Transform multi-objective rewards into single-objective rewards.
  - Parameters:
    rewards – A list of dicts, one dict for each learning agent. Each dict contains one or several rewards, indexed by their reward function's name, e.g., { 'fct1': 0.8, 'fct2': 0.4 }.
  - Returns:
    A list of scalar rewards, one for each agent. The rewards are scalarized from the dict.
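To make the scalarization concrete, here is an equivalent reference sketch of the computation (for illustration only, not the library's code; 'fct1' and 'fct2' are the hypothetical reward names from the example above):

    def weighted_sum(rewards, coefficients):
        # One scalar per agent: the sum over reward functions of weight * value.
        return [
            sum(coefficients[name] * value for name, value in agent_rewards.items())
            for agent_rewards in rewards
        ]

    rewards = [{'fct1': 0.8, 'fct2': 0.4}]
    weighted_sum(rewards, {'fct1': 0.5, 'fct2': 0.5})
    # -> [0.6] (up to floating-point rounding), since 0.5 * 0.8 + 0.5 * 0.4 = 0.6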
- property reward_range: tuple[SupportsFloat, SupportsFloat]
  Return the Env reward_range, unless overwritten, in which case the wrapper's reward_range is used.
- property spec: EnvSpec | None
  Returns the Env spec attribute, with the WrapperSpec if the wrapper inherits from EzPickle.
- step(action: ActType) → tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]
  Modifies the env step() reward using self.reward().
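A hedged interaction sketch: after wrapping, the reward returned by step() has already been scalarized by self.reward(). The action sampling below assumes a Gymnasium-style sampleable action space:

    obs, info = wrapped_env.reset(seed=42)
    action = wrapped_env.action_space.sample()  # assumption: the action space supports sample()
    obs, reward, terminated, truncated, info = wrapped_env.step(action)
    # `reward` holds the scalarized value(s) produced by self.reward(),
    # rather than the raw per-function reward dicts.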
- property unwrapped: Env[ObsType, ActType]
  Returns the base environment of the wrapper.
  This will be the bare gymnasium.Env environment, underneath all layers of wrappers.
- classmethod wrapper_spec(**kwargs: Any) → WrapperSpec
  Generates a WrapperSpec for the wrappers.