smartgrid.wrappers.reward_aggregator.WeightedSumRewardAggregator
- class smartgrid.wrappers.reward_aggregator.WeightedSumRewardAggregator(env: SmartGrid, coefficients: dict | None = None)
  Bases: RewardAggregator
  Scalarizes multiple rewards through a weighted sum. By default, all coefficients are equal to 1/n, where n is the number of rewards, i.e., this is equivalent to an average.
- __init__(env: SmartGrid, coefficients: dict | None = None)
Construct an instance of the Weighted Sum aggregator.
- Parameters:
  env – The instance of the Smart Grid environment.
  coefficients – A dictionary describing the coefficient to use for each reward function. The keys must correspond to the names of the reward functions in the env (see its SmartGrid.reward_calculator), and the values must be the weights (floats). Usually, the sum of the weights is set to 1.0 to obtain a weighted average, but this is not mandatory. By default, weights are set to 1 / n to obtain a simple average.
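A minimal usage sketch follows; the environment construction and the reward function names ('comfort' and 'equity') are assumptions for illustration, not guaranteed by this API:

    from smartgrid.wrappers.reward_aggregator import WeightedSumRewardAggregator

    # `env` is assumed to be an already-constructed SmartGrid instance whose
    # reward_calculator exposes reward functions named 'comfort' and 'equity'.
    # The weights sum to 1.0, so the scalarized reward is a weighted average.
    wrapped_env = WeightedSumRewardAggregator(
        env,
        coefficients={'comfort': 0.7, 'equity': 0.3},
    )

    # Omitting `coefficients` sets every weight to 1 / n, i.e., a simple average.
    averaged_env = WeightedSumRewardAggregator(env)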
Methods

__init__(env[, coefficients]) – Construct an instance of the Weighted Sum aggregator.
class_name() – Returns the class name of the wrapper.
close() – Closes the wrapper and env.
get_wrapper_attr(name) – Gets an attribute from the wrapper and lower environments if name doesn't exist in this object.
render() – Uses the render() of the env, which can be overwritten to change the returned data.
reset(*[, seed, options]) – Uses the reset() of the env, which can be overwritten to change the returned data.
reward(rewards) – Transform multi-objective rewards into single-objective rewards.
step(action) – Modifies the env step() reward using self.reward().
wrapper_spec(**kwargs) – Generates a WrapperSpec for the wrappers.
Attributes

action_space – Return the Env action_space, unless overwritten, in which case the wrapper's action_space is used.
metadata – Returns the Env metadata.
np_random – Returns the Env np_random attribute.
observation_space – Return the Env observation_space, unless overwritten, in which case the wrapper's observation_space is used.
render_mode – Returns the Env render_mode.
reward_range – Return the Env reward_range, unless overwritten, in which case the wrapper's reward_range is used.
spec – Returns the Env spec attribute, with the WrapperSpec if the wrapper inherits from EzPickle.
unwrapped – Returns the base environment of the wrapper.
- property _np_random
  This code will never be run, because __getattr__ is called prior to this property.
  It seems that @property overwrites the variable (_np_random), meaning that __getattr__ gets called for the missing variable.
- property action_space: Space[ActType] | Space[WrapperActType]
  Return the Env action_space, unless overwritten, in which case the wrapper's action_space is used.
- close()
  Closes the wrapper and env.
- get_wrapper_attr(name: str) → Any
  Gets an attribute from the wrapper and lower environments if name doesn't exist in this object.
  - Parameters:
    name – The variable name to get.
  - Returns:
    The variable with name in the wrapper or lower environments.
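A short illustrative call; the attribute name 'reward_calculator' is borrowed from the constructor documentation above, and looking it up through the wrapper is only an example:

    # Looks for 'reward_calculator' on the wrapper first; if it is absent there,
    # the lookup falls through to the wrapped (lower) environments.
    calculator = wrapped_env.get_wrapper_attr('reward_calculator')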
- property observation_space: Space[ObsType] | Space[WrapperObsType]
  Return the Env observation_space, unless overwritten, in which case the wrapper's observation_space is used.
- render() → RenderFrame | list[RenderFrame] | None
  Uses the render() of the env, which can be overwritten to change the returned data.
- property render_mode: str | None
  Returns the Env render_mode.
- reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[WrapperObsType, dict[str, Any]]
  Uses the reset() of the env, which can be overwritten to change the returned data.
- reward(rewards: List[Dict[str, float]]) → List[float]
  Transform multi-objective rewards into single-objective rewards.
  - Parameters:
    rewards – A list of dicts, one dict for each learning agent. Each dict contains one or several rewards, indexed by their reward function's name, e.g., { 'fct1': 0.8, 'fct2': 0.4 }.
  - Returns:
    A list of scalar rewards, one for each agent. The rewards are scalarized from the dict.
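To make the scalarization concrete, here is an equivalent reference sketch of the computation (for illustration only, not the library's code; 'fct1' and 'fct2' are the hypothetical reward names from the example above):

    def weighted_sum(rewards, coefficients):
        # One scalar per agent: the sum over reward functions of weight * value.
        return [
            sum(coefficients[name] * value for name, value in agent_rewards.items())
            for agent_rewards in rewards
        ]

    rewards = [{'fct1': 0.8, 'fct2': 0.4}]
    weighted_sum(rewards, {'fct1': 0.5, 'fct2': 0.5})
    # -> [0.6] (up to floating-point rounding), since 0.5 * 0.8 + 0.5 * 0.4 = 0.6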
- property reward_range: tuple[SupportsFloat, SupportsFloat]
  Return the Env reward_range, unless overwritten, in which case the wrapper's reward_range is used.
- property spec: EnvSpec | None
  Returns the Env spec attribute, with the WrapperSpec if the wrapper inherits from EzPickle.
- step(action: ActType) → tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]
  Modifies the env step() reward using self.reward().
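A hedged interaction sketch: after wrapping, the reward returned by step() has already been scalarized by self.reward(). The action sampling below assumes a Gymnasium-style sampleable action space:

    obs, info = wrapped_env.reset(seed=42)
    action = wrapped_env.action_space.sample()  # assumption: the action space supports sample()
    obs, reward, terminated, truncated, info = wrapped_env.step(action)
    # `reward` holds the scalarized value(s) produced by self.reward(),
    # rather than the raw per-function reward dicts.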
- property unwrapped: Env[ObsType, ActType]
  Returns the base environment of the wrapper.
  This will be the bare gymnasium.Env environment, underneath all layers of wrappers.
- classmethod wrapper_spec(**kwargs: Any) → WrapperSpec
  Generates a WrapperSpec for the wrappers.