smartgrid.rewards.numeric.differentiated.multi_objective_product.MultiObjectiveProduct
- class smartgrid.rewards.numeric.differentiated.multi_objective_product.MultiObjectiveProduct
Bases: Reward
Product of multiple objectives: comfort and over-consumption.
The reward is equal to comfort * overconsumption, where comfort refers to the reward of Comfort, and overconsumption refers to the reward of OverConsumption.

Note

The overconsumption reward is interpolated from [-1, 1] to [0, 1] so that it uses the same range as the comfort reward, and to avoid "semantic" problems, e.g., -0.9 * 0.1 = -0.09, where -0.09 is actually better than -0.9, although both rewards were very low.
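As a rough illustration of why the interpolation matters, the sketch below rescales an overconsumption value from [-1, 1] to [0, 1] before multiplying. The interpolate helper and the numeric values are assumptions for illustration, not the library's own code.

```python
def interpolate(value, old_min=-1.0, old_max=1.0, new_min=0.0, new_max=1.0):
    """Linearly rescale value from [old_min, old_max] to [new_min, new_max]."""
    return new_min + (value - old_min) * (new_max - new_min) / (old_max - old_min)

comfort = 0.1            # already in [0, 1]
overconsumption = -0.9   # raw value in [-1, 1]

# Naive product: -0.9 * 0.1 = -0.09, which ranks above -0.9 even though
# both rewards are very low.
naive = overconsumption * comfort

# Interpolated product: both factors are in [0, 1], so low values stay low.
scaled = interpolate(overconsumption)   # 0.05
reward = comfort * scaled               # 0.005
```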
Methods

- __init__()
- calculate(world, agent): Compute the reward for a specific Agent at the current time step.
- reset(): Reset the reward function.
Attributes
- Uniquely identifying, human-readable name for this reward function.
- calculate(world, agent)
Compute the reward for a specific Agent at the current time step.
- Parameters:
world – The World, used to get the current state and determine consequences of the agent’s action.
agent – The Agent that is rewarded, used to access particular information about the agent (personal state) and its action.
- Returns:
A reward, i.e., a single value describing how well the agent performed. The higher the reward, the better its action was. Typically, a value in [0,1] but any range can be used.
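The sketch below shows, with stand-in objects, how such a calculation could combine the two sub-rewards. The stub classes and placeholder values are assumptions for illustration, not the library's actual Comfort and OverConsumption implementations.

```python
class StubComfort:
    """Stand-in for the Comfort reward (returns a value in [0, 1])."""
    def calculate(self, world, agent):
        return 0.8  # placeholder

class StubOverConsumption:
    """Stand-in for the OverConsumption reward (returns a value in [-1, 1])."""
    def calculate(self, world, agent):
        return -0.4  # placeholder

def multi_objective_product(world, agent):
    comfort = StubComfort().calculate(world, agent)
    overconsumption = StubOverConsumption().calculate(world, agent)
    # Rescale overconsumption from [-1, 1] to [0, 1] before multiplying.
    overconsumption = (overconsumption + 1.0) / 2.0
    return comfort * overconsumption

print(multi_objective_product(world=None, agent=None))  # 0.8 * 0.3 = 0.24
```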
- reset()
Reset the reward function.
This function must be overridden by reward functions that use a state, so that the state is reset with the environment. By default, does nothing, as most reward functions do not use a state.
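For example, a hypothetical stateful reward function (not part of the library) would override reset() so its accumulated history is cleared when the environment restarts:

```python
class MovingAverageReward:
    """Hypothetical reward that averages its last few values; purely illustrative."""

    def __init__(self):
        self.history = []

    def calculate(self, world, agent):
        raw = 0.5  # placeholder for an actual per-step computation
        self.history.append(raw)
        last = self.history[-10:]
        return sum(last) / len(last)

    def reset(self):
        # Clear accumulated state so a new episode starts fresh.
        self.history.clear()
```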