smartgrid.rewards.numeric.differentiated.multi_objective_product.MultiObjectiveProduct

class smartgrid.rewards.numeric.differentiated.multi_objective_product.MultiObjectiveProduct[source]

Bases: Reward

Product of multiple objectives: comfort and over-consumption.

The reward is equal to comfort * overconsumption, where comfort is the reward computed by Comfort, and overconsumption is the reward computed by OverConsumption.

Note

The overconsumption reward is interpolated from [-1, 1] to [0, 1] so that it shares the comfort reward's range, avoiding “semantic” problems, e.g., -0.9 * 0.1 = -0.09, where -0.09 is actually a better reward than -0.9, although both factors were very low.
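The rescaling described in the note can be sketched as follows; `rescale` is a hypothetical helper for illustration, not part of the library:

```python
def rescale(x, lo=-1.0, hi=1.0):
    """Linearly rescale x from [lo, hi] to [0, 1]."""
    return (x - lo) / (hi - lo)

# Without rescaling, the product is misleading: both factors are low,
# yet -0.09 is a "better" (higher) reward than -0.9 alone.
comfort, overconsumption = 0.1, -0.9
print(round(comfort * overconsumption, 4))           # -0.09
# After rescaling overconsumption to [0, 1], two low rewards
# produce a low product, as intended.
print(round(comfort * rescale(overconsumption), 4))  # 0.005
```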

__init__()[source]

Methods

__init__()

calculate(world, agent)

Compute the reward for a specific Agent at the current time step.

reset()

Reset the reward function.

Attributes

name

Uniquely identifying, human-readable name for this reward function.

calculate(world, agent)[source]

Compute the reward for a specific Agent at the current time step.

Parameters:
  • world – The World, used to get the current state and determine consequences of the agent’s action.

  • agent – The Agent that is rewarded, used to access particular information about the agent (personal state) and its action.

Returns:

A reward, i.e., a single value describing how well the agent performed. The higher the reward, the better its action was. Typically a value in [0, 1], but any range can be used.
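A minimal sketch of how calculate() could combine the two sub-rewards. StubComfort and StubOverConsumption are stand-ins for the real Comfort and OverConsumption classes, whose exact APIs are assumed here, not taken from the library:

```python
class StubComfort:
    """Stand-in for Comfort: yields a reward in [0, 1]."""
    def calculate(self, world, agent):
        return 0.8  # placeholder value

class StubOverConsumption:
    """Stand-in for OverConsumption: yields a reward in [-1, 1]."""
    def calculate(self, world, agent):
        return 0.5  # placeholder value

def calculate(world, agent):
    comfort = StubComfort().calculate(world, agent)
    overconsumption = StubOverConsumption().calculate(world, agent)
    # Interpolate overconsumption from [-1, 1] to [0, 1] before multiplying,
    # as described in the Note above.
    overconsumption = (overconsumption + 1.0) / 2.0
    return comfort * overconsumption
```

With the placeholder values above, the result is 0.8 * 0.75 = 0.6: both factors now live in [0, 1], so the product stays in [0, 1] as well.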

name: str

Uniquely identifying, human-readable name for this reward function.

reset()

Reset the reward function.

This function must be overridden by reward functions that use a state, so that the state is reset with the environment. By default, does nothing, as most reward functions do not use a state.
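A sketch of a reward function that does use state and therefore must override reset(); the class name and the running-average logic are illustrative assumptions, not from the library:

```python
class RunningAverageReward:
    """Illustrative stateful reward: averages per-step rewards over an episode."""
    name = "running_average"

    def __init__(self):
        self.total = 0.0
        self.steps = 0

    def calculate(self, world, agent):
        raw = 1.0  # placeholder for an actual per-step reward
        self.total += raw
        self.steps += 1
        return self.total / self.steps

    def reset(self):
        # Clear episode-level state so the next episode starts fresh;
        # without this override, stale totals would leak across episodes.
        self.total = 0.0
        self.steps = 0
```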