smartgrid.rewards.numeric.differentiated.over_consumption.OverConsumption

class smartgrid.rewards.numeric.differentiated.over_consumption.OverConsumption

Bases: Reward

Reward representing the over-consumption of an Agent.

The over-consumption is the quantity of energy that was consumed by the society of agents but was not available in the grid. (We assume that the grid automatically buys from the national grid to compensate, which has a negative impact. Over-consumption should thus be avoided.)

We compare the quantity of energy taken (i.e., consumed + stored from the grid) by all agents to the quantity of energy over-consumed by all agents. This gives us a global component (the current environment).

Then, we subtract the agent’s taken energy from the quantity of energy over-consumed, and compare the result to the sum of energy taken by all agents. This gives us a local component (the hypothetical environment, had the agent not acted).

The reward follows the Difference Reward principle, and thus is the global component minus the local component.
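The computation described above can be sketched as follows. This is an illustrative reading of the description, not the library's implementation: it assumes that "compare" means a ratio, that the hypothetical over-consumption is clipped at zero, and that an empty denominator yields 0. The function and parameter names are hypothetical, and the library's actual normalization, clipping, and sign handling may differ.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero (assumption)."""
    return numerator / denominator if denominator != 0 else 0.0


def over_consumption_reward(over_consumed: float,
                            taken_by_agent: float,
                            taken_by_all: float) -> float:
    """Difference Reward sketch: global component minus local component.

    over_consumed  -- energy over-consumed by all agents
    taken_by_agent -- energy the rewarded agent consumed + stored from the grid
    taken_by_all   -- energy all agents consumed + stored from the grid
    """
    # Global component: the current environment.
    global_component = ratio(over_consumed, taken_by_all)

    # Local component: the hypothetical environment, had the agent not acted.
    # Clipping at zero is an assumption; over-consumption cannot be negative.
    hypothetical_over = max(0.0, over_consumed - taken_by_agent)
    local_component = ratio(hypothetical_over, taken_by_all)

    return global_component - local_component


if __name__ == "__main__":
    # The agent took 100 of 400 units; 50 units were over-consumed overall.
    print(over_consumption_reward(50.0, 100.0, 400.0))
```

Under these assumptions, the difference isolates the part of the over-consumption that disappears when the agent's taken energy is removed.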

__init__()

Methods

__init__()

calculate(world, agent)

Compute the reward for a specific Agent at the current time step.

reset()

Reset the reward function.

Attributes

name

Uniquely identifying, human-readable name for this reward function.

calculate(world: World, agent: Agent)

Compute the reward for a specific Agent at the current time step.

Parameters:
  • world – The World, used to get the current state and determine consequences of the agent’s action.

  • agent – The Agent that is rewarded, used to access particular information about the agent (personal state) and its action.

Returns:

A reward, i.e., a single value describing how well the agent performed. The higher the reward, the better its action was. Typically a value in [0, 1], but any range can be used.
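As a rough illustration of the calculate(world, agent) calling convention, the sketch below evaluates a toy reward once per agent. It does not import smartgrid: the world and agents are plain stand-in objects, and the `taken` attribute and the `ShareOfTakenReward` class are hypothetical names introduced only for this example.

```python
from types import SimpleNamespace


class ShareOfTakenReward:
    """Toy reward with the same calculate(world, agent) shape (hypothetical)."""

    name = "ShareOfTakenReward"

    def calculate(self, world, agent) -> float:
        # Total energy taken by all agents (stand-in attribute `taken`).
        total = sum(a.taken for a in world.agents)
        if total == 0:
            return 1.0
        # Higher is better: agents taking a smaller share score closer to 1.
        return 1.0 - agent.taken / total


# Stand-in world with two agents; a real World and Agent carry far more state.
world = SimpleNamespace(agents=[
    SimpleNamespace(name="household", taken=30.0),
    SimpleNamespace(name="office", taken=10.0),
])

reward_fn = ShareOfTakenReward()
# One reward per agent, for the current time step.
rewards = {agent.name: reward_fn.calculate(world, agent) for agent in world.agents}
```

The same `world` is passed for every agent; only the `agent` argument changes, so each agent receives its own value in the chosen range.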

name: str

Uniquely identifying, human-readable name for this reward function.

reset()

Reset the reward function.

This method must be overridden by reward functions that use a state, so that the state is reset with the environment. By default, it does nothing, as most reward functions do not use a state.
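To make the reset contract concrete, here is a minimal sketch of a stateful reward that overrides reset(). The class is a stand-in written for this example rather than a subclass of the library's Reward, and the `comfort` attribute on the agent is hypothetical.

```python
from types import SimpleNamespace


class RunningAverageReward:
    """Toy stateful reward: averages a per-step value across time steps."""

    def __init__(self):
        self.name = "RunningAverageReward"
        self._total = 0.0
        self._count = 0

    def calculate(self, world, agent) -> float:
        # Accumulate state across steps (hypothetical `comfort` attribute).
        self._total += agent.comfort
        self._count += 1
        return self._total / self._count

    def reset(self) -> None:
        # Clear the accumulated state so a new episode starts from scratch.
        self._total = 0.0
        self._count = 0


reward_fn = RunningAverageReward()
first = reward_fn.calculate(None, SimpleNamespace(comfort=1.0))
second = reward_fn.calculate(None, SimpleNamespace(comfort=0.0))

reward_fn.reset()  # would be called when the environment is reset
after_reset = reward_fn.calculate(None, SimpleNamespace(comfort=0.0))
```

Without the override, the average from the previous episode would leak into the first steps of the next one.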