smartgrid.rewards.reward_constraints.TimeConstrainedReward

class smartgrid.rewards.reward_constraints.TimeConstrainedReward(base_reward: Reward, start_step: int | None = None, end_step: int | None = None)[source]

Bases: Reward

Enable or disable a reward function based on the current time step.

This constraint can be used to specify a starting point, before which the reward function is disabled and does not produce rewards, and an ending point, after which it is disabled again and stops producing rewards.

This allows, for example, progressively adding reward functions: assume 3 reward functions A (always active), B (active after 10 steps), and C (active after 20 steps); learning agents will first receive rewards only from A, then from A and B, and finally from all three. This simulates the “evolution” of the ethical considerations embedded within the reward functions, as if they formed a single reward whose definition changes over time.
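For instance, a minimal sketch of this progressive setup (RewardA, RewardB and RewardC are hypothetical placeholders for concrete Reward subclasses, not part of this module):

    from smartgrid.rewards.reward_constraints import TimeConstrainedReward

    # Hypothetical base reward functions; any Reward subclass would do here.
    reward_a = RewardA()                                        # always active
    reward_b = TimeConstrainedReward(RewardB(), start_step=10)  # active from step 10
    reward_c = TimeConstrainedReward(RewardC(), start_step=20)  # active from step 20

    rewards = [reward_a, reward_b, reward_c]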

__init__(base_reward: Reward, start_step: int | None = None, end_step: int | None = None)[source]

Methods

__init__(base_reward[, start_step, end_step])

calculate(world, agent)

Compute the reward for a specific Agent at the current time step.

is_activated(world, agent)

Determines whether the reward function should produce a reward.

reset()

Reset the reward function.

Attributes

base_reward

The "base" reward function that we want to constrain.

start_step

Optional starting point of the reward, i.e., when the reward becomes enabled.

end_step

Optional end point of the reward, i.e., when the reward becomes disabled.

name

Uniquely identifying, human-readable name for this reward function.

base_reward: Reward

The “base” reward function that we want to constrain.

It is used to compute the reward when required. Note that it can itself be another constraint, so that constraints can be “combined” into a chain that ultimately leads to the base reward function.
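For example, one constraint could wrap another (a sketch; inner_reward stands for any hypothetical Reward instance):

    from smartgrid.rewards.reward_constraints import TimeConstrainedReward

    # Sketch: inner_reward is a hypothetical Reward instance.
    # The outer constraint wraps another constraint, which in turn wraps
    # the actual base reward function.
    chained = TimeConstrainedReward(
        TimeConstrainedReward(inner_reward, start_step=100),
        end_step=5000,
    )

In this particular case, the chain is equivalent to setting both start_step and end_step on a single TimeConstrainedReward; it only illustrates the chaining mechanism.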

calculate(world: World, agent: Agent) → float[source]

Compute the reward for a specific Agent at the current time step.

Parameters:
  • world – The World, used to get the current state and determine consequences of the agent’s action.

  • agent – The Agent that is rewarded, used to access particular information about the agent (personal state) and its action.

Returns:

A reward, i.e., a single value describing how well the agent performed. The higher the reward, the better its action was. Typically a value in [0, 1], but any range can be used.
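As a hypothetical usage sketch (the world, agents and base_reward objects are assumed to come from an already-built simulation; only the calculate() calls are the point here):

    from smartgrid.rewards.reward_constraints import TimeConstrainedReward

    constrained = TimeConstrainedReward(base_reward, start_step=10, end_step=50)

    # At each simulation step, compute the (possibly time-constrained) reward
    # for every agent in the world.
    step_rewards = {agent: constrained.calculate(world, agent) for agent in agents}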

end_step: int | None

Optional end point of the reward, i.e., when the reward becomes disabled.

This allows enabling/disabling rewards during a simulation, which creates changes in the environment and forces agents to forget previous (ethical) considerations.

By default (None), the reward function is never disabled after becoming active, and produces rewards at each time step. If set to a (positive) integer, end_step places a constraint on the time steps at which the reward function is active: t < end_step, where t is the current_step.

See also start_step and is_activated().

is_activated(world: World, agent: Agent) → bool[source]

Determines whether the reward function should produce a reward.

In the TimeConstrainedReward, it simply checks whether the world’s current_step lies between the start_step and the end_step. This allows:

  • enabling the reward function after a certain time, e.g., start_step = 2000 means that this reward function will only produce rewards from the 2000th time step;

  • disabling the reward function after a certain time, e.g., end_step = 6000 means that this reward function will only produce rewards before the 6000th time step;

  • mixtures of starting and ending times, with the constraint that start_step < end_step (otherwise, the activation window is empty and the reward function can never produce a reward).

Parameters:
  • world – The World in which the reward function may be activated; used primarily to obtain the current time step.

  • agent – The Agent that should (potentially) be rewarded by this reward function; not used in this subclass, but required by the base signature.

Returns:

A boolean indicating whether the reward function should produce a reward at this moment, based on the current time step.
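In other words, the check amounts to testing whether the current step lies in the half-open window [start_step, end_step), where a missing bound imposes no constraint. A minimal re-implementation sketch (not the library’s actual code):

    def is_in_window(current_step, start_step=None, end_step=None):
        # None means "no constraint" on that side of the window.
        after_start = start_step is None or start_step <= current_step
        before_end = end_step is None or current_step < end_step
        return after_start and before_end

    # e.g., with start_step=2000 and end_step=6000, rewards are produced
    # at steps 2000, 2001, ..., 5999.
    assert is_in_window(2000, start_step=2000, end_step=6000)
    assert not is_in_window(6000, start_step=2000, end_step=6000)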

name: str

Uniquely identifying, human-readable name for this reward function.

reset()[source]

Reset the reward function.

This function must be overridden by reward functions that use a state, so that the state is reset with the environment. By default, it does nothing, as most reward functions do not use a state.

start_step: int | None

Optional starting point of the reward, i.e., when the reward becomes enabled.

This allows enabling/disabling rewards during a simulation, which creates changes in the environment and forces agents to adapt to new (ethical) considerations.

By default (None), the reward function is initially active, and produces rewards at the beginning of the simulation. If set to a (positive) integer, start_step places a constraint on the time steps at which the reward function becomes active: start_step <= t, where t is the current_step.

See also end_step and is_activated().