smartgrid.rewards.reward_constraints.AgentConstrainedReward¶
- class smartgrid.rewards.reward_constraints.AgentConstrainedReward(base_reward: Reward, agents: List[Agent])[source]¶
Bases: Reward
Enable or disable a reward function based on learning agents.
This constraint can be used to specify which learning agents should receive rewards from a given reward function. For other learning agents, it is as if this reward function were not present.
This allows, for example, training two populations of agents with different objectives: assume 4 agents [a1, a2, a3, a4] and 2 objectives A and B; reward function A can be constrained to only produce rewards for [a1, a2], whereas B can be constrained to [a3, a4]. This simulates a population of agents with heterogeneous ethical considerations (embedded within the reward functions), as sketched below.
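A minimal sketch of this setup (objective_a, objective_b, and the agent objects a1..a4 are placeholders standing in for your own Reward instances and learning agents, not names from this API):

from smartgrid.rewards.reward_constraints import AgentConstrainedReward

# objective_a and objective_b are assumed, pre-existing Reward instances;
# a1..a4 are assumed, pre-existing learning Agent objects.
reward_a = AgentConstrainedReward(base_reward=objective_a, agents=[a1, a2])
reward_b = AgentConstrainedReward(base_reward=objective_b, agents=[a3, a4])
# a1 and a2 are only rewarded by objective A; a3 and a4 only by objective B.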
Methods
__init__(base_reward, agents)

calculate(world, agent)
Compute the reward for a specific Agent at the current time step.

is_activated(world, agent)
Determines whether the reward function should produce a reward.

reset()
Reset the reward function.
Attributes
The "base" reward function that we want to constrain.
List of agents that will receive rewards from this reward function.
Uniquely identifying, human-readable name for this reward function.
- agents: List[Agent]¶
List of agents that will receive rewards from this reward function.
Other agents will not receive rewards from this function, as if it were disabled.
- base_reward: Reward¶
The “base” reward function that we want to constrain.
It is used to compute the reward when required. Note that it can itself be another constraint, so that constraints can be combined into a chain leading to the base reward function, as illustrated below.
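As a hedged illustration of such a chain, one AgentConstrainedReward can wrap another; whether every link's activation check is applied depends on how the simulator dispatches is_activated, so this is a sketch rather than a guaranteed behavior:

# The inner link restricts rewards to [a1, a2, a3], the outer link to
# [a2, a3, a4]; if both checks apply along the chain, only a2 and a3
# effectively receive rewards (an assumption about dispatch, not this API).
inner = AgentConstrainedReward(base_reward=objective_a, agents=[a1, a2, a3])
chained = AgentConstrainedReward(base_reward=inner, agents=[a2, a3, a4])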
- calculate(world: World, agent: Agent) → float [source]¶
Compute the reward for a specific Agent at the current time step.
- Parameters:
world – The World, used to get the current state and determine consequences of the agent’s action.
agent – The Agent that is rewarded, used to access particular information about the agent (personal state) and its action.
- Returns:
A reward, i.e., a single value describing how well the agent performed. The higher the reward, the better its action was. Typically a value in [0, 1], but any range can be used.
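A hedged sketch of how a simulation loop might combine calculate with is_activated (the world, agents, and learner objects are assumptions, not part of this class):

for agent in agents:
    if constrained_reward.is_activated(world, agent):
        reward = constrained_reward.calculate(world, agent)
        # e.g., pass the reward to the agent's learning algorithm
        learner.update(agent, reward)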
- is_activated(world: World, agent: Agent) → bool [source]¶
Determines whether the reward function should produce a reward.
In the AgentConstrainedReward, this simply checks whether the learning agent is in the list of authorized agents (agents).
- Parameters:
world – The World in which the reward function may be activated; not used in this subclass, but required by the base signature.
agent – The Agent that should (potentially) be rewarded by this reward function; compared to the list of authorized agents.
- Returns:
A boolean indicating whether the reward function should produce a reward at this moment, based on the learning agent.
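A plausible sketch of this membership check (the actual implementation may differ; see the source link above):

def is_activated(self, world: World, agent: Agent) -> bool:
    # `world` is ignored here: only membership in the authorized list matters.
    return agent in self.agents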