ethicalgardeners.rewardfunctions.RewardFunctions

class ethicalgardeners.rewardfunctions.RewardFunctions(action_enum)[source]

Bases: object

Class for computing rewards in the Ethical Gardeners environment.

This class is responsible for calculating different types of rewards for agents based on their actions in the environment. The rewards are designed to encourage ecologically beneficial behaviors, well-being, and biodiversity.

Each reward component is normalized to a range between -1 and 1.

action_enum

An enumeration of possible actions (UP, DOWN, LEFT, RIGHT, HARVEST, WAIT, PLANT_TYPE_i). Created dynamically based on the number of flower types available.

Type:

enum

__init__(action_enum)[source]

Create the RewardFunctions object.

Parameters:

action_enum (enum) – An enumeration of possible actions (UP, DOWN, LEFT, RIGHT, HARVEST, WAIT, PLANT_TYPE_i). Created dynamically based on the number of flower types available.

Methods

__init__(action_enum)

Create the RewardFunctions object.

compute_biodiversity_reward(grid_world_prev, ...)

Compute the biodiversity reward for an agent based on its action in the environment.

compute_ecology_reward(grid_world_prev, ...)

Compute the ecological reward for an agent based on its action in the environment.

compute_reward(grid_world_prev, grid_world, ...)

Compute the mono-objective reward for an agent based on its action in the environment.

compute_wellbeing_reward(grid_world_prev, ...)

Compute the well-being reward for an agent based on its action in the environment.

compute_biodiversity_reward(grid_world_prev, grid_world, agent: Agent, action)[source]

Compute the biodiversity reward for an agent based on its action in the environment.

The biodiversity reward is based on the diversity of flower types planted by the agent, measured with the Shannon-Wiener index. The index is computed before and after the planting action, and the two values are compared to determine the action's impact.

Parameters:
  • grid_world_prev (GridWorld) – The grid world environment before the action.

  • grid_world (GridWorld) – The grid world environment after the action.

  • agent (Agent) – The agent performing the action.

  • action (action_enum) – The action performed.

Returns:

The normalized biodiversity reward (between -1 and 1) for planting actions, 0 for other actions.

Return type:

float
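The before/after comparison described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper names `shannon_index` and `biodiversity_delta`, the use of the natural logarithm, and the normalization by ln(num_types) are all assumptions made for the example.

```python
import math
from collections import Counter


def shannon_index(flower_types):
    """Shannon-Wiener index H = -sum(p_i * ln(p_i)) over flower-type proportions."""
    counts = Counter(flower_types)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no flowers planted: no diversity to measure
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def biodiversity_delta(types_before, types_after, num_types):
    """Hypothetical reward: change in H, divided by the maximum ln(num_types).

    Assumes every entry is one of num_types flower types, so H lies in
    [0, ln(num_types)] and the result stays within [-1, 1].
    """
    if num_types < 2:
        return 0.0  # a single flower type cannot change diversity
    h_max = math.log(num_types)
    return (shannon_index(types_after) - shannon_index(types_before)) / h_max


# Planting a second flower type next to an existing one raises the index.
gain = biodiversity_delta([0, 0], [0, 0, 1], num_types=3)
```

Under these assumptions, planting a type that is already dominant lowers the index and yields a negative value, while planting an under-represented type yields a positive one.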

compute_ecology_reward(grid_world_prev, grid_world, agent: Agent, action)[source]

Compute the ecological reward for an agent based on its action in the environment.

For planting actions, the reward is the expected future impact of the pollution reduction, normalized against the maximum theoretical impact. For harvesting actions, the impact the flower had on the environment before harvesting is multiplied by the pollution of the cell, and is also normalized against the maximum. Harvesting actions are penalized only if the pollution level is above the minimum pollution level.

Parameters:
  • grid_world_prev (GridWorld) – The grid world environment before the action.

  • grid_world (GridWorld) – The grid world environment after the action.

  • agent (Agent) – The agent performing the action.

  • action (action_enum) – The action performed.

Returns:

The normalized ecological reward (between -1 and 1) for planting and harvesting actions, 0 for other actions.

Return type:

float

compute_reward(grid_world_prev, grid_world, agent: Agent, action)[source]

Compute the mono-objective reward for an agent based on its action in the environment.

The reward is a combination of ecological, well-being, and biodiversity rewards, normalized to a range between -1 and 1.

Parameters:
  • grid_world_prev (GridWorld) – The grid world environment before the action.

  • grid_world (GridWorld) – The grid world environment after the action.

  • agent (Agent) – The agent performing the action.

  • action (action_enum) – The action performed.

Returns:

A dictionary containing the ecological, well-being, and biodiversity rewards, as well as the total reward averaged across these components.

Return type:

dict
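The averaging of the three components can be sketched as below. The function name `combine_rewards` and the dictionary keys are hypothetical and chosen for illustration only; the source states that the result is a dictionary holding the three components and their average, but does not specify the key names.

```python
def combine_rewards(ecology, wellbeing, biodiversity):
    """Average three component rewards, each assumed to lie in [-1, 1].

    The key names below are illustrative assumptions, not the library's.
    """
    components = {
        "ecology": ecology,
        "wellbeing": wellbeing,
        "biodiversity": biodiversity,
    }
    # The mean of values in [-1, 1] is itself in [-1, 1].
    total = sum(components.values()) / len(components)
    return {**components, "total": total}


rewards = combine_rewards(ecology=1.0, wellbeing=0.0, biodiversity=-1.0)
```

Because each component is already normalized, a plain mean keeps the total in the same range without further scaling.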

compute_wellbeing_reward(grid_world_prev, grid_world, agent: Agent, action)[source]

Compute the well-being reward for an agent based on its action in the environment.

The well-being reward is based on the price of the harvested flowers relative to the price of the most expensive flower type. When the agent earns no money, it receives a penalty based on the number of turns without income, normalized to a maximum penalty.

Parameters:
  • grid_world_prev (GridWorld) – The grid world environment before the action.

  • grid_world (GridWorld) – The grid world environment after the action.

  • agent (Agent) – The agent performing the action.

  • action (action_enum) – The action performed.

Returns:

The normalized well-being reward (between -1 and 1) for harvesting actions, a penalty for other actions.

Return type:

float
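One way to read the price ratio and the income penalty is sketched below. Everything here is an assumption made for illustration: the function name `wellbeing_sketch`, its parameters, the linear penalty, and the cap `max_penalty_turns` are not taken from the library.

```python
def wellbeing_sketch(harvested_price, max_price, turns_without_income, max_penalty_turns):
    """Hypothetical well-being reward.

    harvested_price: price of the flower just harvested, or None when the
        action was not a harvest.
    max_price: price of the most expensive flower type (must be > 0).
    turns_without_income: consecutive turns in which the agent earned nothing.
    max_penalty_turns: number of turns at which the penalty saturates (> 0).
    """
    if harvested_price is not None:
        # Harvest: reward is the price relative to the most expensive type.
        return harvested_price / max_price
    # No income: the penalty grows with idle turns and is capped at -1.
    return -min(turns_without_income, max_penalty_turns) / max_penalty_turns


harvest_reward = wellbeing_sketch(5.0, 10.0, 0, 10)
idle_penalty = wellbeing_sketch(None, 10.0, 3, 10)
```

In this reading, harvesting the most expensive flower gives the maximum reward, and a long run of turns without income saturates at the maximum penalty.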