ethicalgardeners.gardenersenv.GardenersEnv

class ethicalgardeners.gardenersenv.GardenersEnv(random_generator, grid_world, action_enum, num_iter, render_mode, action_handler, observation_strategy, reward_functions, metrics_collector, renderers)[source]

Bases: AECEnv

Main environment class implementing the PettingZoo AECEnv interface.

This class orchestrates the entire Ethical Gardeners simulation.

The environment is configured through a Hydra configuration object that specifies grid initialization parameters, agent settings, observation type, rendering options, and more.

metadata

Environment metadata for PettingZoo compatibility.

Type:

dict

random_generator

Random number generator for reproducible experiments.

Type:

numpy.random.RandomState

grid_world

The simulated 2D grid world environment.

Type:

GridWorld

prev_grid_world

Copy of the previous grid world state.

Type:

GridWorld

action_enum

Enumeration of possible actions in the environment.

Type:

_ActionEnum

possible_agents

List of all agent IDs in the environment.

Type:

list

agents

Mapping from agent IDs to Agent objects.

Type:

dict

action_handler

Handler for processing agent actions.

Type:

ActionHandler

observation_strategy

Strategy for generating agent observations.

Type:

ObservationStrategy

reward_functions

Functions for calculating agent rewards.

Type:

RewardFunctions

metrics_collector

Collector for simulation metrics.

Type:

MetricsCollector

renderers

List of renderer objects for visualization.

Type:

list

num_iter

Maximum number of iterations for the simulation.

Type:

int

render_mode

Current rendering mode (‘human’ or ‘none’).

Type:

str

observations

Current observations for all agents.

Type:

dict

rewards

Current rewards for all agents.

Type:

dict

terminations

Terminal state flags for all agents.

Type:

dict

truncations

Truncation flags for all agents.

Type:

dict

infos

Additional information for all agents.

Type:

dict

num_moves

Current number of moves executed in the simulation.

Type:

int

actions_in_current_turn

Number of actions taken in the current turn.

Type:

int

__init__(random_generator, grid_world, action_enum, num_iter, render_mode, action_handler, observation_strategy, reward_functions, metrics_collector, renderers)[source]

Create the Ethical Gardeners environment.

This method sets up the entire simulation environment based on the provided configuration.

Parameters:
  • random_generator (numpy.random.RandomState) – Random number generator for reproducibility.

  • grid_world (GridWorld) – The grid world representing the simulation environment.

  • action_enum (_ActionEnum) – Enumeration of possible actions in the environment.

  • num_iter (int) – Maximum number of iterations for the simulation.

  • render_mode (str) – Rendering mode for the environment (‘human’ or ‘none’).

  • action_handler (ActionHandler) – Handler for processing agent actions.

  • observation_strategy (ObservationStrategy) – Strategy for generating agent observations.

  • reward_functions (RewardFunctions) – Functions for calculating agent rewards.

  • metrics_collector (MetricsCollector) – Collector for simulation metrics.

  • renderers (list) – List of renderer objects for visualization.

Methods

__init__(random_generator, grid_world, ...)

Create the Ethical Gardeners environment.

action_space(agent_id)

Return the action space for a specific agent.

agent_iter([max_iter])

Yields the current agent (self.agent_selection).

close()

Close the environment and clean up resources.

last()

Return the most recent environment step information.

observation_space(agent_id)

Return the observation space for a specific agent.

observe(agent_id)

Return the current observation for a specific agent.

render()

Render the current state of the environment.

reset([seed, options])

Reset the environment to its initial state.

state()

Return a global view of the environment.

step(action)

Execute a step in the environment for the current agent.

Attributes

max_num_agents

metadata

num_agents

unwrapped

possible_agents

agents

observation_spaces

action_spaces

terminations

truncations

rewards

infos

agent_selection

_accumulate_rewards() → None

Adds the .rewards dictionary to the ._cumulative_rewards dictionary.

Typically called near the end of a step() method.

_clear_rewards() → None

Clears all items in .rewards.

_deads_step_first() → AgentID

Makes .agent_selection point to the first terminated agent.

Stores old value of agent_selection so that _was_dead_step can restore the variable after the dead agent steps.

_get_info(agent_id, rewards)[source]

Generate additional information for a specific agent.

This method creates a dictionary of additional information that is provided alongside the observation and reward.

Parameters:
  • agent_id (str) – The ID of the agent to generate info for.

  • rewards (dict) – The reward components for the agent.

Returns:

Additional information for the specified agent with the following keys:

  • ‘rewards’: The reward dict for the agent containing each reward component and the total reward.

Return type:

dict

_get_observations(agent_id)[source]

Generate the observation for a specific agent.

This method delegates to the observation strategy to generate the appropriate observation based on the agent’s configured observation type.

Parameters:

agent_id (str) – The ID of the agent to generate the observation for.

Returns:

The observation for the specified agent.

Return type:

object

_get_rewards(agent_id, action)[source]

Calculate the rewards for a specific agent.

This method delegates to the reward functions to calculate the appropriate rewards based on the agent’s actions and changes in the environment.

Parameters:
  • agent_id (str) – The ID of the agent to calculate rewards for.

  • action (_ActionEnum) – The action taken by the agent.

Returns:

Dictionary of reward components and total reward with the following keys:

  • ‘total’: The mono-objective reward for the agent. Computed as the average of all reward components.

  • ‘ecology’: The ecological reward component.

  • ‘wellbeing’: The wellbeing reward component.

  • ‘biodiversity’: The biodiversity reward component.

Return type:

dict
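Per the return description above, the ‘total’ entry is the average of the three ethical components. A minimal sketch of that aggregation (combine_rewards is an illustrative helper, not part of the class, and the values are made up):

```python
def combine_rewards(ecology, wellbeing, biodiversity):
    """Illustrative helper: build a reward dict in the shape described
    above, with 'total' as the average of the three components."""
    components = {
        "ecology": ecology,
        "wellbeing": wellbeing,
        "biodiversity": biodiversity,
    }
    return {"total": sum(components.values()) / len(components), **components}


rewards = combine_rewards(ecology=0.5, wellbeing=0.25, biodiversity=0.75)
# rewards["total"] == 0.5
```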

_was_dead_step(action=None)[source]

Handle a step for an agent that is already terminated or truncated.

This method is called when an agent attempts to take an action after it has already reached a terminal state or the episode has been truncated. It assigns zero reward and selects the next agent.

Parameters:

action (int, optional) – The action that was attempted.

action_space(agent_id)[source]

Return the action space for a specific agent.

This method returns a Discrete space representing all possible actions the agent can take in the environment.

Parameters:

agent_id (str) – The ID of the agent to get the action space for.

Returns:

The action space for the specified agent.

Return type:

gymnasium.spaces.Discrete

agent_iter(max_iter: int = 9223372036854775808) → AECIterable

Yields the current agent (self.agent_selection).

It must be used in a loop that calls step() on each iteration.
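agent_iter() drives the standard PettingZoo AEC loop. The pattern below is the usual way to run the environment; since constructing GardenersEnv requires a full configuration, a minimal stand-in class is used here so the sketch is self-contained (its agent IDs, mask, and step bookkeeping are illustrative only):

```python
class _StubAECEnv:
    """Minimal stand-in exposing only the AEC surface used below."""

    def __init__(self):
        self.possible_agents = ["gardener_0", "gardener_1"]

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        self._history = []  # records actions passed to step()

    def agent_iter(self, max_iter):
        for i in range(max_iter):
            yield self.agents[i % len(self.agents)]

    def last(self):
        # (observation, reward, termination, truncation, info)
        obs = {"observation": [0.0], "action_mask": [1, 0, 1]}
        return obs, 0.0, False, False, {}

    def step(self, action):
        self._history.append(action)


env = _StubAECEnv()
env.reset(seed=42)
for agent_id in env.agent_iter(max_iter=4):
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None  # a finished agent must still step, with action=None
    else:
        action = 0  # in practice: a policy, or env.action_space(agent_id).sample()
    env.step(action)
```

The same loop applies unchanged to a real GardenersEnv instance; only the construction of `env` differs.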

close()[source]

Close the environment and clean up resources.

This method finalizes all renderers and closes the metrics_collector.

last()[source]

Return the most recent environment step information.

This method returns all relevant information about the most recent step taken by the current agent.

Returns:

A tuple containing:
  • observation (dict): The current observation. The dictionary contains:

    • observation (numpy.ndarray): The agent’s view of the environment.

    • action_mask (numpy.ndarray): Binary mask indicating valid actions.

  • reward (float): The most recent reward.

  • termination (bool): Whether the agent is in a terminal state.

  • truncation (bool): Whether the episode was truncated.

  • info (dict): Additional information about the agent. Refer to _get_info() for details on the returned value.

Return type:

tuple

observation_space(agent_id)[source]

Return the observation space for a specific agent.

This method delegates to the observation strategy to return the appropriate observation space based on the configured observation type.

Parameters:

agent_id (str) – The ID of the agent to get the observation space for.

Returns:

The observation space for the specified agent.

Return type:

gymnasium.spaces.Space

observe(agent_id)[source]

Return the current observation for a specific agent.

Parameters:

agent_id (str) – The ID of the agent to get the observation for.

Returns:

The observation for the specified agent, containing:
  • observation: The agent’s view of the environment.

  • action_mask: Binary mask indicating valid actions.

Return type:

dict
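The action_mask entry can be used to restrict an agent to currently valid actions. A sketch with an illustrative mask (the mask values below are made up; in the real environment both arrays are numpy.ndarray and the mask length matches the size of the Discrete action space):

```python
# Illustrative observation dict in the shape observe() returns.
obs = {
    "observation": [0.2, 0.7, 0.0],  # agent's view (normally an ndarray)
    "action_mask": [1, 0, 1, 1, 0],  # 1 = action is currently valid
}

# Indices of actions the agent may take this turn.
valid_actions = [i for i, ok in enumerate(obs["action_mask"]) if ok]
# valid_actions == [0, 2, 3]
```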

render()[source]

Render the current state of the environment.

This method uses all configured renderers to visualize the current state of the grid world and agents.

reset(seed=None, options=None)[source]

Reset the environment to its initial state.

This method resets the agent selector, metrics collector, and move counter, and initializes the observations, rewards, terminations, truncations, and info dictionaries for all agents.

Parameters:
  • seed (int, optional) – Random seed for environment initialization.

  • options (dict, optional) – Additional options for reset customization.

Returns:

A tuple containing:
  • observations (dict): Initial observations for all agents.

  • infos (dict): Additional information for all agents.

Return type:

tuple
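Both elements of the returned tuple are dictionaries keyed by agent ID, one entry per agent. A sketch of that shape (the agent IDs, observation contents, and empty infos below are illustrative, not actual reset() output):

```python
possible_agents = ["gardener_0", "gardener_1"]  # illustrative agent IDs

# Shape of the (observations, infos) tuple returned by reset():
observations = {
    agent: {"observation": [0.0], "action_mask": [1, 1]}
    for agent in possible_agents
}
infos = {agent: {} for agent in possible_agents}

result = (observations, infos)
```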

state() → ndarray

Returns a global view of the environment.

It is appropriate for centralized-training, decentralized-execution methods such as QMIX.

step(action: int)[source]

Execute a step in the environment for the current agent.

This method processes the action for the current agent, updates the environment state, calculates rewards, generates new observations, updates metrics, and selects the next agent to act.

If all agents have taken an action in the current turn, it updates the environmental conditions (pollution, flower growth).

Parameters:

action (int) – The action to take for the current agent.

Returns:

The observation for the next agent to act.

Return type:

dict