ethicalgardeners.gardenersenv.GardenersEnv¶
- class ethicalgardeners.gardenersenv.GardenersEnv(random_generator, grid_world, action_enum, num_iter, render_mode, action_handler, observation_strategy, reward_functions, metrics_collector, renderers)[source]¶
Bases: AECEnv
Main environment class implementing the PettingZoo AECEnv interface.
This class orchestrates the entire Ethical Gardeners simulation.
The environment is configured through a Hydra configuration object that specifies grid initialization parameters, agent settings, observation type, rendering options, and more.
- random_generator¶
Random number generator for reproducible experiments.
- Type:
numpy.random.RandomState
- action_enum¶
Enumeration of possible actions in the environment.
- Type:
_ActionEnum
- action_handler¶
Handler for processing agent actions.
- Type:
ActionHandler
- observation_strategy¶
Strategy for generating agent observations.
- Type:
ObservationStrategy
- reward_functions¶
Functions for calculating agent rewards.
- Type:
RewardFunctions
- metrics_collector¶
Collector for simulation metrics.
- Type:
MetricsCollector
- __init__(random_generator, grid_world, action_enum, num_iter, render_mode, action_handler, observation_strategy, reward_functions, metrics_collector, renderers)[source]¶
Create the Ethical Gardeners environment.
This method sets up the entire simulation environment based on the provided configuration.
- Parameters:
random_generator (numpy.random.RandomState) – Random number generator for reproducibility.
grid_world (GridWorld) – The grid world representing the simulation environment.
action_enum (_ActionEnum) – Enumeration of possible actions in the environment.
num_iter (int) – Maximum number of iterations for the simulation.
render_mode (str) – Rendering mode for the environment (‘human’ or ‘none’).
action_handler (ActionHandler) – Handler for processing agent actions.
observation_strategy (ObservationStrategy) – Strategy for generating agent observations.
reward_functions (RewardFunctions) – Functions for calculating agent rewards.
metrics_collector (MetricsCollector) – Collector for simulation metrics.
renderers (list) – List of renderer objects for visualization.
Methods
__init__(random_generator, grid_world, ...) – Create the Ethical Gardeners environment.
action_space(agent_id) – Return the action space for a specific agent.
agent_iter([max_iter]) – Yields the current agent (self.agent_selection).
close() – Close the environment and clean up resources.
last() – Return the most recent environment step information.
observation_space(agent_id) – Return the observation space for a specific agent.
observe(agent_id) – Return the current observation for a specific agent.
render() – Render the current state of the environment.
reset([seed, options]) – Reset the environment to its initial state.
state() – Return a global view of the environment.
step(action) – Execute a step in the environment for the current agent.
Attributes
max_num_agents
num_agents
unwrapped
observation_spaces
action_spaces
agent_selection
- _accumulate_rewards() → None¶
Adds the .rewards dictionary to the ._cumulative_rewards dictionary.
Typically called near the end of the step() method.
- _deads_step_first() → AgentID¶
Makes .agent_selection point to the first terminated agent.
Stores the old value of agent_selection so that _was_dead_step can restore the variable after the dead agent steps.
- _get_info(agent_id, rewards)[source]¶
Generate additional information for a specific agent.
This method creates a dictionary of additional information that is provided alongside the observation and reward.
- Parameters:
agent_id (str) – The ID of the agent to generate information for.
rewards (dict) – The reward dict for the agent.
- Returns:
Additional information for the specified agent with the following keys:
’rewards’: The reward dict for the agent containing each reward component and the total reward.
- Return type:
dict
- _get_observations(agent_id)[source]¶
Generate the observation for a specific agent.
This method delegates to the observation strategy to generate the appropriate observation based on the agent’s configured observation type.
- _get_rewards(agent_id, action)[source]¶
Calculate the rewards for a specific agent.
This method delegates to the reward functions to calculate the appropriate rewards based on the agent’s actions and changes in the environment.
- Parameters:
agent_id (str) – The ID of the agent to calculate rewards for.
action (_ActionEnum) – The action taken by the agent.
- Returns:
Dictionary of reward components and total reward with the following keys:
’total’: The mono-objective reward for the agent. Computed as the average of all reward components.
’ecology’: The ecological reward component.
’wellbeing’: The wellbeing reward component.
’biodiversity’: The biodiversity reward component.
- Return type:
dict
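Under the aggregation rule documented above, the ‘total’ key is simply the mean of the individual reward components. A minimal sketch of that computation (the helper name and example values are illustrative, not taken from the library):

```python
# Sketch of the reward aggregation described above: the mono-objective
# 'total' entry is the average of the per-objective components.
def aggregate_rewards(components):
    rewards = dict(components)
    rewards["total"] = sum(components.values()) / len(components)
    return rewards

rewards = aggregate_rewards(
    {"ecology": 0.6, "wellbeing": 0.3, "biodiversity": 0.9}
)
# rewards["total"] is (0.6 + 0.3 + 0.9) / 3 = 0.6
```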
- _was_dead_step(action=None)[source]¶
Handle a step for an agent that is already terminated or truncated.
This method is called when an agent attempts to take an action after it has already reached a terminal state or the episode has been truncated. It assigns zero reward and selects the next agent.
- Parameters:
action (int, optional) – The action that was attempted.
- action_space(agent_id)[source]¶
Return the action space for a specific agent.
This method returns a Discrete space representing all possible actions the agent can take in the environment.
- Parameters:
agent_id (str) – The ID of the agent to get the action space for.
- Returns:
The action space for the specified agent.
- Return type:
gymnasium.spaces.Discrete
- agent_iter(max_iter: int = 9223372036854775808) → AECIterable¶
Yields the current agent (self.agent_selection).
Needs to be used in a loop where you step() each iteration.
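The agent_iter()/last()/step() control flow can be sketched as follows. Since constructing a real GardenersEnv requires the full configuration, a minimal stand-in environment is used here purely to make the loop concrete; the real environment follows the same AEC protocol.

```python
# Minimal stand-in for an AEC environment, used only to illustrate the
# agent_iter()/last()/step() driver loop.
class StubAECEnv:
    def __init__(self, agents, num_iter):
        self.agents = list(agents)
        self.num_iter = num_iter
        self.steps = 0
        self.agent_selection = self.agents[0]

    def agent_iter(self):
        # Yield the currently selected agent until the step budget is spent.
        while self.steps < self.num_iter:
            yield self.agent_selection

    def last(self):
        # (observation, reward, termination, truncation, info)
        return {"observation": None, "action_mask": [1, 1]}, 0.0, False, False, {}

    def step(self, action):
        # Advance to the next agent in round-robin order.
        self.steps += 1
        idx = self.agents.index(self.agent_selection)
        self.agent_selection = self.agents[(idx + 1) % len(self.agents)]

env = StubAECEnv(agents=["gardener_0", "gardener_1"], num_iter=4)
trace = []
for agent in env.agent_iter():
    obs, reward, termination, truncation, info = env.last()
    # A dead agent must still step with a None action in the AEC protocol.
    action = None if termination or truncation else 0
    trace.append((agent, action))
    env.step(action)
```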
- close()[source]¶
Close the environment and clean up resources.
This method finalizes all renderers and closes the metrics_collector.
- last()[source]¶
Return the most recent environment step information.
This method returns all relevant information about the most recent step taken by the current agent.
- Returns:
- A tuple containing:
observation (dict): The current observation. The dictionary contains:
observation (numpy.ndarray): The agent’s view of the environment.
action_mask (numpy.ndarray): Binary mask indicating valid actions.
reward (float): The most recent reward.
termination (bool): Whether the agent is in a terminal state.
truncation (bool): Whether the episode was truncated.
info (dict): Additional information about the agent. Refer to _get_info() for details on the returned value.
- Return type:
tuple
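Because the observation returned by last() carries an action_mask, a caller would typically restrict action selection to the entries the mask marks as valid. A small sketch, assuming a 1/0 mask convention as documented above (the mask values and helper name here are made up for illustration):

```python
import numpy as np

def sample_valid_action(action_mask, rng):
    # Indices where the mask is nonzero are the valid actions.
    valid = np.flatnonzero(action_mask)
    return int(rng.choice(valid))

rng = np.random.default_rng(0)
mask = np.array([1, 0, 1, 1, 0])  # illustrative mask, not from the env
action = sample_valid_action(mask, rng)  # one of {0, 2, 3}
```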
- observation_space(agent_id)[source]¶
Return the observation space for a specific agent.
This method delegates to the observation strategy to return the appropriate observation space based on the configured observation type.
- Parameters:
agent_id (str) – The ID of the agent to get the observation space for.
- Returns:
The observation space for the specified agent.
- Return type:
- render()[source]¶
Render the current state of the environment.
This method uses all configured renderers to visualize the current state of the grid world and agents.
- reset(seed=None, options=None)[source]¶
Reset the environment to its initial state.
This method resets the agent selector, metrics collector, move counter, and initializes the observations, rewards, terminations, truncations, and info dictionaries for all agents.
- Parameters:
seed (int, optional) – Seed for the random number generator.
options (dict, optional) – Additional reset options.
- Returns:
- A tuple containing:
observations (dict): Initial observations for all agents.
infos (dict): Additional information for all agents.
- Return type:
tuple
- state() → ndarray¶
Returns a global view of the environment.
It is appropriate for centralized-training decentralized-execution methods like QMIX.
- step(action: int)[source]¶
Execute a step in the environment for the current agent.
This method processes the action for the current agent, updates the environment state, calculates rewards, generates new observations, updates metrics, and selects the next agent to act.
If all agents have taken an action in the current turn, it updates the environmental conditions (pollution, flower growth).
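The end-of-turn update described above reduces to simple bookkeeping: record which agents have acted, and advance the world state once per full turn. A schematic version, where the world-update counter stands in for the actual pollution and flower-growth updates (names are illustrative):

```python
# Schematic end-of-turn bookkeeping: environmental conditions (pollution,
# flower growth) advance only after every agent has acted once.
class TurnTracker:
    def __init__(self, agents):
        self.agents = list(agents)
        self.acted = set()
        self.world_updates = 0

    def record_step(self, agent):
        self.acted.add(agent)
        if self.acted == set(self.agents):  # all agents acted this turn
            self.world_updates += 1         # e.g. spread pollution, grow flowers
            self.acted.clear()              # start the next turn

tracker = TurnTracker(["gardener_0", "gardener_1"])
for agent in ["gardener_0", "gardener_1", "gardener_0", "gardener_1"]:
    tracker.record_step(agent)
# Two full turns have elapsed, so the world was updated twice.
```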