ethicalgardeners.algorithms.SB3Wrapper

class ethicalgardeners.algorithms.SB3Wrapper(env: AECEnv[AgentID, ObsType, ActionType])[source]

Bases: BaseWrapper, Env

Wrapper to adapt a PettingZoo AEC environment to be compatible with Stable Baselines3:

- Only returns the observation (without the action mask) for the current agent.
- Aligns the observation_space and action_space with the current agent.
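The two behaviours above can be illustrated with a minimal, self-contained sketch. The classes below are toy stand-ins, not the actual ethicalgardeners or PettingZoo implementations: a masked AEC environment returns dict observations, and the wrapper strips the mask and exposes the spaces of the currently selected agent.

```python
# Toy sketch of the adaptation SB3Wrapper performs (not the real implementation).
# An AEC env with action masking returns dict observations per agent;
# Stable Baselines3 expects a single-agent env with a plain observation.

class ToyAECEnv:
    """Stand-in for a two-agent PettingZoo AEC environment."""
    def __init__(self):
        self.agents = ["gardener_0", "gardener_1"]
        self.agent_selection = "gardener_0"
        self.observation_spaces = {a: f"obs_space({a})" for a in self.agents}
        self.action_spaces = {a: f"act_space({a})" for a in self.agents}

    def observe(self, agent):
        # Masked AEC envs typically return a dict of this shape.
        return {"observation": [0.0, 1.0], "action_mask": [1, 0, 1]}

class ToySB3Wrapper:
    """Mimics the two documented behaviours of SB3Wrapper."""
    def __init__(self, env):
        self.env = env
        # Spaces are aligned with the currently selected agent.
        self.observation_space = env.observation_spaces[env.agent_selection]
        self.action_space = env.action_spaces[env.agent_selection]

    def observe(self, agent):
        # Return only the raw observation, without the action mask.
        return self.env.observe(agent)["observation"]

    def action_mask(self):
        return self.env.observe(self.env.agent_selection)["action_mask"]

env = ToyAECEnv()
wrapped = ToySB3Wrapper(env)
print(wrapped.observe("gardener_0"))   # the mask is stripped
print(wrapped.action_mask())           # still available separately
print(wrapped.observation_space)       # space of the selected agent
```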

__init__(env: AECEnv[AgentID, ObsType, ActionType])

Methods

__init__(env)

action_mask()

Return the action mask for the current agent.

action_space(agent)

Takes in agent and returns the action space for that agent.

agent_iter([max_iter])

Yields the current agent (self.agent_selection).

close()

Closes any resources that should be released.

get_wrapper_attr(name)

Gets the attribute name from the environment.

has_wrapper_attr(name)

Checks if the attribute name exists in the environment.

last([observe])

Returns observation, cumulative reward, terminated, truncated, info for the current agent (specified by self.agent_selection).

observation_space(agent)

Takes in agent and returns the observation space for that agent.

observe(agent)

Return the observation without the action mask for the current agent.

render()

Renders the environment as specified by self.render_mode.

reset([seed, options])

Align the observation_space and action_space with the current agent and return the initial observation and info as per Gymnasium API.

set_wrapper_attr(name, value, *[, force])

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

state()

State returns a global view of the environment.

step(action)

Accepts and executes the action of the current agent_selection in the environment.

Attributes

max_num_agents

metadata

np_random

Returns the environment's internal _np_random; if not set, it is initialised with a random seed.

np_random_seed

Returns the environment's internal _np_random_seed; if not set, it is first initialised with a random integer as seed.

num_agents

render_mode

spec

unwrapped

Returns the base non-wrapped environment.

possible_agents

agents

observation_spaces

action_spaces

terminations

truncations

rewards

infos

agent_selection

_accumulate_rewards() None

Adds .rewards dictionary to ._cumulative_rewards dictionary.

Typically called near the end of a step() method.

_clear_rewards() None

Clears all items in .rewards.

_deads_step_first() AgentID

Makes .agent_selection point to the first terminated agent.

Stores old value of agent_selection so that _was_dead_step can restore the variable after the dead agent steps.

_was_dead_step(action: ActionType) None

Helper function that performs step() for dead agents.

Does the following:

  1. Removes the dead agent from .agents, .terminations, .truncations, .rewards, ._cumulative_rewards, and .infos

  2. Loads the next agent into .agent_selection: if another agent is dead, loads that one; otherwise loads the next live agent

  3. Clears the rewards dict

Examples

It is highly recommended to use this at the beginning of step(), as follows:

def step(self, action):
    if (
        self.terminations[self.agent_selection]
        or self.truncations[self.agent_selection]
    ):
        self._was_dead_step(action)
        return
    # main contents of step

action_mask()[source]

Return the action mask for the current agent.

action_space(agent: AgentID) Space

Takes in agent and returns the action space for that agent.

MUST return the same value for the same agent name

Default implementation is to return the action_spaces dict

agent_iter(max_iter: int = 9223372036854775808) AECIterable

Yields the current agent (self.agent_selection).

Needs to be used in a loop where you step() each iteration.
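The canonical AEC interaction loop pairs agent_iter() with last() and step(). The sketch below uses a toy environment (real code would iterate over the wrapped ethicalgardeners env instead) to show the pattern, including the requirement that terminated or truncated agents receive a None action.

```python
# Toy AEC environment demonstrating the agent_iter()/last()/step() loop.
class ToyAECEnv:
    def __init__(self, num_steps=6):
        self.agents = ["gardener_0", "gardener_1"]
        self.agent_selection = self.agents[0]
        self._remaining = num_steps

    def agent_iter(self, max_iter=2**63):
        # Yields the currently selected agent until the episode ends.
        steps = 0
        while self._remaining > 0 and steps < max_iter:
            yield self.agent_selection
            steps += 1

    def last(self, observe=True):
        obs = [0.0] if observe else None
        terminated = self._remaining <= 1
        return obs, 0.0, terminated, False, {}

    def step(self, action):
        # Executes the action and switches control to the next agent.
        self._remaining -= 1
        idx = self.agents.index(self.agent_selection)
        self.agent_selection = self.agents[(idx + 1) % len(self.agents)]

env = ToyAECEnv()
visited = []
for agent in env.agent_iter():
    obs, reward, terminated, truncated, info = env.last()
    # Dead agents must receive a None action in the AEC API.
    action = None if terminated or truncated else 0
    env.step(action)
    visited.append(agent)
print(visited)  # control alternates between the two agents
```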

close() None

Closes any resources that should be released.

Closes the rendering window, subprocesses, network connections, or any other resources that should be released.

get_wrapper_attr(name: str) Any

Gets the attribute name from the environment.

has_wrapper_attr(name: str) bool

Checks if the attribute name exists in the environment.

last(observe: bool = True) tuple[ObsType | None, float, bool, bool, dict[str, Any]]

Returns observation, cumulative reward, terminated, truncated, info for the current agent (specified by self.agent_selection).

property np_random: Generator

Returns the environment's internal _np_random; if not set, it is initialised with a random seed.

Returns:

Instances of np.random.Generator

property np_random_seed: int

Returns the environment's internal _np_random_seed; if not set, it is first initialised with a random integer as seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:

the seed of the current np_random or -1, if the seed of the rng is unknown

Return type:

int

observation_space(agent: AgentID) Space

Takes in agent and returns the observation space for that agent.

MUST return the same value for the same agent name

Default implementation is to return the observation_spaces dict

observe(agent)[source]

Return the observation without the action mask for the current agent.

render() None | ndarray | str | list

Renders the environment as specified by self.render_mode.

Render mode can be 'human' to display a window. Other render modes in the default environments are 'rgb_array', which returns a numpy array and is supported by all environments outside of classic, and 'ansi', which returns the strings printed (specific to classic environments).

reset(seed=None, options=None)[source]

Align the observation_space and action_space with the current agent and return the initial observation and info as per Gymnasium API.
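The reset() contract described above can be sketched with a toy stand-in (not the real implementation): the exposed spaces are re-aligned with the selected agent, and a Gymnasium-style (observation, info) pair is returned.

```python
# Toy sketch of the documented reset() behaviour (not the real implementation).
class ToyWrapper:
    def __init__(self):
        self.observation_spaces = {"gardener_0": "obs_space(gardener_0)"}
        self.action_spaces = {"gardener_0": "act_space(gardener_0)"}
        self.agent_selection = "gardener_0"

    def reset(self, seed=None, options=None):
        # Align the exposed spaces with the currently selected agent...
        self.observation_space = self.observation_spaces[self.agent_selection]
        self.action_space = self.action_spaces[self.agent_selection]
        # ...and return (observation, info) as the Gymnasium API expects.
        observation, info = [0.0, 0.0], {}
        return observation, info

wrapper = ToyWrapper()
obs, info = wrapper.reset(seed=42)
```

In practice, this Gymnasium-conformant reset()/step() interface is what lets the wrapped environment be handed straight to a Stable Baselines3 algorithm (for example PPO("MlpPolicy", SB3Wrapper(env))), which drives the environment through these two methods.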

set_wrapper_attr(name: str, value: Any, *, force: bool = True) bool

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

state() ndarray

State returns a global view of the environment.

It is appropriate for centralized training, decentralized execution (CTDE) methods like QMIX.

step(action)[source]

Accepts and executes the action of the current agent_selection in the environment.

Automatically switches control to the next agent.

property unwrapped: AECEnv

Returns the base non-wrapped environment.

Returns:

The base non-wrapped gymnasium.Env instance

Return type:

Env