ethicalgardeners.algorithms.SB3Wrapper

class ethicalgardeners.algorithms.SB3Wrapper(env: AECEnv[AgentID, ObsType, ActionType])[source]

Bases: BaseWrapper, Env

Wrapper to adapt a PettingZoo AEC environment to be compatible with Stable Baselines3:

- Only returns the observation (without the action mask) for the current agent.
- Aligns the observation_space and action_space with the current agent.
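The two behaviours above can be illustrated with a minimal, self-contained sketch. The classes below are toy stand-ins, not the actual ethicalgardeners or PettingZoo implementations: a masked AEC environment returns dict observations, and the wrapper strips the mask and exposes the spaces of the currently selected agent.

```python
# Toy sketch of the adaptation SB3Wrapper performs (not the real implementation).
# An AEC env with action masking returns dict observations per agent;
# Stable Baselines3 expects a single-agent env with a plain observation.

class ToyAECEnv:
    """Stand-in for a two-agent PettingZoo AEC environment."""
    def __init__(self):
        self.agents = ["gardener_0", "gardener_1"]
        self.agent_selection = "gardener_0"
        self.observation_spaces = {a: f"obs_space({a})" for a in self.agents}
        self.action_spaces = {a: f"act_space({a})" for a in self.agents}

    def observe(self, agent):
        # Masked AEC envs typically return a dict of this shape.
        return {"observation": [0.0, 1.0], "action_mask": [1, 0, 1]}

class ToySB3Wrapper:
    """Mimics the two documented behaviours of SB3Wrapper."""
    def __init__(self, env):
        self.env = env
        # Spaces are aligned with the currently selected agent.
        self.observation_space = env.observation_spaces[env.agent_selection]
        self.action_space = env.action_spaces[env.agent_selection]

    def observe(self, agent):
        # Return only the raw observation, without the action mask.
        return self.env.observe(agent)["observation"]

    def action_mask(self):
        return self.env.observe(self.env.agent_selection)["action_mask"]

env = ToyAECEnv()
wrapped = ToySB3Wrapper(env)
print(wrapped.observe("gardener_0"))   # the mask is stripped
print(wrapped.action_mask())           # still available separately
print(wrapped.observation_space)       # space of the selected agent
```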

__init__(env: AECEnv[AgentID, ObsType, ActionType])

Methods

__init__(env)

action_mask()

Return the action mask for the current agent.

action_space(agent)

Takes in agent and returns the action space for that agent.

agent_iter([max_iter])

Yields the current agent (self.agent_selection).

close()

Closes any resources that should be released.

get_wrapper_attr(name)

Gets the attribute name from the environment.

has_wrapper_attr(name)

Checks if the attribute name exists in the environment.

last([observe])

Returns observation, cumulative reward, terminated, truncated, info for the current agent (specified by self.agent_selection).

observation_space(agent)

Takes in agent and returns the observation space for that agent.

observe(agent)

Return the observation without the action mask for the current agent.

render()

Renders the environment as specified by self.render_mode.

reset([seed, options])

Align the observation_space and action_space with the current agent and return the initial observation and info as per Gymnasium API.

set_wrapper_attr(name, value, *[, force])

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

state()

State returns a global view of the environment.

step(action)

Accepts and executes the action of the current agent_selection in the environment.

Attributes

max_num_agents

metadata

np_random

Returns the environment's internal _np_random; if not set, it is initialised with a random seed.

np_random_seed

Returns the environment's internal _np_random_seed; if not set, it is first initialised with a random integer as seed.

num_agents

render_mode

spec

unwrapped

Returns the base non-wrapped environment.

possible_agents

agents

observation_spaces

action_spaces

terminations

truncations

rewards

infos

agent_selection

_accumulate_rewards() None

Adds .rewards dictionary to ._cumulative_rewards dictionary.

Typically called near the end of a step() method.

_clear_rewards() None

Clears all items in .rewards.

_deads_step_first() AgentID

Makes .agent_selection point to the first terminated agent.

Stores old value of agent_selection so that _was_dead_step can restore the variable after the dead agent steps.

_was_dead_step(action: ActionType) None

Helper function that performs step() for dead agents.

Does the following:

  1. Removes the dead agent from .agents, .terminations, .truncations, .rewards, ._cumulative_rewards, and .infos

  2. Loads the next agent into .agent_selection: if another agent is dead, loads that one; otherwise loads the next live agent

  3. Clears the rewards dict

Examples

It is highly recommended to use this at the beginning of step(), as follows:

def step(self, action):
    if (
        self.terminations[self.agent_selection]
        or self.truncations[self.agent_selection]
    ):
        self._was_dead_step(action)
        return
    # main contents of step

action_mask()[source]

Return the action mask for the current agent.

action_space(agent: AgentID) Space

Takes in agent and returns the action space for that agent.

MUST return the same value for the same agent name

Default implementation is to return the action_spaces dict

agent_iter(max_iter: int = 9223372036854775808) AECIterable

Yields the current agent (self.agent_selection).

Needs to be used in a loop where you step() each iteration.
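The canonical AEC interaction loop pairs agent_iter() with last() and step(). The sketch below uses a toy environment (real code would iterate over the wrapped ethicalgardeners env instead) to show the pattern, including the requirement that terminated or truncated agents receive a None action.

```python
# Toy AEC environment demonstrating the agent_iter()/last()/step() loop.
class ToyAECEnv:
    def __init__(self, num_steps=6):
        self.agents = ["gardener_0", "gardener_1"]
        self.agent_selection = self.agents[0]
        self._remaining = num_steps

    def agent_iter(self, max_iter=2**63):
        # Yields the currently selected agent until the episode ends.
        steps = 0
        while self._remaining > 0 and steps < max_iter:
            yield self.agent_selection
            steps += 1

    def last(self, observe=True):
        obs = [0.0] if observe else None
        terminated = self._remaining <= 1
        return obs, 0.0, terminated, False, {}

    def step(self, action):
        # Executes the action and switches control to the next agent.
        self._remaining -= 1
        idx = self.agents.index(self.agent_selection)
        self.agent_selection = self.agents[(idx + 1) % len(self.agents)]

env = ToyAECEnv()
visited = []
for agent in env.agent_iter():
    obs, reward, terminated, truncated, info = env.last()
    # Dead agents must receive a None action in the AEC API.
    action = None if terminated or truncated else 0
    env.step(action)
    visited.append(agent)
print(visited)  # control alternates between the two agents
```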

close() None

Closes any resources that should be released.

Closes the rendering window, subprocesses, network connections, or any other resources that should be released.

get_wrapper_attr(name: str) Any

Gets the attribute name from the environment.

has_wrapper_attr(name: str) bool

Checks if the attribute name exists in the environment.

last(observe: bool = True) tuple[ObsType | None, float, bool, bool, dict[str, Any]]

Returns observation, cumulative reward, terminated, truncated, info for the current agent (specified by self.agent_selection).

property np_random: Generator

Returns the environment's internal _np_random; if not set, it is initialised with a random seed.

Returns:

Instances of np.random.Generator

property np_random_seed: int

Returns the environment's internal _np_random_seed; if not set, it is first initialised with a random integer as seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:

the seed of the current np_random or -1, if the seed of the rng is unknown

Return type:

int

observation_space(agent: AgentID) Space

Takes in agent and returns the observation space for that agent.

MUST return the same value for the same agent name

Default implementation is to return the observation_spaces dict

observe(agent)[source]

Return the observation without the action mask for the current agent.

render() None | ndarray | str | list

Renders the environment as specified by self.render_mode.

Render mode can be 'human' to display a window. Other render modes in the default environments are 'rgb_array', which returns a numpy array and is supported by all environments outside of classic, and 'ansi', which returns the strings printed (specific to classic environments).

reset(seed=None, options=None)[source]

Align the observation_space and action_space with the current agent and return the initial observation and info as per Gymnasium API.
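The reset() contract described above can be sketched with a toy stand-in (not the real implementation): the exposed spaces are re-aligned with the selected agent, and a Gymnasium-style (observation, info) pair is returned.

```python
# Toy sketch of the documented reset() behaviour (not the real implementation).
class ToyWrapper:
    def __init__(self):
        self.observation_spaces = {"gardener_0": "obs_space(gardener_0)"}
        self.action_spaces = {"gardener_0": "act_space(gardener_0)"}
        self.agent_selection = "gardener_0"

    def reset(self, seed=None, options=None):
        # Align the exposed spaces with the currently selected agent...
        self.observation_space = self.observation_spaces[self.agent_selection]
        self.action_space = self.action_spaces[self.agent_selection]
        # ...and return (observation, info) as the Gymnasium API expects.
        observation, info = [0.0, 0.0], {}
        return observation, info

wrapper = ToyWrapper()
obs, info = wrapper.reset(seed=42)
```

In practice, this Gymnasium-conformant reset()/step() interface is what lets the wrapped environment be handed straight to a Stable Baselines3 algorithm (for example PPO("MlpPolicy", SB3Wrapper(env))), which drives the environment through these two methods.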

set_wrapper_attr(name: str, value: Any, *, force: bool = True) bool

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

state() ndarray

State returns a global view of the environment.

It is appropriate for centralized training, decentralized execution (CTDE) methods like QMIX.

step(action)[source]

Accepts and executes the action of the current agent_selection in the environment.

Automatically switches control to the next agent.

property unwrapped: AECEnv

Returns the base non-wrapped environment.

Returns:

The base non-wrapped gymnasium.Env instance

Return type:

Env