ethicalgardeners.algorithms.SB3Wrapper¶
- class ethicalgardeners.algorithms.SB3Wrapper(env: AECEnv[AgentID, ObsType, ActionType])[source]¶
Bases: BaseWrapper, Env

Wrapper to adapt a PettingZoo AEC environment to be compatible with Stable Baselines3.

- Only returns the observation (without action mask) for the current agent.
- The observation_space and action_space are aligned with the current agent.
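The space-alignment idea can be sketched with stand-in objects (a minimal sketch: the toy classes, agent names, and string "spaces" below are illustrative assumptions, not the actual ethicalgardeners implementation):

```python
# Sketch: expose single-agent-style spaces for the currently selected agent.
# ToyAECEnv and ToySB3Wrapper are illustrative stand-ins, not the real API.
class ToyAECEnv:
    """A tiny AEC-style environment with per-agent spaces."""
    def __init__(self):
        self.possible_agents = ["gardener_0", "gardener_1"]
        self.agent_selection = "gardener_0"
        # Per-agent spaces keyed by agent name (strings stand in for gym Spaces).
        self.observation_spaces = {"gardener_0": "obs_space_0", "gardener_1": "obs_space_1"}
        self.action_spaces = {"gardener_0": "act_space_0", "gardener_1": "act_space_1"}

class ToySB3Wrapper:
    """Presents the current agent's spaces as if the env were single-agent."""
    def __init__(self, env):
        self.env = env

    @property
    def observation_space(self):
        # Align with whichever agent is currently selected.
        return self.env.observation_spaces[self.env.agent_selection]

    @property
    def action_space(self):
        return self.env.action_spaces[self.env.agent_selection]

wrapped = ToySB3Wrapper(ToyAECEnv())
print(wrapped.observation_space)  # obs_space_0
wrapped.env.agent_selection = "gardener_1"
print(wrapped.action_space)  # act_space_1
```

Because the spaces are looked up through agent_selection on every access, a single-agent library such as Stable Baselines3 always sees the spaces of the agent whose turn it is.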
- __init__(env: AECEnv[AgentID, ObsType, ActionType])¶
Methods

__init__(env)
    Return the action mask for the current agent.
action_space(agent)
    Takes in agent and returns the action space for that agent.
agent_iter([max_iter])
    Yields the current agent (self.agent_selection).
close()
    Closes any resources that should be released.
get_wrapper_attr(name)
    Gets the attribute name from the environment.
has_wrapper_attr(name)
    Checks if the attribute name exists in the environment.
last([observe])
    Returns observation, cumulative reward, terminated, truncated, info for the current agent (specified by self.agent_selection).
observation_space(agent)
    Takes in agent and returns the observation space for that agent.
observe(agent)
    Return the observation without the action mask for the current agent.
render()
    Renders the environment as specified by self.render_mode.
reset([seed, options])
    Align the observation_space and action_space with the current agent and return the initial observation and info as per the Gymnasium API.
set_wrapper_attr(name, value, *[, force])
    Sets the attribute name on the environment with value; see Wrapper.set_wrapper_attr for more info.
state()
    Returns a global view of the environment.
step(action)
    Accepts and executes the action of the current agent_selection in the environment.
Attributes

max_num_agents
metadata
np_random
    Returns the environment's internal _np_random that if not set will initialise with a random seed.
np_random_seed
    Returns the environment's internal _np_random_seed that if not set will first initialise with a random int as seed.
num_agents
render_mode
spec
unwrapped
    Returns the base non-wrapped environment.
possible_agents
agents
observation_spaces
action_spaces
terminations
truncations
rewards
infos
agent_selection

- _accumulate_rewards() → None¶
Adds .rewards dictionary to ._cumulative_rewards dictionary.
Typically called near the end of a step() method
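The bookkeeping can be pictured as follows (a minimal free-function sketch of the same dict-folding idea; the real method operates on the env's own attributes):

```python
# Sketch of the _accumulate_rewards idea: fold the per-step rewards dict
# into the running cumulative-rewards dict, one entry per agent.
def accumulate_rewards(rewards, cumulative_rewards):
    for agent, reward in rewards.items():
        cumulative_rewards[agent] = cumulative_rewards.get(agent, 0.0) + reward
    return cumulative_rewards

totals = {"gardener_0": 1.0, "gardener_1": 0.0}
totals = accumulate_rewards({"gardener_0": 0.5, "gardener_1": 2.0}, totals)
print(totals)  # {'gardener_0': 1.5, 'gardener_1': 2.0}
```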
- _deads_step_first() → AgentID¶
Makes .agent_selection point to first terminated agent.
Stores old value of agent_selection so that _was_dead_step can restore the variable after the dead agent steps.
- _was_dead_step(action: ActionType) → None¶
Helper function that performs step() for dead agents.
Does the following:

- Removes the dead agent from .agents, .terminations, .truncations, .rewards, ._cumulative_rewards, and .infos
- Loads the next agent into .agent_selection: if another agent is dead, loads that one, otherwise loads the next live agent
- Clears the rewards dict
Examples
It is highly recommended to use this at the beginning of step() as follows:

    def step(self, action):
        if (
            self.terminations[self.agent_selection]
            or self.truncations[self.agent_selection]
        ):
            self._was_dead_step(action)
            return
        # main contents of step
- action_space(agent: AgentID) → Space¶
Takes in agent and returns the action space for that agent.
MUST return the same value for the same agent name.
The default implementation returns the corresponding entry of the action_spaces dict.
- agent_iter(max_iter: int = 9223372036854775808) → AECIterable¶
Yields the current agent (self.agent_selection).
Needs to be used in a loop where you step() each iteration.
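The canonical loop looks like this, sketched against a toy stand-in env so it runs standalone (MiniAECEnv and the agent names are illustrative assumptions; a real AECEnv is driven the same way):

```python
# Sketch of the agent_iter()/last()/step() loop. MiniAECEnv is a toy
# round-robin stand-in, not the real environment.
class MiniAECEnv:
    def __init__(self, steps_per_agent=2):
        self.agents = ["gardener_0", "gardener_1"]
        self.agent_selection = self.agents[0]
        self._remaining = steps_per_agent * len(self.agents)

    def agent_iter(self, max_iter=2**63):
        count = 0
        while self.agents and count < max_iter:
            yield self.agent_selection
            count += 1

    def last(self, observe=True):
        obs = f"obs_for_{self.agent_selection}" if observe else None
        terminated = self._remaining <= 1  # last step of the episode
        return obs, 0.0, terminated, False, {}

    def step(self, action):
        self._remaining -= 1
        if self._remaining <= 0:
            self.agents = []  # episode over; agent_iter stops
            return
        order = ["gardener_0", "gardener_1"]
        nxt = (order.index(self.agent_selection) + 1) % len(order)
        self.agent_selection = order[nxt]  # rotate to the next agent

env = MiniAECEnv()
visited = []
for agent in env.agent_iter():
    obs, reward, terminated, truncated, info = env.last()
    # Terminated or truncated agents must pass None as their action.
    action = None if terminated or truncated else 0
    env.step(action)
    visited.append(agent)
print(visited)  # ['gardener_0', 'gardener_1', 'gardener_0', 'gardener_1']
```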
- close() → None¶
Closes any resources that should be released.
Closes the rendering window, subprocesses, network connections, or any other resources that should be released.
- last(observe: bool = True) → tuple[ObsType | None, float, bool, bool, dict[str, Any]]¶
Returns observation, cumulative reward, terminated, truncated, info for the current agent (specified by self.agent_selection).
- property np_random: Generator¶
Returns the environment's internal _np_random that if not set will initialise with a random seed.
- Returns:
Instances of np.random.Generator
- property np_random_seed: int¶
Returns the environment's internal _np_random_seed that if not set will first initialise with a random int as seed.
If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.
- Returns:
the seed of the current np_random or -1, if the seed of the rng is unknown
- Return type:
int
- observation_space(agent: AgentID) → Space¶
Takes in agent and returns the observation space for that agent.
MUST return the same value for the same agent name.
The default implementation returns the corresponding entry of the observation_spaces dict.
- render() → None | ndarray | str | list¶
Renders the environment as specified by self.render_mode.
Render mode can be 'human' to display a window. Other render modes in the default environments are 'rgb_array', which returns a numpy array and is supported by all environments outside of classic, and 'ansi', which returns the strings printed (specific to classic environments).
- reset(seed=None, options=None)[source]¶
Align the observation_space and action_space with the current agent and return the initial observation and info as per Gymnasium API.
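A reset with this behaviour can be sketched as follows (illustrative only, not the actual source; the class and agent names are assumptions):

```python
# Sketch of a reset() that aligns the Gym-style spaces with the currently
# selected agent and follows the Gymnasium API by returning (obs, info).
class ToyWrappedEnv:
    def __init__(self):
        self.observation_spaces = {"gardener_0": "obs_space_0"}
        self.action_spaces = {"gardener_0": "act_space_0"}
        self.agent_selection = None
        self.observation_space = None
        self.action_space = None

    def reset(self, seed=None, options=None):
        self.agent_selection = "gardener_0"  # underlying env selects its first agent
        # Align the single-agent style spaces with the current agent.
        self.observation_space = self.observation_spaces[self.agent_selection]
        self.action_space = self.action_spaces[self.agent_selection]
        obs = f"obs_for_{self.agent_selection}"
        info = {}
        return obs, info  # Gymnasium API: (observation, info)

env = ToyWrappedEnv()
obs, info = env.reset(seed=0)
print(obs)  # obs_for_gardener_0
```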
- set_wrapper_attr(name: str, value: Any, *, force: bool = True) → bool¶
Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.
- state() → ndarray¶
Returns a global view of the environment.
It is appropriate for centralized training, decentralized execution methods like QMIX.
- step(action)[source]¶
Accepts and executes the action of the current agent_selection in the environment.
Automatically switches control to the next agent.
- property unwrapped: AECEnv¶
Returns the base non-wrapped environment.
- Returns:
The base non-wrapped gymnasium.Env instance
- Return type:
Env