Custom scenario

This Smart Grid simulator was designed to support various experiments, in particular by varying the agents (number, profiles), the physical constraints of the world (available energy), and the reward functions. We call the combination of these elements a scenario, and we describe here how to fully customize one.

For each of these elements, we give a succinct description below; for a more complete understanding of how they work, please refer to their API documentation.

Agents’ profiles

Agents share some common characteristics, such as their personal battery capacity, how much energy they need at each step, how much energy they produce, and how they determine their comfort based on their need and consumption. To simplify the creation of agents and to reduce resource usage (memory and computation), these characteristics are grouped and shared in Profiles.

An AgentProfile can be loaded from data files (see, e.g., the data/openei folder); to do so, it is necessary to use a DataConversion object.

For example:

from smartgrid.agents import DataOpenEIConversion
from smartgrid.agents import comfort

# Create a converter specialized for the `data/openei` files.
converter = DataOpenEIConversion()

# Load agents' profiles, using the data files.
converter.load(
    name='Household',  # Profile name -- a unique ID
    data_path='./data/openei/profile_residential_annually.npz',  # Data file
    comfort_fn=comfort.flexible_comfort_profile  # Comfort function
)
converter.load(
    'Office',
    './data/openei/profile_office_annually.npz',
    comfort.neutral_comfort_profile
)
converter.load(
    'School',
    './data/openei/profile_school_annually.npz',
    comfort.strict_comfort_profile
)

# Profiles can be accessed through the `profiles` attribute, and are indexed
# by their ID.
profile_household = converter.profiles['Household']
profile_office = converter.profiles['Office']
profile_school = converter.profiles['School']

You can use the converter object to load any profile you desire, and use these profiles to instantiate Agents.

Note

If the package was installed through pip instead of cloning the repository, accessing the files through a relative path will not work. Instead, the files must be accessed from the installed package itself.

To simplify getting the path to data files, the find_profile_data() function may be used, although it has some limitations: in particular, it only works with a single level of nesting (e.g., data/dataset/sub-dataset/file will not work). Still, this function works whether you have cloned the repository (as long as the current working directory is the project root) or installed the package through pip; it is the recommended way to specify which data file to use.

from smartgrid.make_env import find_profile_data

converter = DataOpenEIConversion()

converter.load(
    'Office',
    find_profile_data('openei', 'profile_office_annually.npz'),
    comfort.neutral_comfort_profile
)

Energy generator

The EnergyGenerator is used to determine, at each time step, the amount of available energy in the world. Several implementations are available, e.g., the RandomEnergyGenerator, the ScarceEnergyGenerator, or the GenerousEnergyGenerator. They “generate” a random amount of energy based on the total need of all agents at the current time step. Another implementation is the RealisticEnergyGenerator, which uses a dataset of productions per time step to determine the amount.

For example, using a random generator:

from smartgrid.util import RandomEnergyGenerator

# This generator will generate between 75% and 110% of the agents' total need
# at each step.
generator = RandomEnergyGenerator(
    lower_proportion=0.75,
    upper_proportion=1.10
)

# Example with current_need = 10_000 Wh.
amount = generator.generate_available_energy(
    current_need=10_000,
    # The other values are not important for this generator.
    current_step=0,
    min_need=0,
    max_need=100_000
)
assert 0.75 * 10_000 <= amount < 1.10 * 10_000
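
The ScarceEnergyGenerator and GenerousEnergyGenerator mentioned above follow the same interface. The sketch below assumes that they can be instantiated without arguments, i.e., that they come pre-configured with their own lower and upper proportions:

from smartgrid.util import ScarceEnergyGenerator, GenerousEnergyGenerator

# Assumed to behave like RandomEnergyGenerator, but with preset proportions
# simulating, respectively, a shortage and an abundance of available energy.
scarce_generator = ScarceEnergyGenerator()
generous_generator = GenerousEnergyGenerator()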

Another example, using the realistic generator:

from smartgrid.util import RealisticEnergyGenerator

# The dataset (source of truth) for energy production at each time step.
# This dataset means that, at t=0, 80% of the agents' maximum need will be
# available; at t=1, 66% of their maximum need; and at t=2, 45%.
# Subsequent time steps will simply cycle over this array, e.g., t=3 is
# the same as t=0.
data = [0.80, 0.66, 0.45]
generator = RealisticEnergyGenerator(data=data)

# Example with max_need = 100_000 Wh.
amount = generator.generate_available_energy(
    max_need=100_000,
    current_step=0,
    # The other values are not important for this generator.
    current_need=10_000,
    min_need=0
)
assert amount == int(100_000 * data[0])
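
If you have a raw production trace (e.g., in Wh) rather than ratios, one way to obtain such a data array is to normalize the trace by the agents' maximum need. This is only a sketch: raw_production_wh and the 100,000 Wh maximum need are hypothetical values, and it assumes the generator accepts any sequence of floats:

import numpy as np

# Hypothetical raw productions (in Wh), one value per time step.
raw_production_wh = np.array([80_000, 66_000, 45_000])

# Express each production as a proportion of the agents' maximum need,
# which is the format expected in the `data` argument.
estimated_max_need = 100_000
data = raw_production_wh / estimated_max_need

generator = RealisticEnergyGenerator(data=data)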

World

The World represents the simulated “physical” world: it holds the agents and the available energy, and updates them according to the agents’ actions.

The world is instantiated from a list of agents, and an energy generator:

from smartgrid import World
from smartgrid.agents import Agent

# We assume that the variables instantiated above are available,
# especially the `converter` (with loaded profiles) and the `generator`.

# Create the agents, based on loaded profiles.
agents = []
for i in range(5):
    agents.append(
        Agent(
            name=f'Household{i+1}',  # Unique name -- recommended to use profile + index
            profile=converter.profiles['Household']  # Agent Profile
        )
    )
for i in range(3):
    agents.append(
        Agent(f'Office{i+1}', profile_office)
    )

# Create the world, with agents and energy generator.
world = World(
    agents=agents,
    energy_generator=generator
)

At this point, we have a usable world, able to simulate a smart grid, and to update itself when agents take actions. (It is even usable as-is, if you are not interested in Reinforcement Learning!) However, to benefit from the RL interaction loop (observations, actions, rewards), we have to create an Environment.

Reward functions

Reward functions dictate the agents’ expected behaviour. Several have been implemented and are directly available; they target different ethical considerations, such as equity or maximizing comfort. Please refer to the rewards module for a detailed list.

A particularly interesting reward function is AdaptabilityThree: its definition evolves as the time steps increase, which forces agents to adapt to changing ethical considerations and objectives.

To use it, simply import it and create an instance:

from smartgrid.rewards.numeric.differentiated import AdaptabilityThree

rewards = [AdaptabilityThree()]

Note

The environment has (partial) support for Multi-Objective RL (MORL), hence the use of a list of rewards. When using “traditional” (single-objective) RL algorithms, make sure to specify only 1 reward function, and to use a wrapper that turns the rewards into a single scalar number (see Single- or multi-objective below).

SmartGrid Env

Finally, the SmartGrid class links the simulator to the Gymnasium standard by extending its Env class. It is responsible for providing observations at each time step, receiving actions, and computing rewards based on observations and actions.

from smartgrid import SmartGrid

env = SmartGrid(
    world=world,
    rewards=rewards
)

Maximum number of steps

By default, the environment does not terminate: it is not episodic. The simulation will run as long as the interaction loop continues. It is possible to set a maximum number of steps, so that the environment will signal, through its truncated return value, that it should stop. This can be especially useful when using specialized learning libraries that are built to automatically check the terminated and truncated return values.

To do so, simply set the parameter when creating the instance:

env = SmartGrid(
    world=world,
    rewards=rewards,
    max_step=10_000
)

After max_step steps have been done, the environment can still be used, but it will emit a warning.

Single- or multi-objective

If only 1 reward function is used, and single-objective learning algorithms are targeted, the env may be wrapped in a specific class that returns a single (scalar) reward instead of a dict:

from smartgrid.wrappers import SingleRewardAggregator

env = SingleRewardAggregator(env)

This simplifies the usage of the environment in most cases. When dealing with multiple reward functions, other aggregators, such as the WeightedSumRewardAggregator or the MinRewardAggregator, can be used instead. To use multi-objective learning algorithms, which receive several rewards at each step, simply avoid wrapping the base environment.
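
For instance, assuming these aggregators take the base environment as their only required argument, like the SingleRewardAggregator does, wrapping with the MinRewardAggregator would look like:

from smartgrid.wrappers import MinRewardAggregator

# Sketch: the wrapped env is assumed to return, at each step, the minimum of
# the rewards produced by the reward functions, as a single scalar.
env = MinRewardAggregator(env)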

When the environment is wrapped, the base environment can be obtained through the unwrapped property. The wrapper allows access to any public attribute of the environment automatically:

smartgrid = env.unwrapped
num_agents = env.num_agents  # Note that `num_agents` is not defined in the wrapper!
assert num_agents == smartgrid.num_agents

The interaction loop

The Env is now ready for the interaction loop!

If a maximum number of steps has been specified, the traditional done loop can be used:

done = False
obs_n, _ = env.reset()
while not done:
    # Implement your decision algorithm here
    actions = {
        agent_name: env.action_space(agent_name).sample()
        for agent_name in env.agents
    }
    obs_n, rewards_n, terminated_n, truncated_n, info_n = env.step(actions)
    done = all(terminated_n) or all(truncated_n)
env.close()

Otherwise, the env termination must be handled by the interaction loop itself:

max_step = 50
obs_n, _ = env.reset()
for _ in range(max_step):
    # Implement your decision algorithm here
    actions = {
        agent_name: env.action_space(agent_name).sample()
        for agent_name in env.agents
    }
    # Note that we do not need the `terminated` nor `truncated` values here.
    obs_n, rewards_n, _, _, info_n = env.step(actions)
env.close()

Both ways are completely equivalent: use one or the other at your convenience.
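
As a recap, the following sketch chains all the pieces described on this page into a minimal end-to-end scenario. It only reuses elements shown above, and assumes that the wrapper forwards agents and action_space like the other public attributes:

from smartgrid import World, SmartGrid
from smartgrid.agents import Agent, DataOpenEIConversion, comfort
from smartgrid.make_env import find_profile_data
from smartgrid.util import RandomEnergyGenerator
from smartgrid.rewards.numeric.differentiated import AdaptabilityThree
from smartgrid.wrappers import SingleRewardAggregator

# 1. Load a profile and create a few agents.
converter = DataOpenEIConversion()
converter.load(
    'Household',
    find_profile_data('openei', 'profile_residential_annually.npz'),
    comfort.flexible_comfort_profile
)
agents = [
    Agent(f'Household{i+1}', converter.profiles['Household'])
    for i in range(5)
]

# 2. Create the world and the (wrapped, single-objective) environment.
world = World(
    agents=agents,
    energy_generator=RandomEnergyGenerator(lower_proportion=0.75,
                                           upper_proportion=1.10)
)
env = SingleRewardAggregator(
    SmartGrid(world=world, rewards=[AdaptabilityThree()], max_step=50)
)

# 3. Run the interaction loop with random actions.
obs_n, _ = env.reset()
for _ in range(50):
    actions = {
        agent_name: env.action_space(agent_name).sample()
        for agent_name in env.agents
    }
    obs_n, rewards_n, _, _, info_n = env.step(actions)
env.close()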