Custom scenario
===============

This Smart Grid simulator was designed to support various experiments, in
particular regarding the agents (number, profiles), the physical constraints
of the world (available energy), and the reward functions. We call the
combination of these elements a *scenario*; this page describes how to fully
customize one. For each element, we give a succinct description; for a more
complete understanding of how they work, please refer to their API
documentation.

Agents' profiles
----------------

Agents share some common characteristics, such as their personal battery
capacity, how much energy they need at each step, how much energy they
produce, and how they determine their comfort from their need and
consumption. To simplify the creation of agents and to reduce resource usage
(memory and computation), these characteristics are grouped and shared in
*Profiles*.

An :py:class:`~smartgrid.agents.profile.agent_profile.AgentProfile` can be
loaded from data files (see, e.g., the ``data/openei`` folder); to do so, use
a :py:class:`~smartgrid.agents.data_conversion.DataConversion` object. For
example:

.. code-block:: Python

    from smartgrid.agents import DataOpenEIConversion
    from smartgrid.agents import comfort

    # Create a converter specialized for the `data/openei` files.
    converter = DataOpenEIConversion()

    # Load agents' profiles, using the data files.
    converter.load(
        name='Household',  # Profile name -- a unique ID
        data_path='./data/openei/profile_residential_annually.npz',  # Data file
        comfort_fn=comfort.flexible_comfort_profile  # Comfort function
    )
    converter.load(
        'Office',
        './data/openei/profile_office_annually.npz',
        comfort.neutral_comfort_profile
    )
    converter.load(
        'School',
        './data/openei/profile_school_annually.npz',
        comfort.strict_comfort_profile
    )

    # Profiles can be accessed through the `profiles` attribute, and are
    # indexed by their ID.
    profile_household = converter.profiles['Household']
    profile_office = converter.profiles['Office']
    profile_school = converter.profiles['School']

You can use the converter object to load any profile you desire, and use
these profiles to instantiate :py:class:`~smartgrid.agents.agent.Agent`\ s.

.. note::
    If the package was installed through ``pip`` instead of cloning the
    repository, accessing the files through a relative path will not work.
    Instead, the files must be accessed from the installed package itself. To
    simplify getting the path to data files, the
    :py:func:`~smartgrid.make_env.find_profile_data` function may be used,
    although it has some limitations. In particular, it only works with a
    single level of nesting (e.g., ``data/dataset/sub-dataset/file`` will not
    work). Still, this function works whether you have cloned the repository
    (as long as the current working directory is the project root) or
    installed the package; it is the recommended way to specify which data
    file to use.

    .. code-block:: Python

        from smartgrid.make_env import find_profile_data

        converter = DataOpenEIConversion()
        converter.load(
            'Office',
            find_profile_data('openei', 'profile_office_annually.npz'),
            comfort.neutral_comfort_profile
        )
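As a quick illustration, a loaded profile can be passed directly to the
:py:class:`~smartgrid.agents.agent.Agent` constructor, exactly as in the
*World* section below (which creates several agents at once). The agent name
chosen here is arbitrary, but should be unique.

.. code-block:: Python

    from smartgrid.agents import Agent

    # Create a single agent from the previously loaded 'Household' profile.
    household_agent = Agent(
        name='Household1',
        profile=converter.profiles['Household']
    )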
Energy generator
----------------

The :py:class:`~smartgrid.util.available_energy.EnergyGenerator` is used to
determine, at each time step, the amount of energy available in the world.
Several implementations are available, e.g., the
:py:class:`~smartgrid.util.available_energy.RandomEnergyGenerator`, the
:py:class:`~smartgrid.util.available_energy.ScarceEnergyGenerator`, or the
:py:class:`~smartgrid.util.available_energy.GenerousEnergyGenerator`. They
"generate" a random amount of energy based on the total need of all agents at
the current time step. Another implementation is the
:py:class:`~smartgrid.util.available_energy.RealisticEnergyGenerator`, which
uses a dataset of production values per time step to determine the amount.
For example, using a random generator:

.. code-block:: Python

    from smartgrid.util import RandomEnergyGenerator

    # This generator will generate between 75% and 110% of the agents' total
    # need at each step.
    generator = RandomEnergyGenerator(
        lower_proportion=0.75,
        upper_proportion=1.10
    )

    # Example with current_need = 10_000 Wh.
    amount = generator.generate_available_energy(
        current_need=10_000,
        # The other values are not important for this generator.
        current_step=0,
        min_need=0,
        max_need=100_000
    )
    assert 0.75 * 10_000 <= amount < 1.10 * 10_000

Another example, using the realistic generator:

.. code-block:: Python

    from smartgrid.util import RealisticEnergyGenerator

    # The dataset (source of truth) for energy production at each time step.
    # It means that, at t=0, 80% of the agents' maximum need will be
    # available; at t=1, 66% of their maximum need; and at t=2, 45%.
    # Subsequent time steps simply cycle over this array, e.g., t=3 is the
    # same as t=0.
    data = [0.80, 0.66, 0.45]
    generator = RealisticEnergyGenerator(data=data)

    # Example with max_need = 100_000 Wh.
    amount = generator.generate_available_energy(
        max_need=100_000,
        current_step=0,
        # The other values are not important for this generator.
        current_need=10_000,
        min_need=0
    )
    assert amount == int(100_000 * data[0])

World
-----

The :py:class:`~smartgrid.world.World` represents a simulated "physical"
world. It handles the physical aspects of the simulation: agents, available
energy, and updates through agents' actions. The world is instantiated from a
list of agents and an energy generator:

.. code-block:: Python

    from smartgrid import World
    from smartgrid.agents import Agent

    # We assume that the variables instantiated above are available,
    # especially the `converter` (with loaded profiles) and the `generator`.

    # Create the agents, based on loaded profiles.
    agents = []
    for i in range(5):
        agents.append(
            Agent(
                # Unique name -- it is recommended to use profile + index.
                name=f'Household{i+1}',
                # Agent profile.
                profile=converter.profiles['Household']
            )
        )
    for i in range(3):
        agents.append(
            Agent(f'Office{i+1}', profile_office)
        )

    # Create the world, with agents and energy generator.
    world = World(
        agents=agents,
        energy_generator=generator
    )

At this point, we have a usable world, able to simulate a smart grid and to
update itself when agents take actions. (It is even usable as-is, if you are
not interested in Reinforcement Learning!) However, to benefit from the RL
*interaction loop* (observations, actions, rewards), we have to create an
Environment.

Reward functions
----------------

Reward functions dictate the agents' expected behaviour. Several have been
implemented and are directly available; they target different ethical
considerations, such as equity, maximizing comfort, etc. Please refer to the
:py:mod:`rewards <smartgrid.rewards>` module for a detailed list.
A particularly interesting reward function is
:py:class:`~smartgrid.rewards.numeric.differentiated.adaptability.AdaptabilityThree`:
its definition evolves as time steps increase, which forces agents to adapt
to changing ethical considerations and objectives. To use it, simply import
it and create an instance:

.. code-block:: Python

    from smartgrid.rewards.numeric.differentiated import AdaptabilityThree

    rewards = [AdaptabilityThree()]

.. note::
    The environment has (partial) support for *Multi-Objective* RL (MORL),
    hence the use of a list of reward functions. When using "traditional"
    (*single-objective*) RL algorithms, make sure either to specify only 1
    reward function (and wrap the environment so that it returns a scalar,
    see below), or to use a wrapper that aggregates several rewards into a
    single scalar number.

SmartGrid Env
-------------

Finally, the :py:class:`~smartgrid.environment.SmartGrid` class represents
the link with the Gymnasium standard, by extending the
:py:class:`~gymnasium.core.Env` class. It is responsible for providing
observations at each time step, receiving actions, and computing the rewards
based on observations and actions.

.. code-block:: Python

    from smartgrid import SmartGrid

    env = SmartGrid(
        world=world,
        rewards=rewards
    )

Maximum number of steps
^^^^^^^^^^^^^^^^^^^^^^^

By default, the environment does not terminate: it is not episodic. The
simulation will run as long as the *interaction loop* continues. It is
possible to set a maximum number of steps, so that the environment signals,
through its ``truncated`` return value, that it should stop. This is
especially useful with specialized learning libraries that automatically
check the ``terminated`` and ``truncated`` return values. To do so, simply
set the parameter when creating the instance:

.. code-block:: Python

    env = SmartGrid(
        world=world,
        rewards=rewards,
        max_step=10_000
    )

After ``max_step`` steps have been performed, the environment can still be
used, but it will emit a warning.

Single- or multi-objective
^^^^^^^^^^^^^^^^^^^^^^^^^^

If only 1 reward function is used and *single-objective* learning algorithms
are targeted, the env may be wrapped in a specific class that returns a
single (scalar) reward instead of a dict:

.. code-block:: Python

    from smartgrid.wrappers import SingleRewardAggregator

    env = SingleRewardAggregator(env)

This simplifies the usage of the environment in most cases. When dealing with
multiple reward functions, other aggregators, such as the
:py:class:`~smartgrid.wrappers.reward_aggregator.WeightedSumRewardAggregator`
or the :py:class:`~smartgrid.wrappers.reward_aggregator.MinRewardAggregator`,
can be used instead. To use *multi-objective* learning algorithms, which
receive several rewards each step, simply avoid wrapping the base
environment.

When the environment is wrapped, the base environment can be obtained through
the :py:obj:`~gymnasium.Wrapper.unwrapped` property. The wrapper
automatically allows access to any public attribute of the base environment:

.. code-block:: Python

    smartgrid = env.unwrapped
    num_agents = env.num_agents  # Note that `num_agents` is not defined in the wrapper!
    assert num_agents == smartgrid.num_agents

The interaction loop
^^^^^^^^^^^^^^^^^^^^

The Env is now ready for the *interaction loop*! If a maximum number of steps
has been specified, the traditional ``done`` loop can be used:
.. code-block:: Python

    done = False
    obs_n, _ = env.reset()
    while not done:
        # Implement your decision algorithm here
        actions = {
            agent_name: env.action_space(agent_name).sample()
            for agent_name in env.agents
        }
        obs_n, rewards_n, terminated_n, truncated_n, info_n = env.step(actions)
        done = all(terminated_n) or all(truncated_n)
    env.close()

Otherwise, the env termination must be handled by the interaction loop
itself:

.. code-block:: Python

    max_step = 50
    obs_n, _ = env.reset()
    for _ in range(max_step):
        # Implement your decision algorithm here
        actions = {
            agent_name: env.action_space(agent_name).sample()
            for agent_name in env.agents
        }
        # Note that we do not need the `terminated` or `truncated` values
        # here.
        obs_n, rewards_n, _, _, info_n = env.step(actions)
    env.close()

Both ways are completely equivalent: use one or the other at your
convenience.
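As a recap, here is a minimal, self-contained sketch that assembles the
pieces from this page into one scenario. It only reuses the classes and calls
shown above; the specific choices (5 ``Household`` agents, a random
generator, ``AdaptabilityThree``, 50 steps) are arbitrary examples to adapt
to your own scenario.

.. code-block:: Python

    from smartgrid import SmartGrid, World
    from smartgrid.agents import Agent, DataOpenEIConversion, comfort
    from smartgrid.make_env import find_profile_data
    from smartgrid.rewards.numeric.differentiated import AdaptabilityThree
    from smartgrid.util import RandomEnergyGenerator
    from smartgrid.wrappers import SingleRewardAggregator

    # 1. Load an agent profile from the packaged data files.
    converter = DataOpenEIConversion()
    converter.load(
        'Household',
        find_profile_data('openei', 'profile_residential_annually.npz'),
        comfort.flexible_comfort_profile
    )

    # 2. Create the agents and the physical world.
    agents = [
        Agent(f'Household{i+1}', converter.profiles['Household'])
        for i in range(5)
    ]
    generator = RandomEnergyGenerator(
        lower_proportion=0.75,
        upper_proportion=1.10
    )
    world = World(agents=agents, energy_generator=generator)

    # 3. Create the environment with a single reward function, and wrap it
    #    so that it returns a scalar reward (single-objective setting).
    env = SmartGrid(world=world, rewards=[AdaptabilityThree()], max_step=50)
    env = SingleRewardAggregator(env)

    # 4. Run the interaction loop, with random actions as a placeholder
    #    policy.
    obs_n, _ = env.reset()
    for _ in range(50):
        actions = {
            agent_name: env.action_space(agent_name).sample()
            for agent_name in env.agents
        }
        obs_n, rewards_n, _, _, info_n = env.step(actions)
    env.close()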