Argumentation-based reward functions¶
By default, EthicalSmartGrid uses numeric-based reward functions, such as AdaptabilityThree.
From v1.2.0 onwards, you may also use argumentation-based reward functions, which use an argumentation structure rather than a pure mathematical function. Argumentation uses arguments and attacks to define how to “judge” the behaviour of the learning agents.
You can use argumentation:
with the existing reward functions specified in EthicalSmartGrid, which correspond to the 4 moral values that were defined in The SmartGrid use-case: Affordability, Environmental Sustainability, Inclusiveness, and Supply Security;
to create your own reward functions, by directly using the AJAR library to define argumentation graphs that correspond to your desired moral values.
Using the existing argumentation reward functions¶
You can import these reward functions from the smartgrid.rewards.argumentation package; accessing this package requires the AJAR library, which you can install with pip install git+https://github.com/ethicsai/ajar.git@v1.0.0. Trying to import anything from this package without having AJAR installed will raise an error.
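As a quick sanity check, the guarded import below (a minimal sketch, assuming the failure surfaces as an ImportError) confirms whether AJAR is available:
# Minimal sketch: check that AJAR is available before building the environment.
try:
    from smartgrid.rewards.argumentation import Affordability
except ImportError as error:
    # Without AJAR, the import fails; install it first with:
    #   pip install git+https://github.com/ethicsai/ajar.git@v1.0.0
    raise SystemExit(f"AJAR is required for argumentation-based rewards: {error}")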
The 4 reward functions can be imported as follows:
from smartgrid.rewards.argumentation import (
    Affordability,
    EnvironmentalSustainability,
    Inclusiveness,
    SupplySecurity
)
Then, you can create a new instance of the SmartGrid environment, exactly as when creating a Custom scenario:
# 1. Load agents' profiles
converter = DataOpenEIConversion()
converter.load('Household',
               find_profile_data('openei', 'profile_residential_annually.npz'),
               comfort.flexible_comfort_profile)

# 2. Create agents
agents = []
for i in range(20):
    agents.append(
        Agent(f'Household{i+1}', converter.profiles['Household'])
    )

# 3. Create world
generator = RandomEnergyGenerator()
world = World(agents, generator)

# 4. Choose reward functions
rewards = [
    Affordability(),
    EnvironmentalSustainability(),
    Inclusiveness(),
    SupplySecurity(),
]

# 5. Create env
simulator = SmartGrid(
    world,
    rewards,
    max_step,
    ObservationManager()
)

# 6. (Optional) Wrap the env to return scalar rewards (average)
simulator = WeightedSumRewardAggregator(simulator)
Step 4 is the most important here: this is where you define the argumentation-based reward functions. We have specified all 4 in this example, but you may select only some of them, or a single one, as you desire: they work independently.
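For instance, to judge behaviours with respect to only two of the moral values (an arbitrary choice for illustration), step 4 simply becomes:
# Hypothetical selection: keep only two of the four moral values.
rewards = [
    Affordability(),
    Inclusiveness(),
]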
Because we have specified here 4 different moral values, you may also use a wrapper (WeightedSumRewardAggregator) that returns the average of the various rewards as a scalar reward (for single-objective reinforcement learning). If you want to use a multi-objective reinforcement learning algorithm, you can skip step 6.
The environment will work exactly as when using numeric-based reward functions; use the standard interaction loop to make your agents receive observations and make decisions based on them.
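For reference, a minimal interaction loop could look like the sketch below. It assumes a Gymnasium-style reset()/step() interface, random actions sampled from each agent's profile, and the variables defined in the previous snippet; adapt the names and signatures to the actual interaction loop documented for EthicalSmartGrid.
# Minimal sketch of an interaction loop, assuming a Gymnasium-style API;
# the exact names and signatures may differ in your version of EthicalSmartGrid.
obs, infos = simulator.reset()
for _ in range(max_step):
    # Random actions stand in for your learning algorithm's decisions.
    actions = [agent.profile.action_space.sample() for agent in agents]
    obs, rewards_n, terminated, truncated, infos = simulator.step(actions)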
Writing custom argumentation reward functions¶
You can also use the AJAR library to create your own argumentation-based reward functions. This requires 3 steps:
1. Creating the argumentation graph (AFDM), with arguments and attacks.
2. Creating the JudgingAgent, which will perform the actual judgment, i.e., transforming the symbolic arguments into a scalar reward.
3. Creating the Reward, which will wrap the judging agent into something usable by EthicalSmartGrid.
The most important step here is the first one, as it truly defines how the reward function works, which behaviours it will encourage, etc.
Creating the argumentation graph¶
The argumentation graph is created by instantiating an AFDM and adding Arguments to it:
from ajar import AFDM, Argument

afdm = AFDM()
decision = 'moral'

afdm.add_argument(Argument(
    "The argument identifier here",
    "The (longer) argument description here",
    lambda s: s['some_variable'] > 3,  # The activation function
    supports=[decision]
))
The first parameter should be a short identifier that represents your argument; the second one (optional) can be a longer text to help describe the argument.
The third one (optional) is the activation function, which determines when the argument should be considered active. In the Smart Grid use-case, we can for example have an argument “The agent has a comfort greater than 90%”, for which the activation function will be s['comfort'] > 0.9. The object s here represents the situation to be judged. By default, in EthicalSmartGrid, we provide the parse_situation() helper function that will return a somewhat symbolic representation of the current environment state and the learning agent’s action.
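For instance, the comfort argument mentioned above could be written as follows (the 'comfort' key is assumed here to be part of the situation returned by parse_situation()):
# Comfort argument from the example above; 'comfort' is an assumed situation key.
afdm.add_argument(Argument(
    "comfort_high",
    "The agent has a comfort greater than 90%",
    lambda s: s['comfort'] > 0.9,
    supports=[decision]
))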
Finally, you may set whether the argument supports or counters the moral decision. If the argument supports it (supports=[decision]), it means the argument argues that the learning agent performed well with respect to this moral value; if it counters it (counters=[decision]), it means the argument argues that the learning agent performed badly with respect to this moral value. You may also specify neither of them, which means the argument is neutral.
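As an illustration, a countering argument could be defined like this ('over_consumption' is a hypothetical situation key, used only for the example):
# Hypothetical countering argument: it argues the agent behaved badly
# with respect to the moral value when it consumed more than it needed.
afdm.add_argument(Argument(
    "over_consumption",
    "The agent consumed more energy than it needed",
    lambda s: s['over_consumption'] > 0,
    counters=[decision]
))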
After creating several arguments, you can also add attacks by specifying either the argument name or a reference to the argument itself:
afdm.add_argument(Argument(
    "other_argument"
))

afdm.add_attack_relationship("The argument identifier here", "other_argument")
The attack here means that "The argument identifier here" (our first argument) attacks "other_argument". If the first argument is alive in a given situation, the attacked argument must be defended by another to stay alive.
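For example, keeping references to the Argument objects lets you declare the attack by passing the objects themselves instead of their names (argument names and situation keys below are hypothetical):
# Hypothetical arguments, kept as Python variables so that the attack can be
# declared by passing the Argument objects rather than their identifiers.
attacker = Argument("storage_full", "The agent's personal storage is full",
                    lambda s: s['storage'] >= 1.0)
attacked = Argument("should_buy", "The agent should buy more energy",
                    lambda s: s['storage'] < 1.0)
afdm.add_argument(attacker)
afdm.add_argument(attacked)
afdm.add_attack_relationship(attacker, attacked)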
You may create as many arguments and attacks as you want. You can also use the Argumentation Reward Designer, a visual interface that produces Python code compatible with AJAR.
Creating the Judging agent¶
The next step is to create a JudgingAgent that will perform the judgment. An AFDM simply holds the argumentation graph, and can determine which arguments are acceptable in a given situation. However, the judgment itself, which returns a scalar reward from a set of acceptable arguments, is done by Judging agents. In particular, they are responsible for choosing how to compute this reward; this will often boil down to comparing the number of acceptable “supporting” arguments and acceptable “countering” arguments. The judgment module offers several such methods.
from ajar import JudgingAgent, judgment
judge = JudgingAgent("Your moral value name here", afdm, judgment.j_simple)
The first argument is the name of the moral value you want this agent to represent, for example "equity". The second argument is the AFDM we defined previously. Finally, the third argument is the judgment function mentioned just above.
This agent can already be used to judge a situation, by using its judge() method, such as: judge.judge(situation={}, decision=decision). However, to better work with EthicalSmartGrid, we must now wrap it in a Reward.
Creating a Reward¶
To bridge the judging agents with EthicalSmartGrid, create a class that derives from Reward (smartgrid.rewards.reward.Reward), and which overrides its calculate() method to return the reward in a given situation.
from smartgrid.rewards import Reward
from smartgrid.rewards.argumentation.situation import parse_situation


class YourRewardNameHere(Reward):

    def __init__(self):
        super().__init__()
        self.judge = judge

    def calculate(self, world, agent):
        situation = parse_situation(world, agent)
        reward = self.judge.judge(situation, decision='moral')
        return reward
You may then use this class when instantiating a SmartGrid. The judge refers to the variable defined above; note that the decision when judging must be the same as when defining the arguments!
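For example, reusing the variables from the environment creation snippet above, step 4 would simply become:
# Step 4 of the environment creation, using the custom reward function
# instead of (or in addition to) the built-in argumentation-based ones.
rewards = [YourRewardNameHere()]
simulator = SmartGrid(world, rewards, max_step, ObservationManager())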
In the existing argumentation-based reward functions, we encapsulate the AFDM creation in a private _create_afdm() method in each of the Reward classes, and we use decision as a class attribute so that both argument creation and judgment can rely on the same value. This is however not mandatory: as long as the Reward has access to a judging agent to perform the judgment, it will work.