Using Ethical Gardeners with Custom Algorithms
==============================================

This tutorial explains how to integrate your own reinforcement learning algorithms with the Ethical Gardeners environment.

Compatibility with Custom Algorithms
------------------------------------

The Ethical Gardeners environment is designed to work with a range of reinforcement learning algorithms, including custom ones. While we provide built-in support for Stable Baselines 3, you can use your own algorithms by following these guidelines.

Requirements for Custom Algorithms
----------------------------------

To use your algorithm with the utility functions provided by Ethical Gardeners:

1. For the :py:func:`~ethicalgardeners.algorithms.train` function, your algorithm should have:

   - a ``learn()`` method that accepts a ``total_timesteps`` parameter
   - a ``save()`` method to persist the trained model

2. For the :py:func:`~ethicalgardeners.algorithms.evaluate` and :py:func:`~ethicalgardeners.algorithms.predict_action` functions, your algorithm should have:

   - a ``predict()`` method that takes observations and, optionally, action masks, and returns actions

Using the Core Utility Functions
--------------------------------

The following code snippet is a minimal example of how to use the training, evaluation, and prediction functions with a custom algorithm. It uses the MaskablePPO algorithm from sb3-contrib as an example of an algorithm that supports action masking, and the DQN algorithm from Stable Baselines 3 as an example of one that does not.

For training, the :py:func:`~ethicalgardeners.algorithms.train` function accepts a model, so the model must be instantiated beforehand. In the example, we instantiate the model with the environment and a policy.
The environment is either a default Ethical Gardeners environment created with the :py:func:`~ethicalgardeners.main.make_env` function or a vectorized one made of multiple environments.

For training, evaluation, and prediction, you must specify whether your algorithm supports action masking. Set the ``needs_action_mask`` parameter to ``True`` if it does, and to ``False`` if it does not.

.. literalinclude:: /examples/train_evaluate_predict.py
   :language: python
   :caption: train_evaluate_predict.py
   :name: train_evaluate_predict
   :encoding: utf-8
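
The requirements listed earlier (``learn()``, ``save()``, and ``predict()``) can be illustrated with a small stand-in class. This is only a sketch under stated assumptions: ``RandomMaskedAgent`` is a hypothetical name that is not part of Ethical Gardeners, the ``action_masks`` keyword and the ``(action, state)`` return value mirror the convention used by sb3-contrib's ``MaskablePPO.predict()``, and the exact signature that the Ethical Gardeners utility functions expect should be checked against their API reference.

.. code-block:: python

   import pickle
   import random


   class RandomMaskedAgent:
       """Hypothetical custom algorithm exposing learn(), save() and predict().

       It picks a random allowed action and does no real learning; it only
       shows the shape of the interface described in this tutorial.
       """

       def __init__(self, n_actions, seed=0):
           self.n_actions = n_actions
           self.rng = random.Random(seed)
           self.timesteps_seen = 0

       def learn(self, total_timesteps):
           # A real algorithm would collect experience and update its policy
           # here. This sketch only records the requested budget.
           self.timesteps_seen += total_timesteps
           return self

       def save(self, path):
           # Persist the minimal state needed to rebuild the agent.
           state = {
               "n_actions": self.n_actions,
               "timesteps_seen": self.timesteps_seen,
           }
           with open(path, "wb") as f:
               pickle.dump(state, f)

       def predict(self, observation, action_masks=None):
           # Assumption: the mask is a sequence of truthy/falsy flags, one per
           # action, following the sb3-contrib convention.
           if action_masks is None:
               allowed = list(range(self.n_actions))
           else:
               allowed = [a for a, ok in enumerate(action_masks) if ok]
           if not allowed:
               # Fall back to every action if the mask excludes all of them.
               allowed = list(range(self.n_actions))
           # Return (action, state) as Stable Baselines 3 models do.
           return self.rng.choice(allowed), None


   agent = RandomMaskedAgent(n_actions=4)
   agent.learn(total_timesteps=1000)
   action, _ = agent.predict(observation=None, action_masks=[0, 1, 0, 0])

With this mask only action ``1`` is allowed, so ``predict()`` returns it. An agent of this kind would correspond to ``needs_action_mask=True``; an agent whose ``predict()`` takes observations only would correspond to ``needs_action_mask=False``.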