{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Visualizing results\n",
    "\n",
    "This notebook shows how to run simulations with different algorithms,\n",
    "collect results, and visualize them on several plots to compare the algorithms'\n",
    "performances.\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Setup environment\n",
    "\n",
    "First, we create the environment in which the simulations will take place.\n",
    "We simply use the \"basic\" (default) environment, to simplify.\n",
    "Because this document is executed in the `docs/` folder, we need to change the\n",
    "current working directory to import the data files correctly; on a typical\n",
    "setup, only the following lines are necessary:\n",
    "\n",
    "```python\n",
    "from smartgrid import make_basic_smartgrid\n",
    "from algorithms.qsom import QSOM\n",
    "from algorithms.naive import RandomModel\n",
    "\n",
    "env = make_basic_smartgrid()\n",
    "```\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import os  # Only to change/get the Current Working Dir!\n",
    "# Remember previous working dir, and change to project root\n",
    "# (necessary to import data files).\n",
    "previous_cwd = os.getcwd()\n",
    "os.chdir('../..')\n",
    "\n",
    "# The only important lines here!\n",
    "# We do not specify the maximum number of time steps, as we will enforce this\n",
    "# directly in the interaction loop when running simulations.\n",
    "from smartgrid import make_basic_smartgrid\n",
    "from algorithms.qsom import QSOM\n",
    "from algorithms.naive import RandomModel\n",
    "env = make_basic_smartgrid()\n",
    "\n",
    "# Revert to previous working dir\n",
    "os.chdir(previous_cwd)\n"
   ],
   "metadata": {
    "collapsed": false,
    "tags": [
     "remove-input"
    ]
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "We will execute simulations for 50 time steps, to limit the required time to render this notebook and build the documentation. Typically, 10,000 time steps are used for a simulation, taking about ~ 1 or 2 hours, depending on the machine. Simulations with 100 or 1,000 time steps can be a nice compromise between number of data points and execution time.\n",
    "\n",
    "We also set the seed to some pre-defined constant, to ensure reproducibility of results."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "max_step = 50\n",
    "seed = 1234"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Run simulations and collect results\n",
    "\n",
    "Now, we run a simulation using a first algorithm, QSOM.\n",
    "This algorithm is already provided in this project, so using it is straightforward.\n",
    "\n",
    "We collect rewards received by learning agents at each time step in an array for visualizing later. To simplify the plotting process, we will only memorize the average of rewards at each time step; more complex plots can be created by logging more details, such as all rewards received by all learning agents at each time step.\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
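  {
   "cell_type": "markdown",
   "source": [
    "As a minimal sketch of logging more details, the whole vector of rewards can be stored at each time step instead of its average. The sizes and the random `rewards` below are placeholders for illustration only, and we assume that `rewards` holds one numeric value per learning agent.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical sizes, for illustration only.\n",
    "n_steps, n_agents = 5, 3\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "per_agent_rewards = []\n",
    "for step in range(n_steps):\n",
    "    # Stand-in for the `rewards` returned by `env.step(actions)`:\n",
    "    # assumed to contain one value per learning agent.\n",
    "    rewards = rng.uniform(0.0, 1.0, size=n_agents)\n",
    "    # Keep the whole vector instead of only its mean.\n",
    "    per_agent_rewards.append(np.asarray(rewards, dtype=float))\n",
    "\n",
    "# Shape (n_steps, n_agents): one row per time step, one column per agent.\n",
    "per_agent_rewards = np.stack(per_agent_rewards)\n",
    "# The per-step average used in this notebook can still be recovered.\n",
    "mean_per_step = per_agent_rewards.mean(axis=1)\n",
    "```\n",
    "\n",
    "Storing the full array costs more memory, but the per-step average remains available through `mean(axis=1)`.\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },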
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "data_qsom = []\n",
    "model_qsom = QSOM(env)\n",
    "\n",
    "obs = env.reset(seed=seed)\n",
    "for step in range(max_step):\n",
    "    actions = model_qsom.forward(obs)\n",
    "    obs, rewards, _, _, _ = env.step(actions)\n",
    "    model_qsom.backward(obs, rewards)\n",
    "    # Memorize the average of rewards received at this time step\n",
    "    data_qsom.append(np.mean(rewards))\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "To compare QSOM with another algorithm, we will run a simulation with a Random model, taking actions completely randomly. This algorithm is also already provided in this project, allowing us to focus on running simulations and plotting results here. For more interesting comparisons, other learning algorithms should be implemented, or re-used from existing projects, such as [Stable Baselines](https://stable-baselines3.readthedocs.io/en/master/).\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "data_random = []\n",
    "model_random = RandomModel(env, {})\n",
    "\n",
    "obs = env.reset(seed=seed)\n",
    "for step in range(max_step):\n",
    "    actions = model_random.forward(obs)\n",
    "    obs, rewards, _, _, _ = env.step(actions)\n",
    "    model_random.backward(obs, rewards)\n",
    "    # Memorize the average of rewards received at this time step\n",
    "    data_random.append(np.mean(rewards))\n"
   ],
   "metadata": {
    "collapsed": false,
    "is_executing": true
   }
  },
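  {
   "cell_type": "markdown",
   "source": [
    "The two interaction loops above differ only in the model they use, so they could be factored into a helper function. The sketch below is one possible way to do so; `run_simulation`, `DummyEnv` and `DummyModel` are hypothetical names that are not part of this project, and the dummy classes only mimic the calls used in this notebook so that the example runs without the simulator.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def run_simulation(env, model, max_step, seed):\n",
    "    # Run one simulation; return the average reward of each time step.\n",
    "    data = []\n",
    "    obs = env.reset(seed=seed)\n",
    "    for _ in range(max_step):\n",
    "        actions = model.forward(obs)\n",
    "        obs, rewards, _, _, _ = env.step(actions)\n",
    "        model.backward(obs, rewards)\n",
    "        data.append(float(np.mean(rewards)))\n",
    "    return data\n",
    "\n",
    "\n",
    "# Hypothetical stand-ins (NOT the project's classes), two agents.\n",
    "class DummyEnv:\n",
    "    def reset(self, seed=None):\n",
    "        self.rng = np.random.default_rng(seed)\n",
    "        return [0.0, 0.0]\n",
    "\n",
    "    def step(self, actions):\n",
    "        rewards = self.rng.uniform(0.0, 1.0, size=len(actions))\n",
    "        return [0.0, 0.0], rewards, False, False, {}\n",
    "\n",
    "\n",
    "class DummyModel:\n",
    "    def forward(self, obs):\n",
    "        return [0 for _ in obs]\n",
    "\n",
    "    def backward(self, obs, rewards):\n",
    "        pass  # A learning algorithm would update itself here.\n",
    "\n",
    "\n",
    "data_dummy = run_simulation(DummyEnv(), DummyModel(), max_step=10, seed=1234)\n",
    "```\n",
    "\n",
    "With the objects created earlier, the equivalent call would be `run_simulation(env, model_qsom, max_step, seed)`.\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },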
  {
   "cell_type": "markdown",
   "source": [
    "## Plot results\n",
    "\n",
    "We now have the results (average of rewards per time step) for two simulations using two different algorithms.\n",
    "We can visualize them using a plotting library; to keep it simple, we will use [Matplotlib](https://matplotlib.org/) here, but any library can be used, such as [seaborn](https://seaborn.pydata.org/), [bokeh](https://bokeh.org/), ...\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "x = list(range(max_step))\n",
    "plt.plot(x, data_qsom, color='orange', label='QSOM')\n",
    "plt.plot(x, data_random, color='blue', label='Random')\n",
    "plt.grid()\n",
    "plt.xlabel('Time step')\n",
    "plt.ylabel('Average reward')\n",
    "plt.title('Comparison between mean of received rewards at each time step')\n",
    "plt.show()\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Note that, with only a few time steps, it is difficult to compare these algorithms. We emphasize again that, in most cases, simulations should run for many more steps, e.g., 10,000.\n",
    "\n",
    "Other plots can be created, e.g., collecting all agents' rewards and showing the confidence interval of them per time step; running several simulations and plotting the distribution of scores, i.e., average of average rewards per learning agent per time step, etc.\n"
   ],
   "metadata": {
    "collapsed": false
   }
  }
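  ,
  {
   "cell_type": "markdown",
   "source": [
    "As an illustration of the confidence-interval idea, the sketch below computes a 95% interval of the agents' rewards at each time step, using a normal approximation. The `all_rewards` array is synthetic data standing in for logged rewards, with one row per time step and one column per learning agent; its name and size are assumptions made for this example.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Synthetic stand-in for logged rewards, shape (n_steps, n_agents).\n",
    "rng = np.random.default_rng(1234)\n",
    "all_rewards = rng.uniform(0.0, 1.0, size=(50, 20))\n",
    "\n",
    "n_agents = all_rewards.shape[1]\n",
    "mean = all_rewards.mean(axis=1)\n",
    "# Standard error of the mean, using the sample standard deviation.\n",
    "sem = all_rewards.std(axis=1, ddof=1) / np.sqrt(n_agents)\n",
    "# Normal approximation of a 95% confidence interval.\n",
    "half_width = 1.96 * sem\n",
    "lower, upper = mean - half_width, mean + half_width\n",
    "```\n",
    "\n",
    "The band can then be drawn with Matplotlib, e.g., `plt.fill_between(x, lower, upper, alpha=0.3)` on top of `plt.plot(x, mean)`.\n"
   ],
   "metadata": {
    "collapsed": false
   }
  }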
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}