This tutorial describes how to visualize and replay the results of Flow experiments run using RL. Visualizing results breaks down into two main components:
reward plotting
policy replay
Furthermore, visualization differs depending on whether your experiments were run using rllab or RLlib. Accordingly, this tutorial is divided into two parts (one for rllab and one for RLlib). Note that this tutorial only covers visualization using SUMO, and not other simulators such as Aimsun.
An essential step in evaluating the effectiveness and training progress of RL agents is visualization of reward. rllab includes a tool to plot the average cumulative reward per rollout against iteration number to show training progress. This "reward plot" can be generated for just one experiment or many. The tool to be called is rllab's frontend.py, which is inside the directory rllab/viskit/ (assuming a user is already inside the directory rllab-multiagent).
frontend.py requires only one command-line input: the path to the result directory that a user wants to visualize. The directory should contain progress.csv and params.json files; pickle files containing per-iteration results are not necessary. An example call to frontend.py is below. Click on the link to http://localhost:5000 to view reward over time.
In [ ]:
! python ../../../rllab/viskit/frontend.py /path/to/result/directory
Flow includes a tool for visualizing a trained policy in its environment using SUMO's GUI. This enables more granular analysis of policies beyond their accrued reward, which in turn allows users to tweak actions, observations, and rewards in order to produce desired behavior. The visualizer also generates plots of observations and a plot of reward over the course of the rollout. The tool to be called is visualizer_rllab.py within flow/visualize (assuming a user is already inside the parent directory flow).
visualizer_rllab.py requires one command-line input and has three additional optional arguments. The required input is the path to the pickle file to be visualized (this is usually within an rllab result directory). The optional inputs are:
--num_rollouts, the number of rollouts to be visualized. The default value is 100. This argument takes integer input.
--plotname, the name of the plot generated by the visualizer. The default value is traffic_plot. This argument takes string input.
--gen_emission, specifies whether to generate an emission file from the simulation. This argument is a flag and takes no input.
An example call to visualizer_rllab.py is below.
In [ ]:
! python ../../flow/visualize/visualizer_rllab.py /path/to/result.pkl --num_rollouts 1 --plotname plot_test --gen_emission
Similarly to how rllab handles reward plotting, RLlib supports reward visualization over the course of training using tensorboard. tensorboard takes one command-line input, --logdir, which is an RLlib result directory (usually located within an experiment directory inside your ray_results directory). An example call is below.
In [ ]:
! tensorboard --logdir /ray_results/experiment_dir/result/directory
If you do not wish to use tensorboard, you can also use the flow/visualize/plot_ray_results.py file. It takes as arguments the path to the progress.csv file located inside your experiment results directory, and the name(s) of the column(s) to plot. If you do not know the names of the columns, simply omit them and a list of all available columns will be displayed to you.
Example usage:
In [ ]:
! python ../../flow/visualize/plot_ray_results.py /ray_results/experiment_dir/progress.csv training/return-average training/return-min
The tool to replay a policy trained using RLlib is located in flow/visualize/visualizer_rllib.py. It takes two arguments: first, the path to the experiment results directory, and second, the number of the checkpoint you wish to visualize.
There are other optional parameters which you can learn about by running visualizer_rllib.py --help.
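For instance, the full list of options can be printed with:
In [ ]:
! python ../../flow/visualize/visualizer_rllib.py --help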
In [ ]:
! python ../../flow/visualize/visualizer_rllib.py /ray_results/experiment_dir/result/directory 1
Any Flow experiment can output its results to a CSV file containing the contents of SUMO's built-in emission.xml files, specifying speed, position, time, fuel consumption, and many other metrics for all vehicles in a network over time.
This section describes how to generate those emission.csv files when replaying and analyzing a trained policy.
In [ ]:
# Calling the visualizer with the flag --gen_emission replays the policy and creates an emission file
! python ../../flow/visualize/visualizer_rllab.py path/to/result.pkl --gen_emission
The generated emission.csv is placed in the directory test_time_rollout/ inside the directory from which you've just run the visualizer. That emission file can be opened in Excel, loaded in Python and plotted, and more.
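As a quick illustration, here is a minimal sketch of loading and plotting an emission file in Python. The file name below is hypothetical, and the column names ("time", "id", "speed") assume SUMO's standard emission output; adjust them to match your file.
In [ ]:
# A minimal sketch: load a generated emission CSV and plot each vehicle's speed over time.
# The file name is hypothetical; the "time", "id", and "speed" columns assume SUMO's
# standard emission output and may need adjusting for your file.
import pandas as pd
import matplotlib.pyplot as plt

emission_df = pd.read_csv("test_time_rollout/my_emission_file.csv")
for veh_id, veh_data in emission_df.groupby("id"):
    plt.plot(veh_data["time"], veh_data["speed"], label=veh_id)
plt.xlabel("time (s)")
plt.ylabel("speed (m/s)")
plt.legend()
plt.show()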
In [ ]:
# --gen_emission does the same as above
! python ../../flow/visualize/visualizer_rllib.py results/sample_checkpoint 1 --gen_emission
As in the rllab case, the emission.csv file can be found in test_time_rollout/ and used from there.
SUMO-only experiments can generate emission CSV files as well, based on an argument to the experiment.run method. run takes in arguments (num_runs, num_steps, rl_actions=None, convert_to_csv=False). To generate an emission.csv file, pass in convert_to_csv=True in the Python file running your SUMO experiment.
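Below is a minimal sketch, assuming exp is an Experiment object that has already been constructed (as in the SUMO-only example scripts); the number of runs and steps are arbitrary.
In [ ]:
# A minimal sketch, assuming `exp` is a Flow Experiment object created elsewhere
# (e.g. in a SUMO-only example script). convert_to_csv=True converts SUMO's
# emission.xml output into an emission.csv file once the runs complete.
exp.run(num_runs=1, num_steps=1500, convert_to_csv=True)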