Copyright 2019 The RecSim Authors.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Running RecSim

In this Colab we explore how to train and evaluate an agent within RecSim using the provided environments, and clarify some basic concepts along the way.

RecSim at a Glance

RecSim is a configurable platform for simulating a recommendation system environment in which a recommender agent interacts with a corpus of documents (or recommendable items) and a set of users, in a natural but abstract fashion, to support the development of new recommendation algorithms. At its core, a RecSim simulation consists of running the following event loop for some fixed number of sessions (episodes):

for episode in range(number_of_episodes):
  user = sample_user()
  recommended_slate = None
  while session_not_over:
    user_response = user_responds_to_recommendation(recommended_slate)
    available_documents = sample_documents_from_database()
    recommended_slate = agent_step(available_documents, user_response)

The document database (document model), user model, and recommender agent each have various internal components, and we will discuss how to design and implement them in later colabs (Developing an Environment, Developing an Agent). For now, we will see how to set up one of the ready-made environments that ship with RecSim in order to run a simulation.


In [0]:
# @title Install
!pip install --upgrade --no-cache-dir recsim
!pip install -q tf-nightly-2.0-preview
# Load the TensorBoard notebook extension
%load_ext tensorboard

In [0]:
#@title Importing generics
import numpy as np
import tensorflow as tf

In RecSim, a user model and a document model are packaged together within an OpenAI Gym-style environment. In this tutorial, we will use the "Interest Evolution" environment used in Ie et al., as well as a full Slate-Q agent also described therein. Both come ready to use with RecSim. We import the environment from recsim.environments. Agents are found in recsim.agents. Finally, we need to import runner_lib from recsim.simulator, which executes the loop outlined above.


In [0]:
#@title Importing RecSim components 
from recsim.environments import interest_evolution
from recsim.agents import full_slate_q_agent
from recsim.simulator import runner_lib

Creating an Agent

Similarly to Dopamine, a RecSim experiment runner (simulator) consumes an environment creation function and an agent creation function. These functions are responsible for setting up the environment/agent based on external parameters. The interest evolution environment already comes with a creation function, so we will limit our attention to the agent.
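To make the idea of a creation function concrete, here is a minimal, purely illustrative sketch: all the runner needs is a callable that maps a configuration dictionary to a Gym-style environment. We will rely on the provided interest_evolution.create_environment rather than writing our own, so this wrapper only restates its signature.

def create_environment(env_config):
  # Illustrative only: delegate to the provided creation function, which
  # builds the interest evolution environment from env_config.
  return interest_evolution.create_environment(env_config)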

A create_agent function takes a TensorFlow session, an environment object, a training/eval flag, and (optionally) a TensorFlow summary writer; the summary writer is passed to the agent for recording in-agent training statistics in TensorBoard (more on that below). In the case of full Slate-Q, we just need to extract the action and observation spaces from the environment and pass them to the agent constructor.


In [0]:
def create_agent(sess, environment, eval_mode, summary_writer=None):
  kwargs = {
      'observation_space': environment.observation_space,
      'action_space': environment.action_space,
      'summary_writer': summary_writer,
      'eval_mode': eval_mode,
  }
  return full_slate_q_agent.FullSlateQAgent(sess, **kwargs)

Training and Evaluating the Agent in a Simulation Loop

Before we run the agent, we need to set up a few environment parameters. These are the bare minimum:

  • slate_size sets the size of the slate of elements presented to the user;
  • num_candidates specifies the number of documents present in the document database at any given time;
  • resample_documents specifies whether the set of candidates should be resampled between time steps according to the document distribution (more on this in later notebooks);
  • finally, seed sets the random seed.

In [0]:
seed = 0
np.random.seed(seed)
env_config = {
  'num_candidates': 10,
  'slate_size': 2,
  'resample_documents': True,
  'seed': seed,
  }
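Before launching a full run, it can be helpful to instantiate the environment once with this configuration and inspect the Gym spaces that create_agent above will hand to the agent. This is an optional sanity check; the printed structure of the spaces depends on the environment.

In [0]:
# Optional sanity check: build the environment from env_config and look at
# the spaces that create_agent passes to the agent constructor.
test_env = interest_evolution.create_environment(env_config)
print(test_env.observation_space)
print(test_env.action_space)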

Once we've assembled these into a dictionary, we can run training, additionally specifying the number of training steps, the number of iterations, and a directory in which to checkpoint the agent.


In [0]:
tmp_base_dir = '/tmp/recsim/'
runner = runner_lib.TrainRunner(
    base_dir=tmp_base_dir,
    create_agent_fn=create_agent,
    env=interest_evolution.create_environment(env_config),
    episode_log_file="",
    max_training_steps=50,
    num_iterations=10)
runner.run_experiment()
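To confirm that checkpoints and summaries were actually written, you can list the contents of the base directory. The exact subdirectory layout (for example, a train subfolder holding checkpoints and event files) may vary across RecSim versions, so treat this as a quick sanity check rather than a documented contract.

In [0]:
# Optional: walk the output directory to see what the train runner wrote.
import os
for root, dirs, files in os.walk(tmp_base_dir):
  print(root, sorted(files))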

After training is finished, we can run a separate simulation to evaluate the agent's performance.


In [0]:
runner = runner_lib.EvalRunner(
    base_dir=tmp_base_dir,
    create_agent_fn=create_agent,
    env=interest_evolution.create_environment(env_config),
    max_eval_episodes=5,
    test_mode=True)
runner.run_experiment()

The cumulative reward across the evaluation episodes will be stored in base_dir/eval/. However, RecSim also exports a more detailed set of summaries, including environment-specific ones, which can be visualized in TensorBoard.


In [0]:
#@title Tensorboard
%tensorboard --logdir=/tmp/recsim/
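
If you prefer to post-process results rather than browse them in the TensorBoard UI, the event files under base_dir/eval/ can also be read programmatically with TensorBoard's EventAccumulator. The snippet below only lists the available scalar tags, since the exact tag names RecSim exports are not spelled out here.

In [0]:
# Optional: read the exported summaries programmatically instead of (or in
# addition to) the TensorBoard UI.
import os
from tensorboard.backend.event_processing import event_accumulator

eval_dir = os.path.join(tmp_base_dir, 'eval')
ea = event_accumulator.EventAccumulator(eval_dir)
ea.Reload()
print(ea.Tags())  # shows which scalar tags are available to plot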