Copyright 2019 The RecSim Authors.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Having familiarized ourselves with the overall structure of RecSim and how environments come together, we now turn to the final piece of the puzzle -- agent development. In this tutorial, we aim to cover the following topics:
To illustrate RecSim's agent API, we will develop a simple bandit agent for RecSim's interest exploration environment.
The interest exploration environment represents a clustered bandit problem: the world consists of some very large number of documents, which cluster into topics (this is a hard clustering -- one topic per document). We further posit that users also cluster into types.
A user's affinity towards a document is the sum of the document's production quality and the user's (user type's) affinity to the document's topic. This naturally creates a situation where a myopic agent that ranks documents by predicted click rate will favor topics with high production quality, as they have a high a priori probability of getting clicked across all user types. As a result, the agent neglects to explore niche interests and ends up with a suboptimal policy. Hence the need for active exploration.
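The effect described above can be illustrated with a small, self-contained sketch that does not use RecSim. All numbers, the two topic and user type names, and the logistic click model are assumptions made up for this illustration; the actual environment uses its own choice model.

```python
import math

# Hypothetical numbers chosen only for illustration; they are not taken
# from the interest exploration environment.
topic_quality = {"mass_appeal": 1.0, "niche": 0.0}
# Affinity of each (hypothetical) user type to each topic.
type_affinity = {
    "mainstream_user": {"mass_appeal": 0.5, "niche": -1.0},
    "niche_user": {"mass_appeal": 0.0, "niche": 2.0},
}


def click_probability(user_type, topic):
    """Logistic squashing of quality + affinity (an assumed click model)."""
    score = topic_quality[topic] + type_affinity[user_type][topic]
    return 1.0 / (1.0 + math.exp(-score))


def average_click_rate(topic):
    """Click rate of a topic averaged uniformly over user types."""
    rates = [click_probability(u, topic) for u in type_affinity]
    return sum(rates) / len(rates)


def best_topic_for(user_type):
    """Topic with the highest click probability for one user type."""
    return max(topic_quality, key=lambda t: click_probability(user_type, t))


# A myopic agent that only sees population-level click rates picks this topic.
myopic_choice = max(topic_quality, key=average_click_rate)
print("myopic choice:", myopic_choice)
print("best for niche_user:", best_topic_for("niche_user"))
```

With these made-up numbers, the topic with the best average click rate is not the best topic for the niche user type, which is exactly the gap that exploration is meant to close.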
For the purposes of exposition, we will define the agent method by method, which we will then assemble in a class.
We now instantiate an environment to illustrate the various data types it produces and consumes, and how they are handled within an agent.
In [0]:
# @title Install
!pip install --upgrade --no-cache-dir recsim
In [0]:
# @title Imports
# Generic imports
import functools
from gym import spaces
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# RecSim imports
from recsim import agent
from recsim import document
from recsim import user
from recsim.choice_model import MultinomialLogitChoiceModel
from recsim.simulator import environment
from recsim.simulator import recsim_gym
from recsim.simulator import runner_lib
In [0]:
from recsim.environments import interest_exploration
Since we're not about to do anything fancy with this environment, we will initialize it with the provided create_environment function (further details on this here).
In [0]:
env_config = {'slate_size': 2,
              'seed': 0,
              'num_candidates': 15,
              'resample_documents': True}
ie_environment = interest_exploration.create_environment(env_config)
At the start of each session, the simulator resets the environment, which triggers a resampling of the user. The reset call generates our initial observation.
In [0]:
initial_observation = ie_environment.reset()
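A RecSim observation is a dictionary with 3 keys: 'user' (the user observable features), 'doc' (the document observable features), and 'response' (the user's response).
Note that this environment does not implement user observable features, so the 'user' field is always empty.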
In [0]:
print('User Observable Features')
print(initial_observation['user'])
print('User Response')
print(initial_observation['response'])
print('Document Observable Features')
for doc_id, doc_features in initial_observation['doc'].items():
  print('ID:', doc_id, 'features:', doc_features)
We are thus presented with a corpus of 15 documents (num_candidates), each represented by their topic and their production quality score. Note, though, that the user's affinity is not an observable quantity.
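Since agents for this environment mostly reason about topics, a common first step is to group the corpus by cluster. The sketch below does this on a hand-written stand-in for observation['doc']; the document ids and feature values are invented, and group_by_cluster is a hypothetical helper, not part of RecSim.

```python
from collections import defaultdict

# A hand-written stand-in for observation['doc']; ids and values are made up.
toy_doc_obs = {
    "10": {"quality": 1.2, "cluster_id": 1},
    "11": {"quality": 0.3, "cluster_id": 0},
    "12": {"quality": 0.9, "cluster_id": 1},
}


def group_by_cluster(doc_obs):
    """Map cluster_id -> list of (position in corpus, quality) pairs."""
    clusters = defaultdict(list)
    for index, features in enumerate(doc_obs.values()):
        clusters[features["cluster_id"]].append((index, features["quality"]))
    return dict(clusters)


clusters = group_by_cluster(toy_doc_obs)
for cluster_id, members in sorted(clusters.items()):
    print("cluster", cluster_id, "->", members)
```

The helper records positions rather than document ids because, in this tutorial, slates refer to documents by their position in the corpus.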
The observation format specification can be accessed as a feature of the environment in the form of an OpenAI gym space. It is also provided to the agent at initialization time.
In [0]:
print('Document observation space')
for key, space in ie_environment.observation_space['doc'].spaces.items():
  print(key, ':', space)
print('Response observation space')
print(ie_environment.observation_space['response'])
print('User observation space')
print(ie_environment.observation_space['user'])
In [0]:
slate = [0, 1]
for slate_doc in slate:
  print(list(initial_observation['doc'].items())[slate_doc])
The action space gym specification is also provided by the environment.
In [0]:
ie_environment.action_space
Out[0]:
When the first slate is available, the simulator will run the environment and generate a new observation, along with a reward for the agent.
In [0]:
observation, reward, done, _ = ie_environment.step(slate)
The main job of the agent is to produce a valid slate for each step of the simulation.
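What "valid" means can be sketched as a simple check, assuming the action space is a gym MultiDiscrete space with one entry per slate position, so that its nvec attribute lists how many documents each position may choose from. is_valid_slate is a hypothetical helper written for this illustration; it only checks slate length and index range, and any further constraints the simulator may impose are not modeled here.

```python
def is_valid_slate(slate, nvec):
    """Check a slate against a MultiDiscrete-style size vector.

    nvec[k] is the number of admissible values for slate position k, so a
    slate is accepted when it fills every position with an index in
    [0, nvec[k]).
    """
    if len(slate) != len(nvec):
        return False
    return all(0 <= index < size for index, size in zip(slate, nvec))


# Stand-in for action_space.nvec with slate_size=2 and num_candidates=15.
toy_nvec = [15, 15]
print(is_valid_slate([0, 1], toy_nvec))
print(is_valid_slate([0, 15], toy_nvec))
```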
In [0]:
from recsim.agent import AbstractEpisodicRecommenderAgent
A RecSim agent inherits from AbstractEpisodicRecommenderAgent. Required arguments (which RecSim will pass to the agent at simulation time) for the agent's init are the observation_space and action_space. We can use them to validate whether the environment meets the preconditions for the agent's operation.
In [0]:
class StaticAgent(AbstractEpisodicRecommenderAgent):

  def __init__(self, observation_space, action_space):
    # Check if document corpus is large enough.
    if len(observation_space['doc'].spaces) < len(action_space.nvec):
      raise RuntimeError('Slate size larger than size of the corpus.')
    super(StaticAgent, self).__init__(action_space)

  def step(self, reward, observation):
    print(observation)
    return list(range(self._slate_size))
This agent will statically recommend the first K documents of the corpus. For reasons that will become clear soon, we'll also have it print the observation.
We can now run it in RecSim using runner_lib (See tutorial for details).
In [0]:
def create_agent(sess, environment, eval_mode, summary_writer=None):
  return StaticAgent(environment.observation_space, environment.action_space)

tmp_base_dir = '/tmp/recsim/'
runner = runner_lib.EvalRunner(
    base_dir=tmp_base_dir,
    create_agent_fn=create_agent,
    env=ie_environment,
    max_eval_episodes=1,
    max_steps_per_episode=5,
    test_mode=True)
# We won't run this, but we totally could
# runner.run_experiment()
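To get a feeling for what happens to an agent during one episode, here is a simplified picture of the interaction loop. This is not runner_lib's actual code: ToyStaticAgent and ToyEnvironment are hypothetical stand-ins that only mimic the begin_episode / step / end_episode hooks and the reset / step calls used in this tutorial, and the reward rule is invented.

```python
class ToyStaticAgent:
    """Stand-in exposing the three hooks of an episodic agent."""

    def __init__(self, slate_size):
        self._slate_size = slate_size

    def begin_episode(self, observation=None):
        # First slate of the episode; no reward is available yet.
        return list(range(self._slate_size))

    def step(self, reward, observation):
        return list(range(self._slate_size))

    def end_episode(self, reward, observation=None):
        pass


class ToyEnvironment:
    """Hypothetical environment: reward 1.0 whenever document 0 is shown."""

    def reset(self):
        return {"user": [], "doc": {}, "response": None}

    def step(self, slate):
        reward = 1.0 if 0 in slate else 0.0
        observation = {"user": [], "doc": {}, "response": None}
        done = False  # This toy environment never ends an episode early.
        return observation, reward, done, {}


def run_episode(env, agent, max_steps):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    slate = agent.begin_episode(observation)
    total_reward = 0.0
    reward = 0.0
    for _ in range(max_steps):
        observation, reward, done, _info = env.step(slate)
        total_reward += reward
        if done:
            break
        slate = agent.step(reward, observation)
    agent.end_episode(reward, observation)
    return total_reward


total = run_episode(ToyEnvironment(), ToyStaticAgent(slate_size=2), max_steps=5)
print("total reward:", total)
```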
Now that we've gotten a basic agent off the ground, we might want to set our aims a little higher. That is, let's see if we can build an agent that actually does something useful.
The way this problem is set up, a natural heuristic presents itself. We can run a bandit algorithm to reveal the average engagement of a user with each cluster of documents. That is, each cluster becomes an arm. Once the algorithm has chosen a cluster, we serve the highest quality document from that cluster. This is a metaphor for a situation that occurs often in recommender systems that serve as a front end to multiple (sub-)products: within each session, the user will interact with the recommender with some intent in mind, that is, to realize some task that can be fulfilled by one of the possible sub-products. Sometimes, the user will issue an explicit query (e.g., enter search terms), which effectively makes that intent observable up to query interpretation uncertainty. Most often, however, the intent will be latent -- the user will reveal it indirectly by choosing among a set of items from the slate. We assume that, had the intent been observable, a product-specific policy would be available to fulfill it.
This set-up captures some typical features of practical recommender systems -- they tend to be very hierarchical, often very heuristic due to the complexity of the environment they operate in, and also very idiosyncratic to the task at hand. For this reason, RecSim's approach to agent engineering is very modular. Instead of providing a wide array of agents, we provide an easily extendable set of agent building blocks, called Agent Layers, which can be combined into hierarchies to create more complex agents.
A hierarchical agent layer does not materialize a slate of documents, but relies on one or more base agents to do so. The hierarchical agent architecture in RecSim can roughly be described as follows:
Hierarchical layers are recursively stackable in a fashion similar to Keras layers. Hierarchical layers are defined by their pre- and post-processing functions and can play many roles depending on how these are implemented. For example, a layer can be used as a pure feature injector -- it can extract some feature from the (history of) observations and pass it to the base agent, while keeping the post-processing function vacuous. This allows decoupling of feature- and agent-engineering. Various regularizers can be implemented in a similar fashion by modifying the reward. Layers may also be stateful and dynamic, as the pre- or post-processing functions may implement parameter updates or learning mechanisms.
We will not discuss how to implement these layers here (the reader is referred to examples in the layers/ directory); rather, we will show their usage and benefits.
Recall that the interest exploration environment provides clicks as feedback, but does not keep track of cumulative click counts or impression counts. Since maintaining such statistics is generally useful, we provide an agent layer that does exactly that. That is, it monitors the stream of responses and retains the number of clicks and impressions from each cluster. The precondition is that the response space has a key 'click', as well as 'cluster_id'. If this is met, then the layer can be used with any environment/agent. Let's see how this works.
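Although the real layers live in the layers/ directory, the pre-/post-processing idea itself can be illustrated with a toy wrapper. The classes below are hypothetical and unrelated to RecSim's actual layer classes; they only show a feature-injecting layer whose post-processing is vacuous.

```python
class ToyBaseAgent:
    """Recommends the first `slate_size` documents, whatever it observes."""

    def __init__(self, slate_size):
        self._slate_size = slate_size
        self.last_observation = None

    def step(self, reward, observation):
        # Remember what was observed so the injected feature can be inspected.
        self.last_observation = observation
        return list(range(self._slate_size))


class ToyStepCounterLayer:
    """Feature-injecting wrapper: adds a step counter, leaves actions alone."""

    def __init__(self, base_agent):
        self._base_agent = base_agent
        self._num_steps = 0

    def _preprocess(self, reward, observation):
        # Stateful pre-processing: count steps and expose the count as a feature.
        self._num_steps += 1
        enriched = dict(observation)
        enriched["num_steps"] = self._num_steps
        return reward, enriched

    def _postprocess(self, abstract_action):
        # Vacuous post-processing: relay the base agent's slate unchanged.
        return abstract_action

    def step(self, reward, observation):
        reward, observation = self._preprocess(reward, observation)
        return self._postprocess(self._base_agent.step(reward, observation))


base = ToyBaseAgent(slate_size=2)
layer = ToyStepCounterLayer(base)
slate = layer.step(0.0, {"user": [], "doc": {}, "response": None})
print(slate, base.last_observation["num_steps"])
```

Because the wrapper exposes the same step signature as the agent it wraps, wrappers of this kind can be nested, which is the sense in which layers are stackable.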
In [0]:
from recsim.agents.layers.cluster_click_statistics import ClusterClickStatsLayer
A hierarchical agent layer is instantiated in a similar way to usual agents, except that it takes in a constructor for a base agent, that is, an agent whose abstract action it can interpret. In the case of cluster click stats, it will not do any post-processing of the abstract action, that is, it simply relays the action of the base agent to the environment. This implies that the base agent will need to provide a full slate.
Once instantiated, the cluster click stats layer will inject a sufficient statistic containing clicks and impressions into the base agent's observation space. Thus, the combination of the two will behave as if the base agent had an additional field in its observation space. We showcase this using our StaticAgent.
In [0]:
static_agent = StaticAgent(ie_environment.observation_space,
                           ie_environment.action_space)
static_agent.step(reward, observation)
Out[0]:
In [0]:
cluster_static_agent = ClusterClickStatsLayer(StaticAgent,
                                              ie_environment.observation_space,
                                              ie_environment.action_space)
cluster_static_agent.step(reward, observation)
Out[0]:
Observe how the 'user' field of the observation dictionary (as printed from within the static agent's step function) now has a new key 'sufficient_statistics', whereas the old user observation (which is vacuous) went under the 'raw_observation' key. This is done to avoid naming conflicts.
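The bookkeeping behind such a sufficient statistic can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the layer's implementation: update_click_stats is a hypothetical helper, the key names 'impression_count' and 'click_count' are assumptions, and the responses are invented.

```python
def update_click_stats(click_stats, responses):
    """Accumulate impressions and clicks per cluster from one slate's responses."""
    for response in responses:
        cluster = response["cluster_id"]
        click_stats["impression_count"][cluster] += 1
        click_stats["click_count"][cluster] += int(response["click"])
    return click_stats


num_clusters = 2
click_stats = {
    "impression_count": [0] * num_clusters,
    "click_count": [0] * num_clusters,
}

# Two made-up steps, each with a slate of two documents.
update_click_stats(click_stats, [{"click": 1, "cluster_id": 0},
                                 {"click": 0, "cluster_id": 1}])
update_click_stats(click_stats, [{"click": 0, "cluster_id": 0},
                                 {"click": 1, "cluster_id": 1}])
print(click_stats)
```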
The ClusterClickStats layer takes care of computing the necessary sufficient statistics for exploration. To implement the actual bandit policy, RecSim offers an abstract bandit layer implementation. The AbstractClickBandit takes as input a list of base agents, which it treats as arms. It will then utilize one of a few implemented bandit policies (UCB1, KL-UCB, ThompsonSampling) to mix the base agents' policies in a way that achieves sub-linear regret relative to the best policy (which is a priori unknown), subject to certain assumptions about the environment.
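As a rough illustration of how such a policy trades off exploration and exploitation, here is the textbook UCB1 rule applied to per-arm impression and click counts. This is a generic sketch, not RecSim's implementation; ucb1_choose_arm is a hypothetical helper and the counts are invented.

```python
import math


def ucb1_choose_arm(impressions, clicks):
    """Pick an arm by the textbook UCB1 rule.

    impressions[a] is how often arm a was shown, clicks[a] how often it
    was clicked.
    """
    # Play every arm once before trusting any estimate.
    for arm, count in enumerate(impressions):
        if count == 0:
            return arm
    total = sum(impressions)

    def ucb_index(arm):
        mean = clicks[arm] / impressions[arm]
        bonus = math.sqrt(2.0 * math.log(total) / impressions[arm])
        return mean + bonus

    return max(range(len(impressions)), key=ucb_index)


# Equal exposure: the arm with the higher empirical click rate wins.
print(ucb1_choose_arm([100, 100], [80, 20]))
# Very unequal exposure: the rarely shown arm gets a large exploration bonus.
print(ucb1_choose_arm([1000, 2], [500, 0]))
```

The second call shows why a cluster with few impressions keeps getting tried even when its observed click rate is low: its confidence bonus dominates until enough evidence has accumulated.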
In [0]:
from recsim.agents.layers.abstract_click_bandit import AbstractClickBanditLayer
To instantiate an abstract bandit, we must present a list of base agents. In our case, we will have one base agent for each cluster. That agent simply retrieves the documents of that cluster from the corpus and sorts them according to perceived quality.
In [0]:
class GreedyClusterAgent(agent.AbstractEpisodicRecommenderAgent):
  """Simple agent sorting all documents of a topic according to quality."""

  def __init__(self, observation_space, action_space, cluster_id, **kwargs):
    del observation_space
    super(GreedyClusterAgent, self).__init__(action_space)
    self._cluster_id = cluster_id

  def step(self, reward, observation):
    del reward
    my_docs = []
    my_doc_quality = []
    for i, doc in enumerate(observation['doc'].values()):
      if doc['cluster_id'] == self._cluster_id:
        my_docs.append(i)
        my_doc_quality.append(doc['quality'])
    if not my_docs:
      return []
    sorted_indices = np.argsort(my_doc_quality)[::-1]
    return list(np.array(my_docs)[sorted_indices])
We will now instantiate one GreedyClusterAgent for each cluster.
In [0]:
num_topics = list(ie_environment.observation_space.spaces['doc']
                  .spaces.values())[0].spaces['cluster_id'].n
base_agent_ctors = [
    functools.partial(GreedyClusterAgent, cluster_id=i)
    for i in range(num_topics)
]
We can now instantiate our cluster bandit as a combination of ClusterClickStats, AbstractClickBandit, and GreedyClusterAgent:
In [0]:
bandit_ctor = functools.partial(AbstractClickBanditLayer,
                                arm_base_agent_ctors=base_agent_ctors)
cluster_bandit = ClusterClickStatsLayer(bandit_ctor,
                                        ie_environment.observation_space,
                                        ie_environment.action_space)
Our ClusterBandit is ready to use!
In [0]:
observation0 = ie_environment.reset()
slate = cluster_bandit.begin_episode(observation0)
print("Cluster bandit slate 0:")
doc_list = list(observation0['doc'].values())
for doc_position in slate:
  print(doc_list[doc_position])