Polara is designed to automate model prototyping and evaluation as much as possible. By default, it implements several conventional evaluation scenarios fully controlled by a set of configuration parameters. A user does not have to worry about anything beyond setting appropriate values of these parameters (a complete list of them can be obtained by calling the get_configuration method of a RecommenderData instance). As a result, the input preference data is automatically pre-processed and converted into a convenient representation with independent access to the training and evaluation parts.
This default behaviour, however, can be flexibly manipulated to run custom scenarios with externally provided evaluation data. This flexibility is achieved with the help of the special set_test_data
method implemented in the RecommenderData
class. This guide demonstrates how to use the configuration parameters in conjunction with this method to cover various customizations.
We will use the Movielens-1M data for experimentation. The data will be divided into several parts: the observations used for training, a holdout set of hidden preferences, and the preferences of unseen (warm-start) users.
The last two datasets serve as an imitation of external data sources, which are not a part of the initial data model.
Also note that the holdout dataset contains preferences of both known and unseen (warm-start) users.
In [1]:
import numpy as np
from polara.datasets.movielens import get_movielens_data
In [2]:
seed = 0
def random_state(seed=seed): # to fix random state in experiments
return np.random.RandomState(seed=seed)
Downloading the data (alternatively, you can provide a path to a local copy of the data as an argument to the function):
In [3]:
data = get_movielens_data()
Sampling 5% of the preferences data to form the holdout dataset:
In [4]:
data_sampled = data.sample(frac=0.95, random_state=random_state()).sort_values('userid')
In [5]:
holdout = data[~data.index.isin(data_sampled.index)]
Make 20% of all users unseen during the training phase:
In [6]:
users, unseen_users = np.split(data_sampled.userid.drop_duplicates().values,
[int(0.8*data_sampled.userid.nunique()),])
In [7]:
observations = data_sampled.query('userid in @users')
This is the simplest case, which allows you to completely ignore the evaluation phase. It sets an initial configuration for all further evaluation scenarios.
In [8]:
from polara.recommender.data import RecommenderData
from polara.recommender.models import SVDModel
In [9]:
data_model = RecommenderData(observations, 'userid', 'movieid', 'rating', seed=seed)
We will use the prepare_training_only method instead of the general prepare method:
In [10]:
data_model.prepare_training_only()
This sets all the required configuration parameters and transforms the data accordingly.
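You can double-check what was set by calling the get_configuration method mentioned earlier (a minimal sketch; the exact structure of the returned value may differ between Polara versions):
config = data_model.get_configuration()  # current values of the configuration parameters
print(config)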
Let's check that the test data is empty,
In [11]:
data_model.test
Out[11]:
and the whole input was used as a training part:
In [12]:
data_model.training.shape
Out[12]:
In [13]:
observations.shape
Out[13]:
Internally, the data was transformed to have a certain numeric representation, which Polara relies on:
In [14]:
data_model.training.head()
Out[14]:
In [15]:
observations.head()
Out[15]:
This transformation can be disabled by setting the build_index attribute to False before data processing (not recommended).
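A minimal sketch of that opt-out, assuming build_index is a plain instance attribute that must be changed before any preparation step:
raw_data_model = RecommenderData(observations, 'userid', 'movieid', 'rating', seed=seed)
raw_data_model.build_index = False  # keep the original userid/movieid values (not recommended)
raw_data_model.prepare_training_only()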
You can easily build a recommendation model now:
In [16]:
svd = SVDModel(data_model)
svd.build()
However, the recommendations cannot be generated, as there is no testing data. The following function call will raise an error:
svd.get_recommendations()
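If you want to verify this without interrupting the session, you could wrap the call in a try/except block (a sketch; the exact exception type depends on Polara internals):
try:
    svd.get_recommendations()
except Exception as exc:  # no test data has been set yet
    print('Expected failure:', exc)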
In competitions like the Netflix Prize, you may be provided with a dedicated evaluation dataset (a probe set), which contains hidden preference information about known users. In Polara's terms, this is a holdout set.
You can assign this holdout set to the data model by calling the set_test_data
method as follows:
In [17]:
data_model.set_test_data(holdout=holdout, warm_start=False)
Mind the warm_start=False argument, which tells Polara to work only with known users. If some users from the holdout are not a part of the training data, they will be filtered out and a corresponding notification message will be displayed (you can turn it off by setting data_model.verbose=False). In this example, 1129 users were filtered out, as the holdout set initially contained both known and unknown users.
Note that items not present in the training data are also filtered out. This behavior can be changed by setting data_model.ensure_consistency=False (not recommended).
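A minimal sketch of both switches mentioned above (changing ensure_consistency is not recommended):
data_model.verbose = False  # suppress the filtering notification on subsequent calls
# data_model.ensure_consistency = False  # would keep users/items missing from training (not recommended)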
In [18]:
data_model.test.holdout.userid.nunique()
Out[18]:
The recommendation model can now be evaluated:
In [19]:
svd.switch_positive = 4 # treat ratings below 4 as negative feedback
svd.evaluate()
Out[19]:
In [20]:
data_model.test.holdout.query('rating>=4').shape[0] # maximum number of possible true_positive hits
Out[20]:
In [21]:
svd.evaluate('relevance')
Out[21]:
Polara also handles the case where you don't have a probe set and the task is simply to generate recommendations for a list of selected test users. Evaluation in that case is performed externally.
Let's randomly pick a few test users from all known users (i.e. those who are present in the training data):
In [22]:
test_users = random_state().choice(users, size=5, replace=False)
test_users
Out[22]:
You can provide this list by setting the test_users
argument of the set_test_data
method:
In [23]:
data_model.set_test_data(test_users=test_users, warm_start=False)
Recommendations in that case will have the shape number of test users x top-n (top-10 by default).
In [24]:
svd.get_recommendations().shape
Out[24]:
In [25]:
print((len(test_users), svd.topk))
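If you need a different list length, you could, for example, change the model's topk attribute before requesting recommendations (a sketch, assuming topk can simply be reassigned):
svd.topk = 20
svd.get_recommendations().shape  # should now be (len(test_users), 20)
svd.topk = 10  # restore the default for the rest of the guide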
As the holdout was not provided, its previous state is cleared from the data model:
In [26]:
print(data_model.test.holdout)
The order of test user ids in the recommendations matrix may not correspond to their order in the test_users list. The true order can be obtained via the index attribute - the users are sorted in ascending order by their internal index, and this order is used to construct the recommendations matrix.
In [27]:
data_model.index.userid.training.query('old in @test_users')
Out[27]:
In [28]:
test_users
Out[28]:
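To align recommendation rows with the original user ids, you could, for example, sort the mapping above by the internal index (a sketch, relying on the 'old'/'new' columns shown in the previous output):
row_index = (data_model.index.userid.training
             .query('old in @test_users')
             .sort_values('new'))
recs = svd.get_recommendations()
for user_id, user_recs in zip(row_index.old, recs):
    print(user_id, user_recs[:3])  # original user id and its top-3 recommended internal item ids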
Note that there's no need to provide the testset argument in the case of known users. All the information about test users' preferences is assumed to be fully present in the training data, and the following call will intentionally raise an error:
data_model.set_test_data(testset=some_test_data, warm_start=False)
If the testset contains new (unseen) information, you should consider the warm-start scenarios described below.
Let's form a dataset with new users and their preferences:
In [29]:
unseen_data = data_sampled.query('userid in @unseen_users')
unseen_data.shape
Out[29]:
In [30]:
assert unseen_data.userid.nunique() == len(unseen_users)
print(len(unseen_users))
None of these users are present in the training data:
In [31]:
data_model.index.userid.training.old.isin(unseen_users).any()
Out[31]:
In order to generate recommendations for these users, we assign the dataset of their preferences as a testset (mind the warm_start argument value):
In [32]:
data_model.set_test_data(testset=unseen_data, warm_start=True)
As we use an SVD-based model, no modifications are needed to generate recommendations - it uses the same analytical formula in both the standard and the warm-start regime:
In [33]:
svd.get_recommendations().shape
Out[33]:
Note that internally the unseen_data dataset is transformed: users are reindexed starting from 0, and items are reindexed based on the current item index of the training set.
In [34]:
data_model.test.testset.head()
Out[34]:
In [35]:
data_model.index.userid.test.head() # test user index mapping, new index starts from 0
Out[35]:
In [36]:
data_model.index.itemid.head() # item index mapping
Out[36]:
In [37]:
unseen_data.head()
Out[37]:
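If you need the recommendations in terms of the original movie ids, you could, for example, translate the internal indices back through the item index mapping (a sketch, assuming the mapping has 'old'/'new' columns as in the user index above):
item_map = data_model.index.itemid.set_index('new')['old']
recs = svd.get_recommendations()
original_item_ids = item_map.loc[recs.ravel()].values.reshape(recs.shape)
original_item_ids[:2]  # the first two users' recommendations as original movie ids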
This is the most complete scenario. We generate recommendations based on the test users' preferences, encoded in the testset, and evaluate them against the holdout. You should use this setup only when Polara's built-in warm-start evaluation pipeline (turned on by data_model.warm_start=True) is not sufficient, e.g. when the preference data is fixed and provided externally.
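For comparison, the built-in pipeline would roughly look as follows (a sketch; the exact splitting is governed by the configuration parameters discussed at the beginning):
data_model_builtin = RecommenderData(data, 'userid', 'movieid', 'rating', seed=seed)
data_model_builtin.warm_start = True  # let Polara hold out warm-start test users automatically
data_model_builtin.prepare()          # the general preparation method mentioned earlier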
In [38]:
data_model.set_test_data(testset=unseen_data, holdout=holdout, warm_start=True)
As before, all unrelated users and items are removed from the datasets, and the remaining entities are reindexed.
In [39]:
data_model.test.testset.head(10)
Out[39]:
In [40]:
data_model.test.holdout.head(10)
Out[40]:
In [41]:
svd.switch_positive = 4
svd.evaluate()
Out[41]:
In [42]:
data_model.test.holdout.query('rating>=4').shape[0] # maximum number of possible true positives
Out[42]:
In [43]:
svd.evaluate('relevance')
Out[43]: