Polara is designed to automate model prototyping and evaluation as much as possible. By default, it implements several conventional evaluation scenarios fully controlled by a set of configuration parameters. A user does not have to worry about anything beyond setting appropriate values of these parameters (a complete list of them can be obtained by calling the get_configuration method of a RecommenderData instance). As a result, the input preference data is automatically pre-processed and converted into a convenient representation with independent access to the training and evaluation parts.
This default behaviour, however, can be flexibly manipulated to run custom scenarios with externally provided evaluation data. This flexibility is achieved with the help of the special set_test_data
method implemented in the RecommenderData
class. This guide demonstrates how to use the configuration parameters in conjunction with this method to cover various customizations.
We will use the Movielens-1M data for experimentation. The data will be divided into several parts: the observations used for training, a holdout set of hidden preferences, and the preferences of unseen (warm-start) users.
The last two datasets serve as an imitation of external data sources, which are not a part of the initial data model.
Also note that the holdout dataset contains preferences of both known and unseen (warm-start) users.
In [1]:
import numpy as np
from polara.datasets.movielens import get_movielens_data
In [2]:
seed = 0
def random_state(seed=seed): # to fix random state in experiments
return np.random.RandomState(seed=seed)
Downloading the data (alternatively, you can provide a path to a local copy of the data as an argument to the function):
In [3]:
data = get_movielens_data()
Sampling 5% of the preferences data to form the holdout dataset:
In [4]:
data_sampled = data.sample(frac=0.95, random_state=random_state()).sort_values('userid')
In [5]:
holdout = data[~data.index.isin(data_sampled.index)]
Make 20% of all users unseen during the training phase:
In [6]:
users, unseen_users = np.split(data_sampled.userid.drop_duplicates().values,
[int(0.8*data_sampled.userid.nunique()),])
In [7]:
observations = data_sampled.query('userid in @users')
This is the simplest case, which allows you to completely ignore the evaluation phase. It sets an initial configuration for all further evaluation scenarios.
In [8]:
from polara.recommender.data import RecommenderData
from polara.recommender.models import SVDModel
In [9]:
data_model = RecommenderData(observations, 'userid', 'movieid', 'rating', seed=seed)
We will use the prepare_training_only method instead of the general prepare method:
In [10]:
data_model.prepare_training_only()
This sets all the required configuration parameters and transforms the data accordingly.
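You can double-check what was set by calling the get_configuration method mentioned earlier (a minimal sketch; the exact structure of the returned value may differ between Polara versions):
config = data_model.get_configuration()  # current values of the configuration parameters
print(config)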
Let's check that the test data is empty,
In [11]:
data_model.test
Out[11]:
and the whole input was used as a training part:
In [12]:
data_model.training.shape
Out[12]:
In [13]:
observations.shape
Out[13]:
Internally, the data was transformed to have a certain numeric representation, which Polara relies on:
In [14]:
data_model.training.head()
Out[14]:
In [15]:
observations.head()
Out[15]:
This transformation can be disabled by setting the build_index attribute to False before data processing (not recommended).
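A minimal sketch of that opt-out, assuming build_index is a plain instance attribute that must be changed before any preparation step:
raw_data_model = RecommenderData(observations, 'userid', 'movieid', 'rating', seed=seed)
raw_data_model.build_index = False  # keep the original userid/movieid values (not recommended)
raw_data_model.prepare_training_only()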
You can easily build a recommendation model now:
In [16]:
svd = SVDModel(data_model)
svd.build()
However, the recommendations cannot be generated, as there is no testing data. The following function call will raise an error:
svd.get_recommendations()
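If you want to verify this without interrupting the session, you could wrap the call in a try/except block (a sketch; the exact exception type depends on Polara internals):
try:
    svd.get_recommendations()
except Exception as exc:  # no test data has been set yet
    print('Expected failure:', exc)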
In competitions like the Netflix Prize, you may be provided with a dedicated evaluation dataset (a probe set), which contains hidden preference information about known users. In Polara's terms, this is a holdout set.
You can assign this holdout set to the data model by calling the set_test_data
method as follows:
In [17]:
data_model.set_test_data(holdout=holdout, warm_start=False)
Mind the warm_start=False argument, which tells Polara to work only with known users. If some users from the holdout are not a part of the training data, they will be filtered out and a corresponding notification message will be displayed (you can turn it off by setting data_model.verbose=False). In this example, 1129 users were filtered out, as the holdout set initially contained both known and unknown users.
Note that items not present in the training data are also filtered out. This behavior can be changed by setting data_model.ensure_consistency=False (not recommended).
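A minimal sketch of both switches mentioned above (changing ensure_consistency is not recommended):
data_model.verbose = False  # suppress the filtering notification on subsequent calls
# data_model.ensure_consistency = False  # would keep users/items missing from training (not recommended)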
In [18]:
data_model.test.holdout.userid.nunique()
Out[18]:
The recommendation model can now be evaluated:
In [19]:
svd.switch_positive = 4 # treat ratings below 4 as negative feedback
svd.evaluate()
Out[19]:
In [20]:
data_model.test.holdout.query('rating>=4').shape[0] # maximum number of possible true_positive hits
Out[20]:
In [21]:
svd.evaluate('relevance')
Out[21]:
Polara also handles the case where you don't have a probe set and the task is simply to generate recommendations for a list of selected test users. Evaluation in that case is performed externally.
Let's randomly pick a few test users from all known users (i.e. those who are present in the training data):
In [22]:
test_users = random_state().choice(users, size=5, replace=False)
test_users
Out[22]:
You can provide this list by setting the test_users
argument of the set_test_data
method:
In [23]:
data_model.set_test_data(test_users=test_users, warm_start=False)
Recommendations in that case will have the shape number of test users x top-n (top-10 by default).
In [24]:
svd.get_recommendations().shape
Out[24]:
In [25]:
print((len(test_users), svd.topk))
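If you need a different list length, you could, for example, change the model's topk attribute before requesting recommendations (a sketch, assuming topk can simply be reassigned):
svd.topk = 20
svd.get_recommendations().shape  # should now be (len(test_users), 20)
svd.topk = 10  # restore the default for the rest of the guide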
As the holdout was not provided, its previous state is cleared from the data model:
In [26]:
print(data_model.test.holdout)
The order of test user ids in the recommendations matrix may not correspond to their order in the test_users list. The true order can be obtained via the index attribute - the users are sorted in ascending order by their internal index, and this order is used to construct the recommendations matrix.
In [27]:
data_model.index.userid.training.query('old in @test_users')
Out[27]:
In [28]:
test_users
Out[28]:
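To align recommendation rows with the original user ids, you could, for example, sort the mapping above by the internal index (a sketch, relying on the 'old'/'new' columns shown in the previous output):
row_index = (data_model.index.userid.training
             .query('old in @test_users')
             .sort_values('new'))
recs = svd.get_recommendations()
for user_id, user_recs in zip(row_index.old, recs):
    print(user_id, user_recs[:3])  # original user id and its top-3 recommended internal item ids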
Note that there's no need to provide the testset argument in the case of known users. All the information about test users' preferences is assumed to be fully present in the training data, and the following call will intentionally raise an error:
data_model.set_test_data(testset=some_test_data, warm_start=False)
If the testset contains new (unseen) information, you should consider the warm-start scenarios described below.
Let's form a dataset with new users and their preferences:
In [29]:
unseen_data = data_sampled.query('userid in @unseen_users')
unseen_data.shape
Out[29]:
In [30]:
assert unseen_data.userid.nunique() == len(unseen_users)
print(len(unseen_users))
None of these users are present in the training data:
In [31]:
data_model.index.userid.training.old.isin(unseen_users).any()
Out[31]:
In order to generate recommendations for these users, we assign the dataset of their preferences as a testset (mind the warm_start argument value):
In [32]:
data_model.set_test_data(testset=unseen_data, warm_start=True)
As we use an SVD-based model, no modifications are needed to generate recommendations - it uses the same analytical formula in both the standard and the warm-start regime:
In [33]:
svd.get_recommendations().shape
Out[33]:
Note that internally the unseen_data dataset is transformed: users are reindexed starting from 0, and items are reindexed based on the current item index of the training set.
In [34]:
data_model.test.testset.head()
Out[34]:
In [35]:
data_model.index.userid.test.head() # test user index mapping, new index starts from 0
Out[35]:
In [36]:
data_model.index.itemid.head() # item index mapping
Out[36]:
In [37]:
unseen_data.head()
Out[37]:
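If you need the recommendations in terms of the original movie ids, you could, for example, translate the internal indices back through the item index mapping (a sketch, assuming the mapping has 'old'/'new' columns as in the user index above):
item_map = data_model.index.itemid.set_index('new')['old']
recs = svd.get_recommendations()
original_item_ids = item_map.loc[recs.ravel()].values.reshape(recs.shape)
original_item_ids[:2]  # the first two users' recommendations as original movie ids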
This is the most complete scenario. We generate recommendations based on the test users' preferences, encoded in the testset, and evaluate them against the holdout. You should use this setup only when Polara's built-in warm-start evaluation pipeline (turned on by data_model.warm_start=True) is not sufficient, e.g. when the preference data is fixed and provided externally.
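For comparison, the built-in pipeline would roughly look as follows (a sketch; the exact splitting is governed by the configuration parameters discussed at the beginning):
data_model_builtin = RecommenderData(data, 'userid', 'movieid', 'rating', seed=seed)
data_model_builtin.warm_start = True  # let Polara hold out warm-start test users automatically
data_model_builtin.prepare()          # the general preparation method mentioned earlier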
In [38]:
data_model.set_test_data(testset=unseen_data, holdout=holdout, warm_start=True)
As before, all unrelated users and items are removed from the datasets, and the remaining entities are reindexed.
In [39]:
data_model.test.testset.head(10)
Out[39]:
In [40]:
data_model.test.holdout.head(10)
Out[40]:
In [41]:
svd.switch_positive = 4
svd.evaluate()
Out[41]:
In [42]:
data_model.test.holdout.query('rating>=4').shape[0] # maximum number of possible true positives
Out[42]:
In [43]:
svd.evaluate('relevance')
Out[43]: