Polara supports various evaluation regimes and can be tuned flexibly to achieve the setup you need.
For example, matrix factorization models are not directly applicable in the warm-start scenario (when test users are not present in the training set) unless a folding-in technique is implemented for generating recommendations. Keep this in mind when creating your own custom solutions.
Polara's built-in models, however, can be easily tested in both scenarios without any modification, since Polara handles the warm-start case automatically.
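For intuition, here is a minimal numpy sketch of the folding-in idea for an SVD-style model. It is not Polara's internal code, and all names below are purely illustrative:
import numpy as np

# toy dimensions: 5 items, rank-2 latent space; names are illustrative only
item_factors = np.random.rand(5, 2)                 # item latent factors, i.e. V from R ≈ U S V^T
new_user_ratings = np.array([5., 0., 3., 0., 0.])   # known ratings of a user unseen during training

user_embedding = new_user_ratings @ item_factors    # project the unseen user onto the latent space
scores = user_embedding @ item_factors.T            # predicted relevance of every item for this user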
In [1]:
from polara.recommender.data import RecommenderData
from polara.datasets.movielens import get_movielens_data
In [2]:
data = get_movielens_data() # will automatically download it, or you can specify a path to the local copy
data.head()
Out[2]:
In [3]:
data.shape
Out[3]:
In [4]:
data_model = RecommenderData(data, 'userid', 'movieid', 'rating', seed=0)
data_model.get_configuration()
Out[4]:
In [5]:
data_model.random_holdout = True # sample holdout items randomly instead of taking only top-rated ones; this reduces evaluation bias
data_model.warm_start = False # standard case
data_model.prepare()
For demonstration purposes, let's verify that all test users are present in the training set:
In [6]:
data_model.test.holdout['userid'].isin(data_model.index.userid.training.new).all()
Out[6]:
In [7]:
from polara.recommender.models import SVDModel
In [8]:
svd = SVDModel(data_model) # create model
svd.switch_positive = 4 # mark ratings below 4 as negative feedback and treat them accordingly in evaluation
svd.build() # fit model
svd.evaluate() # by default it calculates the total number of hits
Out[8]:
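To make the metric more concrete, here is a rough illustration (not Polara's actual implementation) of what counting hits means: for every test user, count how many of the positively rated holdout items appear among the top-n recommendations. The arrays below are hypothetical:
top_n_recommendations = [[10, 42, 7], [3, 8, 15]]   # recommended item ids per test user (hypothetical)
positive_holdout_items = [{42, 99}, {1, 2}]         # each user's holdout items rated >= switch_positive (hypothetical)

hits = sum(len(set(recs) & truth)
           for recs, truth in zip(top_n_recommendations, positive_holdout_items))
hits  # -> 1 in this toy example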
In [9]:
# implicit library must be installed separately, follow instructions at https://github.com/benfred/implicit
from polara.recommender.external.implicit.ialswrapper import ImplicitALS
In [10]:
als = ImplicitALS(data_model) # create model
als.switch_positive = 4 # same as for PureSVD, affects only evaluation
als.build()
als.evaluate()
Out[10]:
The maximum possible number of correct recommendations is:
In [11]:
data_model.test.holdout.query('rating>=4').shape[0]
Out[11]:
Both models correctly retrieve around a quarter of the relevant holdout items. Let's look at the averaged relevance scores:
In [12]:
svd.evaluate('relevance')
Out[12]:
In [13]:
als.evaluate('relevance')
Out[13]:
Now let's switch to the warm-start scenario, which will split test users from the training data.
In [14]:
data_model.warm_start = True # warm-start case
data_model.prepare()
There's no intersection between test and training users:
In [15]:
data_model.index.userid.test.old.isin(data_model.index.userid.training.old).any()
Out[15]:
For example, as can be seen from the log message above, Polara filters out items that happen to be in the test split but are not part of the training set.
In [16]:
svd.build()
svd.evaluate()
Out[16]:
Note that you do not have to recreate the models, as they operate on top of the data_model instance. In fact, state changes in data_model are synchronized with the states of the dependent models: the models will rebuild themselves even if you do not explicitly ask for it (although it is recommended to be explicit, in keeping with the Zen of Python).
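If you prefer to be explicit, the equivalent would look like the sketch below; the next cell instead relies on the automatic rebuild:
als.build()      # explicitly rebuild the model on the new data split
als.evaluate()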
In [17]:
als.evaluate()
Out[17]:
The maximum possible number of correct recommendations is:
In [18]:
data_model.test.holdout.query('rating>=4').shape[0]
Out[18]:
Check relevance scores:
In [19]:
svd.evaluate('relevance')
Out[19]:
In [20]:
als.evaluate('relevance')
Out[20]:
In these experiments we used the default settings for both models, so the results are not necessarily optimal. Also note that the output of SVD is deterministic, while iALS tends to produce varying results spread around some average value.
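If you want to gauge that variance yourself, one simple way is to rebuild the iALS model a few times and compare the scores (a quick sketch using the objects defined above):
scores = []
for _ in range(3):
    als.build()                    # each run starts from a new random initialization
    scores.append(als.evaluate())
scores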
Tuning the models can also be easily done within Polara and will be covered in a separate guide.
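As a quick preview, tuning could look like the sketch below; note that the rank attribute of SVDModel is assumed here rather than taken from this guide:
svd.rank = 40    # try a larger number of latent factors (attribute name assumed)
svd.build()
svd.evaluate()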