In [ ]:
from __future__ import division
import pandas as pd
import numpy as np

from polara.tools.movielens import get_movielens_data
from polara.recommender.data import RecommenderData
from polara.recommender.models import NonPersonalized, SVDModel
from collections import namedtuple

Preparing data

get test users data

In [ ]:
test_data = pd.read_csv("https://github.com/Evfro/RecSys_ISP2017/raw/master/test_data_new.gz", compression='gzip')
test_data.head()

This data is not part of the Movielens-1M dataset; however, it contains ratings for the same movies. You are expected to use this dataset to generate recommendations with your recommendation model.

get movielens data

Use the Movielens-1M dataset to train your model.


In [ ]:
ml_data = get_movielens_data()

As before, you need to convert it into the appropriate format:


In [ ]:
data_model = RecommenderData(ml_data, 'userid', 'movieid', 'rating')

Important:

Because you'll use custom test data, an extra step is required to prepare your data model. You only have to do it once!


In [ ]:
data_model._training = data_model._data #set training data to full movielens dataset
data_model._test = test_data.copy() # setting custom test data

You also have to remove gaps in the user and movie indices. In standard usage scenarios the _prepare() method does this automatically, without any intervention on your part. It is not applicable in this custom setup, however, because you want to skip some of the other actions the method performs.

This step is still relatively easy with the _reindex_data() method. It not only builds a new index but also saves the index mapping in a special attribute, index.itemid:


In [ ]:
# build a new gap-free index of users and movies; the movie mapping is stored in index.itemid
data_model._reindex_data()
# translate movie ids in the test data from the original values to the new index
data_model._test['movieid'] = data_model._test['movieid'].map(data_model.index.itemid.set_index('old').new)
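
The mapping step above can be tried on toy data first. The sketch below is only an illustration: the `itemid` frame is a hand-made stand-in for `data_model.index.itemid` (assumed to have the `old` and `new` columns used above), and movie id 99 is a hypothetical id that is missing from the index. It shows that `map` returns NaN for ids it cannot find, which is worth checking for before building a model.

```python
import pandas as pd

# Hand-made stand-in for data_model.index.itemid (assumption: columns 'old' and 'new')
itemid = pd.DataFrame({'old': [10, 25, 31], 'new': [0, 1, 2]})

# Toy test data; movie 99 is hypothetical and absent from the index
toy_test = pd.DataFrame({'userid': [1, 1, 2], 'movieid': [25, 99, 10]})

# Same mapping pattern as in the cell above: original id -> new gap-free id
old_to_new = itemid.set_index('old').new
toy_test['movieid'] = toy_test['movieid'].map(old_to_new)

# Ids missing from the index come back as NaN, so count them
n_unmapped = toy_test['movieid'].isnull().sum()
```

If `n_unmapped` is non-zero on real data, some test movies are not present in the training index and need separate handling.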

The last step is to "emulate" the splitting of the test data into the observed part and holdout:


In [ ]:
data_model._test = namedtuple('TestData', 'testset evalset')._make([data_model._test, None])
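
For readers unfamiliar with `namedtuple._make`, here is a minimal standalone sketch of what the line above builds. The string used for `testset` is just a placeholder for the real test DataFrame.

```python
from collections import namedtuple

# Same container layout as above: two named fields
TestData = namedtuple('TestData', 'testset evalset')

# _make builds an instance from any iterable of field values
toy = TestData._make(['observed ratings placeholder', None])

# Fields are accessible by name; evalset is left empty here
observed = toy.testset
holdout = toy.evalset
```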

Building your model


In [ ]:
svd = SVDModel(data_model)

In [ ]:
svd.build()

In [ ]:
recs = svd.get_recommendations()
recs.shape

Submitting your solution

Before submitting, you have to map the movie index back to the original values. This can be done in one line:


In [ ]:
recs = pd.Series(recs.ravel()).map(data_model.index.itemid.set_index('new').old).values.reshape(recs.shape)
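
To see what the one-liner does, the sketch below applies the same pattern to a tiny hand-made recommendations array. The `itemid` frame and `toy_recs` are illustrative stand-ins, not real model output.

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for data_model.index.itemid (assumption: columns 'old' and 'new')
itemid = pd.DataFrame({'old': [10, 25, 31], 'new': [0, 1, 2]})

# Toy recommendations: 2 users, top-2 items each, expressed in the new index
toy_recs = np.array([[2, 0], [1, 2]])

# Flatten, map new id -> original id, then restore the 2D shape
new_to_old = itemid.set_index('new').old
restored = pd.Series(toy_recs.ravel()).map(new_to_old).values.reshape(toy_recs.shape)
```

The shape of the array is unchanged; only the values are translated back to the original movie ids.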

Save your model and submit the results. Note that both the upload address and the leaderboard itself have a new location:


In [ ]:
np.savez('your-team-name', recs=recs)
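
Before uploading, it may be worth confirming that the saved file loads back correctly. The sketch below does this with a toy array and a temporary directory; the file name `toy-team-name` is a placeholder. Note that `np.savez` appends the `.npz` extension when the name does not already have it.

```python
import os
import tempfile

import numpy as np

# Toy recommendations array standing in for the real recs
toy_recs = np.array([[31, 10], [25, 31]])

# Save under a placeholder name in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'toy-team-name')
np.savez(path, recs=toy_recs)  # writes toy-team-name.npz

# Load the archive back and read the array stored under the 'recs' key
loaded = np.load(path + '.npz')
loaded_recs = loaded['recs']
loaded.close()
```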

In [ ]:
import requests

files = {'upload': open('your-team-name.npz','rb')}
url = "http://isp2017.azurewebsites.net/team/upload"

r = requests.post(url, files=files)

In [ ]:
print r.status_code, r.reason

Viewing results:

Note on evaluation:

All ratings below 4 are considered negative feedback and are taken into account accordingly. You can easily spot this by looking at the non-zero value of the nDCL metric.

When testing your recommendation model, set the switch_positive attribute before evaluating your results (its default value is 0):

svd.switch_positive = 4
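
The threshold rule stated above (ratings below 4 count as negative feedback) can be illustrated on toy ratings. This is only a sketch of the rule itself, not of how polara applies it internally; the `is_positive` column name is made up for the example.

```python
import pandas as pd

# Toy ratings for illustration only
toy = pd.DataFrame({'movieid': [10, 25, 31, 40], 'rating': [5, 3, 4, 1]})

# Same threshold as in the evaluation note above
switch_positive = 4

# Ratings at or above the threshold are positive, the rest are negative feedback
toy['is_positive'] = toy['rating'] >= switch_positive
n_negative = int((~toy['is_positive']).sum())
```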

In [ ]: