from IPython.display import Math

Testing the Recommender Systems on the MovieLens 100k Dataset

Because the recommender systems performed poorly on the FourCity TripAdvisor dataset, we decided to test them on a single-criterion dataset that is well known in the literature, so that we could compare our recommenders against other proposals that use the same data.

Comparison with the FourCity dataset

Before showing the results of running the recommender systems, we briefly compare the two datasets.

FourCity (TripAdvisor)

Number of reviews: 11327
Number of users: 888
Number of items: 1482
Sparsity: 0.9917166660587713 (note: this is slightly different from 1 - (11327 / (888 * 1482)) ≈ 0.99139; the reason still has to be investigated)

MovieLens

Number of reviews: 100000
Number of users: 943
Number of items: 1682
Sparsity: 0.9369533063577546
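The sparsity figures above follow the usual definition, sparsity = 1 - ratings / (users × items). The short Python sketch below is our own illustration, not the code that produced the reported numbers. It reproduces the MovieLens value and shows that the reported FourCity value corresponds to exactly 10,901 filled cells rather than 11,327. One possible explanation, which is an unverified assumption, is that the reported figure counts distinct (user, item) pairs and that some users reviewed the same hotel more than once.

```python
def sparsity(n_ratings: int, n_users: int, n_items: int) -> float:
    """Fraction of empty cells in an n_users x n_items rating matrix."""
    return 1 - n_ratings / (n_users * n_items)


# MovieLens 100k: 100000 ratings, 943 users, 1682 items
movielens = sparsity(100_000, 943, 1682)
print(movielens)  # matches the reported 0.9369533063577546 up to float rounding

# FourCity: 11327 reviews, 888 users, 1482 items
fourcity_all = sparsity(11_327, 888, 1482)
print(round(fourcity_all, 5))  # lower than the reported 0.99172

# Number of filled cells implied by the reported FourCity sparsity
implied_cells = round((1 - 0.9917166660587713) * 888 * 1482)
print(implied_cells)
```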

Types of recommenders

DummyRecommender

This recommender simply predicts the integer closest to the mean of all ratings (that is, the mean over the union of the training and test data).

The performance of this recommender is

Mean absolute error: 0.89416
Root mean square error: 1.219839
Time: 0.14665103 seconds
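A minimal sketch of the DummyRecommender and of the two error metrics, assuming ratings are plain lists of numbers. The function names and the toy data are hypothetical and are not taken from the project's code.

```python
import math


def closest_integer(x: float) -> int:
    # Round halves up; Python's built-in round() rounds halves to even (round(2.5) == 2)
    return math.floor(x + 0.5)


def dummy_predictor(train_ratings, test_ratings):
    # Mean over the union of training and test ratings, rounded to the closest integer
    all_ratings = list(train_ratings) + list(test_ratings)
    return closest_integer(sum(all_ratings) / len(all_ratings))


def mae(y_true, y_pred):
    # Mean absolute error
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    # Root mean square error
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy ratings (not MovieLens data)
train = [4, 5, 3, 4, 2]
test = [5, 3, 4]
prediction = dummy_predictor(train, test)  # mean = 30 / 8 = 3.75, rounded to 4
predictions = [prediction] * len(test)
print(prediction, mae(test, predictions), rmse(test, predictions))
```

Rounding halves up with `math.floor(x + 0.5)` is one reading of "closest integer"; the original implementation may break ties differently.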

AverageRecommender

This recommender predicts the mean of the ratings in the training data.

The average performance of this recommender is

Mean absolute error: 0.944726
Root mean square error: 1.125578
Time: 18 seconds
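For comparison, a sketch of the AverageRecommender, again with hypothetical names and toy data: it predicts the unrounded mean of the training ratings only, and the errors are computed on the test ratings.

```python
import math


def average_predictor(train_ratings):
    # Unrounded mean of the training ratings only (test data is not used)
    return sum(train_ratings) / len(train_ratings)


train = [4, 5, 3, 4, 2]  # hypothetical toy ratings
test = [5, 3, 4]
p = average_predictor(train)  # 18 / 5 = 3.6
errors = [t - p for t in test]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(p, mae, rmse)
```

A constant prediction equal to the mean minimizes the squared error on the data it is computed from, whereas the absolute error is minimized by the median; this may be why the AverageRecommender obtains a lower RMSE but a higher MAE than the DummyRecommender.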

WeigthedSumRecommender

This recommender uses the following formula to calculate the ratings:

Math(r"R(u,i) = z \sum_{u' \in N(u)} \mathrm{sim}(u,u') \cdot R(u',i)")

$$R(u,i) = z \sum_{u' \in N(u)} \mathrm{sim}(u,u') \cdot R(u',i)$$

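Math(r"z = \frac{1}{\sum_{u' \in N(u)} \mathrm{sim}(u,u')}")

$$z = \frac{1}{\sum_{u' \in N(u)} \mathrm{sim}(u,u')}$$

Here $N(u)$ is the set of neighbours of user $u$, $\mathrm{sim}(u,u')$ is the similarity between users $u$ and $u'$, and $z$ normalizes the sum so that the weights $z \cdot \mathrm{sim}(u,u')$ add up to 1.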
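The weighted sum can be written directly from the two formulas. The sketch below is an illustration under stated assumptions: ratings are stored as nested dictionaries, the neighbourhood N(u) and the similarity function are supplied by the caller, and neighbours who have not rated item i are skipped, since R(u', i) is undefined for them. All names are hypothetical and do not come from the original implementation.

```python
def weighted_sum_prediction(user, item, ratings, neighbours, sim):
    """Predict R(user, item) as a similarity-weighted mean of neighbours' ratings.

    ratings:    dict mapping user -> dict mapping item -> rating
    neighbours: the users in N(user)
    sim:        function (u, v) -> similarity between users u and v
    """
    weighted_total = 0.0  # sum of sim(u, u') * R(u', i)
    sim_total = 0.0       # sum of sim(u, u'), i.e. 1 / z
    for v in neighbours:
        # Assumption: neighbours who have not rated the item are skipped
        if item in ratings.get(v, {}):
            s = sim(user, v)
            weighted_total += s * ratings[v][item]
            sim_total += s
    if sim_total == 0:
        return None  # no usable neighbours; the real fallback is not specified
    return weighted_total / sim_total


# Hypothetical toy data
ratings = {
    "alice": {"m1": 4},
    "bob": {"m1": 5, "m2": 3},
    "carol": {"m1": 2, "m2": 4},
}
similarities = {("alice", "bob"): 0.9, ("alice", "carol"): 0.3}


def sim(u, v):
    return similarities[(u, v)]


result = weighted_sum_prediction("alice", "m2", ratings, ["bob", "carol"], sim)
print(result)  # (0.9 * 3 + 0.3 * 4) / (0.9 + 0.3), approximately 3.25
```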

The average performance of this recommender is

Mean absolute error: 0.854038778
Root mean square error: 1.088897314
Top N: 0.694614548
Best MAE value: 0.797591782 (SingleSimilarity, chebyshev, 40 neighbours)
Best RMSE value: 1.003798434 (SingleSimilarity, chebyshev, 40 neighbours)
Best Top N value: 0.728656574 (SingleSimilarity, pearson, all neighbours)
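The configurations above are labelled with a similarity measure (pearson, chebyshev). The exact definitions used by SingleSimilarity are not given here; as an assumption, the sketch below computes the textbook Pearson correlation between two users over the items both have rated, returning 0 when it is undefined. How the chebyshev variant turns a Chebyshev distance into a similarity is not stated, so it is not sketched.

```python
import math


def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation between two users over their co-rated items.

    ratings_u, ratings_v: dict mapping item -> rating.
    Returns 0.0 when fewer than two items are shared or a variance is zero.
    """
    common = ratings_u.keys() & ratings_v.keys()
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


u = {"a": 1, "b": 2, "c": 3}
v = {"a": 2, "b": 4, "c": 6, "d": 1}  # "d" is ignored: u has not rated it
w = {"a": 3, "b": 2, "c": 1}
print(pearson_similarity(u, v), pearson_similarity(u, w))
```

Because Pearson values can be negative, the sum of similarities in $z$ can approach zero; some implementations normalize with absolute similarities instead, but whether this recommender does is not stated.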

AdjustedWeigthedSumRecommender

This recommender uses the following formula to calculate the ratings:

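Math(r"R(u,i) = \overline{R(u)} + z \sum_{u' \in N(u)} \mathrm{sim}(u,u') \cdot \left(R(u',i) - \overline{R(u')}\right)")

$$R(u,i) = \overline{R(u)} + z \sum_{u' \in N(u)} \mathrm{sim}(u,u') \cdot \left(R(u',i) - \overline{R(u')}\right)$$

where $\overline{R(u)}$ is the mean rating of user $u$, and $N(u)$, $\mathrm{sim}(u,u')$ and $z$ are as in the WeigthedSumRecommender.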
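The adjusted variant only changes what is averaged: each neighbour contributes its deviation from its own mean rating, and the result is shifted by the target user's mean. A sketch, assuming ratings stored as nested dictionaries, a caller-supplied neighbourhood and similarity function, user means computed over all known ratings of each user, and a fallback to the user's mean when no neighbour rated the item (all hypothetical choices, not the original code):

```python
def adjusted_weighted_sum_prediction(user, item, ratings, neighbours, sim):
    """Predict R(user, item) = mean(user) + z * sum sim(user, v) * (R(v, item) - mean(v))."""

    def mean_rating(v):
        # Assumption: a user's mean is taken over all of that user's known ratings
        return sum(ratings[v].values()) / len(ratings[v])

    deviation_total = 0.0  # sum of sim(u, u') * (R(u', i) - mean(u'))
    sim_total = 0.0        # sum of sim(u, u'), i.e. 1 / z
    for v in neighbours:
        # Assumption: neighbours who have not rated the item are skipped
        if item in ratings.get(v, {}):
            s = sim(user, v)
            deviation_total += s * (ratings[v][item] - mean_rating(v))
            sim_total += s
    if sim_total == 0:
        return mean_rating(user)  # assumption: fall back to the user's own mean
    return mean_rating(user) + deviation_total / sim_total


# Hypothetical toy data
ratings = {
    "alice": {"m1": 4},
    "bob": {"m1": 5, "m2": 3},
    "carol": {"m1": 2, "m2": 4},
}
similarities = {("alice", "bob"): 0.9, ("alice", "carol"): 0.3}


def sim(u, v):
    return similarities[(u, v)]


result = adjusted_weighted_sum_prediction("alice", "m2", ratings, ["bob", "carol"], sim)
print(result)
```

Here bob's rating of 3 is one point below his mean of 4 and carol's rating of 4 is one point above her mean of 3, so the prediction is 4 + (0.9 · (-1) + 0.3 · 1) / 1.2 = 3.5.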

The average performance of this recommender is

Mean absolute error: 0.823195759
Root mean square error: 1.041963991
Top N: 0.686021491
Best MAE value: 0.746653177 (SingleSimilarity, pearson, all neighbours)
Best RMSE value: 0.951336098 (SingleSimilarity, pearson, all neighbours)
Best Top N value: 0.725733055 (SingleSimilarity, pearson, all neighbours)

Summary

The best MAE is 0.746653177, achieved by the AdjustedWeigthedSumRecommender (SingleSimilarity, pearson, all neighbours).
The best RMSE is 0.951336098, achieved by the AdjustedWeigthedSumRecommender (SingleSimilarity, pearson, all neighbours).
The best Top N is 0.728656574, achieved by the WeigthedSumRecommender (SingleSimilarity, pearson, all neighbours).
The best average MAE is 0.823195759, achieved by the AdjustedWeigthedSumRecommender.
The best average RMSE is 1.041963991, achieved by the AdjustedWeigthedSumRecommender.
The best average Top N is 0.694614548, achieved by the WeigthedSumRecommender.
In general, the higher the number of neighbours, the better the results on all evaluation metrics.
On average, both the AdjustedWeigthedSumRecommender and the WeigthedSumRecommender beat the DummyRecommender and the AverageRecommender in both MAE and RMSE.