In [10]:
from IPython.display import Math

Testing the Recommender Systems on the MovieLens 100k Dataset

Because the results obtained on the FourCity TripAdvisor dataset were unsuccessful, we decided to test the recommender systems on a single-criterion dataset that is well known in the literature, so that we could compare our results against other proposals that use the same data.

Comparison with FourCity dataset

Before showing the results of running the recommender systems, we make a quick comparison of the two datasets.

FourCity (TripAdvisor)

  • Number of reviews: 11327
  • Number of users: 888
  • Number of items: 1482
  • Sparsity: 0.9917166660587713 (to be checked: this number differs slightly from 1 - (11327 / (888 * 1482)))

MovieLens

  • Number of reviews: 100000
  • Number of users: 943
  • Number of items: 1682
  • Sparsity: 0.9369533063577546
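
The sparsity figures above can be checked with a few lines of Python. This is a minimal sketch, assuming sparsity is defined as the fraction of empty cells in the user-item matrix; the function name `sparsity` is ours and is not taken from the project code. With this definition the MovieLens value is reproduced, while FourCity comes out at roughly 0.9914 rather than the reported figure.

```python
def sparsity(num_reviews: int, num_users: int, num_items: int) -> float:
    """Fraction of user-item pairs that have no rating (assumed definition)."""
    return 1.0 - num_reviews / (num_users * num_items)


# Counts taken from the two lists above
fourcity = sparsity(11327, 888, 1482)
movielens = sparsity(100000, 943, 1682)

print(f"FourCity:  {fourcity:.6f}")
print(f"MovieLens: {movielens:.6f}")
```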

Types of recommenders

DummyRecommender

This recommender just predicts the closest integer to the average of the ratings of all samples (that is, the union of the train and test data).

The performance of this recommender is

  • Mean absolute error: 0.89416
  • Root mean square error: 1.219839
  • Time: 0.14665103 seconds

AverageRecommender

This recommender predicts the average of the ratings of the training data.

The average performance of this recommender is

  • Mean absolute error: 0.944726
  • Root mean square error: 1.125578
  • Time: 18 seconds
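
The two baselines and the two error metrics can be sketched as follows. This is an illustration only: the toy ratings are invented, and the helper names `mae` and `rmse` are ours rather than the project's. It follows the descriptions above, so the dummy prediction is the rounded mean over train and test together, and the average prediction is the mean of the training ratings only.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between two equally long rating lists."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between two equally long rating lists."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy ratings, invented for illustration (not MovieLens data)
train = [5, 4, 4, 3, 1]
test = [4, 2, 4]

# DummyRecommender: closest integer to the mean of all samples (train + test)
all_ratings = train + test
dummy_prediction = round(sum(all_ratings) / len(all_ratings))

# AverageRecommender: mean of the training ratings only
average_prediction = sum(train) / len(train)

for name, value in [("dummy", dummy_prediction), ("average", average_prediction)]:
    predictions = [value] * len(test)
    print(name, "MAE:", mae(test, predictions), "RMSE:", rmse(test, predictions))
```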

WeigthedSumRecommender

This recommender uses the following formula to calculate the ratings:


In [11]:
Math(r"R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)")


Out[11]:
$$R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)$$

In [12]:
Math(r"z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}")


Out[12]:
$$z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}$$
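
As an illustration of the weighted sum formula, the following sketch computes a single prediction from a list of neighbours. The function name and the input layout (pairs of similarity and rating) are assumptions made for this example, not the project's actual interface. The code follows the formula as written, so it assumes non-negative similarities, and returning `None` when the similarities sum to zero is our own choice.

```python
def weighted_sum_prediction(neighbours):
    """Predict R(u, i) from (similarity, rating) pairs.

    Each pair describes one neighbour u' in N(u) who has rated item i:
    sim(u, u') and R(u', i).
    """
    total_similarity = sum(sim for sim, _ in neighbours)
    if total_similarity == 0:
        return None  # assumption: no usable neighbours, caller must fall back
    z = 1.0 / total_similarity  # normalising factor
    return z * sum(sim * rating for sim, rating in neighbours)


# Invented example: three neighbours who rated the target item
neighbours = [(0.9, 5), (0.6, 3), (0.5, 4)]
prediction = weighted_sum_prediction(neighbours)
print(prediction)
```

Because z normalises by the sum of the similarities, the prediction is a weighted average and stays within the range of the neighbours' ratings.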

The average performance of this recommender is

  • Mean absolute error: 0.854038778
  • Root mean square error: 1.088897314
  • Top N: 0.694614548
  • Best MAE value: 0.797591782 (SingleSimilarity, chebyshev, 40 neighbours)
  • Best RMSE value: 1.003798434 (SingleSimilarity, chebyshev, 40 neighbours)
  • Best Top N value: 0.728656574 (SingleSimilarity, pearson, all neighbours)

AdjustedWeigthedSumRecommender

This recommender uses the following formula to calculate the ratings:


In [13]:
Math(r"R(u,i) = \overline{R(u)} + z \sum_{u' \in N(u)} sim(u,u') \cdot (R(u',i) - \overline{R(u')})")


Out[13]:
$$R(u,i) = \overline{R(u)} + z \sum_{u' \in N(u)} sim(u,u') \cdot (R(u',i) - \overline{R(u')})$$
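
The adjusted formula can be sketched in the same way. Here each neighbour contributes the deviation of their rating from their own mean, and the result is added to the target user's mean. The function name, the triple layout, and the fallback to the user's mean when the similarities sum to zero are assumptions made for this example.

```python
def adjusted_weighted_sum_prediction(user_mean, neighbours):
    """Predict R(u, i) from the user's mean and (similarity, rating, mean) triples.

    Each triple describes one neighbour u' in N(u): sim(u, u'), R(u', i)
    and the neighbour's own mean rating.
    """
    total_similarity = sum(sim for sim, _, _ in neighbours)
    if total_similarity == 0:
        return user_mean  # assumption: fall back to the user's own mean
    z = 1.0 / total_similarity  # same normalising factor as the plain weighted sum
    deviation = sum(sim * (rating - mean) for sim, rating, mean in neighbours)
    return user_mean + z * deviation


# Invented example: user mean 3.2 and three neighbours
neighbours = [(0.9, 5, 4.0), (0.6, 3, 3.5), (0.5, 4, 2.0)]
prediction = adjusted_weighted_sum_prediction(3.2, neighbours)
print(prediction)
```

Centring on each user's mean compensates for users who rate systematically high or low, which the plain weighted sum does not do.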

The average performance of this recommender is

  • Mean absolute error: 0.823195759
  • Root mean square error: 1.041963991
  • Top N: 0.686021491
  • Best MAE value: 0.746653177 (SingleSimilarity, pearson, all neighbours)
  • Best RMSE value: 0.951336098 (SingleSimilarity, pearson, all neighbours)
  • Best Top N value: 0.725733055 (SingleSimilarity, pearson, all neighbours)

Summary

  • The best MAE is 0.746653177 by the AdjustedWeigthedSumRecommender (SingleSimilarity, pearson, all neighbours).
  • The best RMSE is 0.951336098 by the AdjustedWeigthedSumRecommender (SingleSimilarity, pearson, all neighbours).
  • The best Top N is 0.728656574 by the WeigthedSumRecommender (SingleSimilarity, pearson, all neighbours).
  • The best MAE average is 0.823195759 by the AdjustedWeigthedSumRecommender.
  • The best RMSE average is 1.041963991 by the AdjustedWeigthedSumRecommender.
  • The best Top N average is 0.694614548 by the WeigthedSumRecommender (the AdjustedWeigthedSumRecommender averages 0.686021491).
  • The higher the number of neighbours, the better the results for all evaluation metrics.
  • Both the AdjustedWeigthedSumRecommender and the WeigthedSumRecommender manage to beat the DummyRecommender and the AverageRecommender.