In this notebook we show how some of the recommender systems that are currently part of the recommending framework are built.
The first thing to consider when building a recommender system within this framework is that every recommender must extend the class BaseRecommender, either directly or by extending another class that is a subclass of BaseRecommender. Every recommender must also implement the method predict_rating(user_id, item_id).
In [1]:
import sys
sys.path.append('/Users/fpena/UCC/Thesis/projects/yelp/source/python')
from recommenders.base_recommender import BaseRecommender
In [2]:
class WeightedSumRecommender(BaseRecommender):

    def __init__(self, similarity_metric='cosine'):
        # Pass the chosen metric on to the base class instead of hard-coding it
        super(WeightedSumRecommender, self).__init__(
            'SingleCF', similarity_metric)
        self.similarity_metric = similarity_metric

    def predict_rating(self, user_id, item_id):
        other_users = list(self.user_ids)
        if user_id not in other_users:
            return None
        other_users.remove(user_id)
        weighted_sum = 0.
        z_denominator = 0.
        for other_user in other_users:
            similarity = self.user_similarity_matrix[other_user][user_id]
            if item_id in self.user_dictionary[other_user].item_ratings and similarity is not None:
                other_user_item_rating =\
                    self.user_dictionary[other_user].item_ratings[item_id]
                weighted_sum += similarity * other_user_item_rating
                z_denominator += abs(similarity)
        # No neighbour with a usable similarity has rated the item
        if z_denominator == 0:
            return None
        predicted_rating = weighted_sum / z_denominator
        return predicted_rating
As can be seen in the code above, the WeightedSumRecommender calls the constructor of the BaseRecommender class, passing the name of the recommender and the metric that is going to be used to calculate the similarity between two users (in this case 'cosine'). It then implements the predict_rating(user_id, item_id) method, which uses some of the attributes of the BaseRecommender class that are initialized by calling the load(reviews) method. The attributes the method uses are:
self.user_ids
self.user_similarity_matrix
self.user_dictionary
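To make the role of these attributes concrete, the following is a minimal sketch that mimics the weighted-sum prediction with plain dictionaries. The data structures and values are hypothetical stand-ins, not the framework's actual objects (in the framework, self.user_dictionary holds user objects with an item_ratings attribute, and the similarities are computed by load(reviews)).

```python
# Hypothetical stand-ins for the attributes that load(reviews) would populate.
user_ids = ['U1', 'U2', 'U3']

# user_similarity_matrix[a][b]: similarity between users a and b (made-up values).
user_similarity_matrix = {
    'U1': {'U2': 0.5, 'U3': 1.0},
    'U2': {'U1': 0.5, 'U3': 0.25},
    'U3': {'U1': 1.0, 'U2': 0.25},
}

# Ratings per user, keyed by item ID (a plain dict instead of user objects).
item_ratings = {
    'U1': {1: 4.0},
    'U2': {1: 2.0, 2: 4.0},
    'U3': {1: 5.0, 2: 2.0},
}


def weighted_sum_prediction(user_id, item_id):
    """Similarity-weighted average of the other users' ratings for item_id."""
    weighted_sum = 0.0
    z_denominator = 0.0
    for other_user in user_ids:
        if other_user == user_id:
            continue
        similarity = user_similarity_matrix[other_user].get(user_id)
        if similarity is None or item_id not in item_ratings[other_user]:
            continue
        weighted_sum += similarity * item_ratings[other_user][item_id]
        z_denominator += abs(similarity)
    if z_denominator == 0:
        return None
    return weighted_sum / z_denominator


# U2 contributes 0.5 * 4.0 and U3 contributes 1.0 * 2.0, normalised by 0.5 + 1.0
prediction = weighted_sum_prediction('U1', 2)
print(prediction)
```

In this toy data the prediction for user U1 and item 2 is (0.5 * 4.0 + 1.0 * 2.0) / 1.5, roughly 2.67, and an item that nobody else has rated yields None.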
In [3]:
from recommenders.multicriteria.multicriteria_base_recommender import MultiCriteriaBaseRecommender
In [4]:
class DeltaCFRecommender(MultiCriteriaBaseRecommender):

    def __init__(
            self, similarity_metric='euclidean', significant_criteria_ranges=None):
        super(DeltaCFRecommender, self).__init__(
            'DeltaCFRecommender',
            similarity_metric=similarity_metric,
            significant_criteria_ranges=significant_criteria_ranges)

    def predict_rating(self, user_id, item_id):
        """
        Predicts the rating the user will give to the hotel

        :param user_id: the ID of the user
        :param item_id: the ID of the hotel
        :return: a float with the predicted rating, or None if the rating
        cannot be predicted
        """
        if user_id not in self.user_dictionary:
            return None
        cluster_name = self.user_dictionary[user_id].cluster
        # We remove the given user from the cluster in order to avoid bias
        cluster_users = list(self.user_cluster_dictionary[cluster_name])
        cluster_users.remove(user_id)
        similarities_sum = 0.
        similarities_ratings_sum = 0.
        num_users = 0
        for cluster_user in cluster_users:
            cluster_user_overall_rating = self.user_dictionary[cluster_user].average_overall_rating
            users_similarity = self.user_similarity_matrix[cluster_user][user_id]
            if item_id in self.user_dictionary[cluster_user].item_ratings and users_similarity is not None:
                cluster_user_item_rating = self.user_dictionary[cluster_user].item_ratings[item_id]
                similarities_sum += users_similarity
                similarities_ratings_sum +=\
                    users_similarity * (cluster_user_item_rating - cluster_user_overall_rating)
                num_users += 1
        # Also guard against a zero similarity sum to avoid dividing by zero
        if num_users == 0 or similarities_sum == 0:
            return None
        user_average_rating = self.user_dictionary[user_id].average_overall_rating
        predicted_rating = \
            user_average_rating + similarities_ratings_sum / similarities_sum
        return predicted_rating
In addition to the attributes of the BaseRecommender class, the classes that extend the MultiCriteriaBaseRecommender class also have access to self.user_cluster_dictionary. Notice that although the same attributes from the BaseRecommender class are accessed, these attributes have distinct values in the MultiCriteriaBaseRecommender class. For instance, self.user_similarity_matrix is calculated using multi-criteria ratings, so its values are completely different from those of the self.user_similarity_matrix obtained when extending BaseRecommender directly.
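The prediction rule used by DeltaCFRecommender adds to the target user's average rating the similarity-weighted average of how far each neighbour's rating for the item deviates from that neighbour's own average. The following is a small worked sketch of that rule alone; the function name, the tuple layout and the numbers are illustrative assumptions and do not come from the framework.

```python
# Hypothetical neighbours of a target user inside one cluster:
# (similarity to the target, neighbour's rating for the item,
#  neighbour's average overall rating)
neighbours = [
    (0.5, 9.0, 7.0),
    (0.25, 4.0, 5.0),
]
target_average = 6.0  # made-up average overall rating of the target user


def delta_cf_prediction(target_average, neighbours):
    """Target average plus the similarity-weighted mean deviation."""
    similarities_sum = 0.0
    weighted_deviations_sum = 0.0
    for similarity, item_rating, average_rating in neighbours:
        similarities_sum += similarity
        weighted_deviations_sum += similarity * (item_rating - average_rating)
    if similarities_sum == 0:
        return None
    return target_average + weighted_deviations_sum / similarities_sum


prediction = delta_cf_prediction(target_average, neighbours)
print(prediction)  # 7.0
```

Here the weighted deviations are 0.5 * 2.0 and 0.25 * (-1.0), which sum to 0.75; dividing by the similarity sum of 0.75 gives a deviation of 1.0 above the target user's average of 6.0.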
In [5]:
class OverallRecommender(MultiCriteriaBaseRecommender):

    def __init__(self, significant_criteria_ranges=None):
        super(OverallRecommender, self).__init__(
            'OverallRecommender',
            similarity_metric=None,
            significant_criteria_ranges=significant_criteria_ranges)

    def predict_rating(self, user_id, item_id):
        if user_id not in self.user_dictionary:
            return None
        cluster_name = self.user_dictionary[user_id].cluster
        # We remove the given user from the cluster in order to avoid bias
        cluster_users = list(self.user_cluster_dictionary[cluster_name])
        cluster_users.remove(user_id)
        similarities_ratings_sum = 0.
        num_users = 0
        for cluster_user in cluster_users:
            if item_id in self.user_dictionary[cluster_user].item_ratings:
                cluster_user_item_rating = self.user_dictionary[cluster_user].item_ratings[item_id]
                similarities_ratings_sum += cluster_user_item_rating
                num_users += 1
        if num_users == 0:
            return None
        predicted_rating = similarities_ratings_sum / num_users
        return predicted_rating
As can be seen, this recommender system passes None as the similarity metric, since it is not a collaborative filtering system.
The dummy recommender is simply a recommender system that predicts the same rating regardless of the user and item. Typically, if a recommender system performs worse than the dummy recommender system, it is useless. The rating that will always be predicted is passed to the constructor of this class.
In [6]:
class DummyRecommender(BaseRecommender):

    def __init__(self, rating):
        super(DummyRecommender, self).__init__('DummyRecommender', None)
        self._rating = rating

    def predict_rating(self, user_id, hotel_id):
        return self._rating

    def load(self, reviews):
        pass
In [7]:
reviews_matrix_5 = [
{'user_id': 'U1', 'offering_id': 1, 'overall_rating': 5.0, 'cleanliness_rating': 2.0, 'location_rating': 2.0, 'rooms_rating': 8.0, 'service_rating': 8.0, 'value_rating': 5.0},
{'user_id': 'U1', 'offering_id': 2, 'overall_rating': 7.0, 'cleanliness_rating': 5.0, 'location_rating': 5.0, 'rooms_rating': 9.0, 'service_rating': 9.0, 'value_rating': 7.0},
{'user_id': 'U1', 'offering_id': 3, 'overall_rating': 5.0, 'cleanliness_rating': 2.0, 'location_rating': 2.0, 'rooms_rating': 8.0, 'service_rating': 8.0, 'value_rating': 5.0},
{'user_id': 'U1', 'offering_id': 4, 'overall_rating': 7.0, 'cleanliness_rating': 5.0, 'location_rating': 5.0, 'rooms_rating': 9.0, 'service_rating': 9.0, 'value_rating': 7.0},
# {'user_id': 'U1', 'offering_id': 5, 'overall_rating': 4.0},
{'user_id': 'U2', 'offering_id': 1, 'overall_rating': 5.0, 'cleanliness_rating': 8.0, 'location_rating': 8.0, 'rooms_rating': 2.0, 'service_rating': 2.0, 'value_rating': 5.0},
{'user_id': 'U2', 'offering_id': 2, 'overall_rating': 7.0, 'cleanliness_rating': 9.0, 'location_rating': 9.0, 'rooms_rating': 5.0, 'service_rating': 5.0, 'value_rating': 7.0},
{'user_id': 'U2', 'offering_id': 3, 'overall_rating': 5.0, 'cleanliness_rating': 8.0, 'location_rating': 8.0, 'rooms_rating': 2.0, 'service_rating': 2.0, 'value_rating': 5.0},
{'user_id': 'U2', 'offering_id': 4, 'overall_rating': 7.0, 'cleanliness_rating': 9.0, 'location_rating': 9.0, 'rooms_rating': 5.0, 'service_rating': 5.0, 'value_rating': 7.0},
{'user_id': 'U2', 'offering_id': 5, 'overall_rating': 9.0, 'cleanliness_rating': 9.0, 'location_rating': 9.0, 'rooms_rating': 9.0, 'service_rating': 9.0, 'value_rating': 9.0},
{'user_id': 'U3', 'offering_id': 1, 'overall_rating': 5.0, 'cleanliness_rating': 8.0, 'location_rating': 8.0, 'rooms_rating': 2.0, 'service_rating': 2.0, 'value_rating': 5.0},
{'user_id': 'U3', 'offering_id': 2, 'overall_rating': 7.0, 'cleanliness_rating': 9.0, 'location_rating': 9.0, 'rooms_rating': 5.0, 'service_rating': 5.0, 'value_rating': 7.0},
{'user_id': 'U3', 'offering_id': 3, 'overall_rating': 5.0, 'cleanliness_rating': 8.0, 'location_rating': 8.0, 'rooms_rating': 2.0, 'service_rating': 2.0, 'value_rating': 5.0},
{'user_id': 'U3', 'offering_id': 4, 'overall_rating': 7.0, 'cleanliness_rating': 9.0, 'location_rating': 9.0, 'rooms_rating': 5.0, 'service_rating': 5.0, 'value_rating': 7.0},
{'user_id': 'U3', 'offering_id': 5, 'overall_rating': 9.0, 'cleanliness_rating': 9.0, 'location_rating': 9.0, 'rooms_rating': 9.0, 'service_rating': 9.0, 'value_rating': 9.0},
{'user_id': 'U4', 'offering_id': 1, 'overall_rating': 6.0, 'cleanliness_rating': 3.0, 'location_rating': 3.0, 'rooms_rating': 9.0, 'service_rating': 9.0, 'value_rating': 6.0},
{'user_id': 'U4', 'offering_id': 2, 'overall_rating': 6.0, 'cleanliness_rating': 3.0, 'location_rating': 3.0, 'rooms_rating': 9.0, 'service_rating': 9.0, 'value_rating': 6.0},
{'user_id': 'U4', 'offering_id': 3, 'overall_rating': 6.0, 'cleanliness_rating': 4.0, 'location_rating': 4.0, 'rooms_rating': 8.0, 'service_rating': 8.0, 'value_rating': 6.0},
{'user_id': 'U4', 'offering_id': 4, 'overall_rating': 6.0, 'cleanliness_rating': 4.0, 'location_rating': 4.0, 'rooms_rating': 8.0, 'service_rating': 8.0, 'value_rating': 6.0},
{'user_id': 'U4', 'offering_id': 5, 'overall_rating': 5.0, 'cleanliness_rating': 5.0, 'location_rating': 5.0, 'rooms_rating': 5.0, 'service_rating': 5.0, 'value_rating': 5.0},
{'user_id': 'U5', 'offering_id': 1, 'overall_rating': 6.0, 'cleanliness_rating': 3.0, 'location_rating': 3.0, 'rooms_rating': 9.0, 'service_rating': 9.0, 'value_rating': 6.0},
{'user_id': 'U5', 'offering_id': 2, 'overall_rating': 6.0, 'cleanliness_rating': 3.0, 'location_rating': 3.0, 'rooms_rating': 9.0, 'service_rating': 9.0, 'value_rating': 6.0},
{'user_id': 'U5', 'offering_id': 3, 'overall_rating': 6.0, 'cleanliness_rating': 4.0, 'location_rating': 4.0, 'rooms_rating': 8.0, 'service_rating': 8.0, 'value_rating': 6.0},
{'user_id': 'U5', 'offering_id': 4, 'overall_rating': 6.0, 'cleanliness_rating': 4.0, 'location_rating': 4.0, 'rooms_rating': 8.0, 'service_rating': 8.0, 'value_rating': 6.0},
{'user_id': 'U5', 'offering_id': 5, 'overall_rating': 5.0, 'cleanliness_rating': 5.0, 'location_rating': 5.0, 'rooms_rating': 5.0, 'service_rating': 5.0, 'value_rating': 5.0}
]
We are going to create a function that iterates over all the ratings and compares the predicted rating against the actual rating.
In [8]:
from evaluation.mean_absolute_error import MeanAbsoluteError
from evaluation.root_mean_square_error import RootMeanSquareError
In [9]:
def predict_rating_list(predictor, reviews):
    """
    For each one of the reviews this function predicts the rating for the
    user and item contained in the review and also returns the error
    between the predicted rating and the actual rating the user gave to the
    item

    :param predictor: the object used to predict the rating that will be given
    by the user to the item contained in each review
    :param reviews: a list of reviews (the test data)
    :return: a tuple with a list of the predicted ratings and the list of
    errors for those predictions
    """
    predicted_ratings = []
    errors = []
    for review in reviews:
        user_id = review['user_id']
        item_id = review['offering_id']
        predicted_rating = predictor.predict_rating(user_id, item_id)
        actual_rating = review['overall_rating']
        # print(user_id, item_id, predicted_rating)
        # The error stays None when no prediction could be made
        error = None
        if predicted_rating is not None and actual_rating is not None:
            error = abs(predicted_rating - actual_rating)
        predicted_ratings.append(predicted_rating)
        errors.append(error)
    return predicted_ratings, errors
Then we can compare how the systems behave in terms of Mean Absolute Error and Root Mean Square Error. Additionally, we will include the other recommender systems that have been implemented in the framework but that have not been shown in this notebook.
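As a reference point for both metrics, the sketch below computes them by hand for a constant prediction of 6.0 over the overall ratings listed in reviews_matrix_5. The two helper functions are illustrative stand-ins, not the framework's MeanAbsoluteError and RootMeanSquareError classes; in particular, skipping None errors is an assumption about how missing predictions should be handled.

```python
import math


def mean_absolute_error(errors):
    """Mean of the absolute errors, ignoring missing (None) entries."""
    valid = [error for error in errors if error is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


def root_mean_square_error(errors):
    """Square root of the mean squared error, ignoring missing entries."""
    valid = [error for error in errors if error is not None]
    if not valid:
        return None
    return math.sqrt(sum(error ** 2 for error in valid) / len(valid))


# Overall ratings copied from reviews_matrix_5, grouped by user (U1 to U5).
overall_ratings = [
    5.0, 7.0, 5.0, 7.0,
    5.0, 7.0, 5.0, 7.0, 9.0,
    5.0, 7.0, 5.0, 7.0, 9.0,
    6.0, 6.0, 6.0, 6.0, 5.0,
    6.0, 6.0, 6.0, 6.0, 5.0,
]

# Absolute errors of a constant prediction of 6.0
errors = [abs(6.0 - rating) for rating in overall_ratings]
mae = mean_absolute_error(errors)
rmse = root_mean_square_error(errors)
print('Mean Absolute error:', mae)
print('Root mean square error:', rmse)
```

For these 24 ratings the absolute errors sum to 20 and the squared errors sum to 32, so the constant prediction gives a mean absolute error of about 0.83 and a root mean square error of about 1.15.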
In [10]:
from recommenders.adjusted_weighted_sum_recommender import AdjustedWeightedSumRecommender
from recommenders.multicriteria.delta_recommender import DeltaRecommender
from recommenders.multicriteria.overall_cf_recommender import OverallCFRecommender
In [13]:
def test_compare_against_dummy_recommender():

    recommender = AdjustedWeightedSumRecommender()
    recommender.load(reviews_matrix_5)
    _, errors = predict_rating_list(recommender, reviews_matrix_5)
    awsr_mean_absolute_error = MeanAbsoluteError.compute_list(errors)
    awsr_root_mean_square_error = RootMeanSquareError.compute_list(errors)
    print('\nAdjusted Weighted Sum Recommender')
    print('Mean Absolute error:', awsr_mean_absolute_error)
    print('Root mean square error:', awsr_root_mean_square_error)

    recommender = WeightedSumRecommender()
    recommender.load(reviews_matrix_5)
    _, errors = predict_rating_list(recommender, reviews_matrix_5)
    wsr_mean_absolute_error = MeanAbsoluteError.compute_list(errors)
    wsr_root_mean_square_error = RootMeanSquareError.compute_list(errors)
    print('\nWeighted Sum Recommender')
    print('Mean Absolute error:', wsr_mean_absolute_error)
    print('Root mean square error:', wsr_root_mean_square_error)

    recommender = DeltaCFRecommender()
    recommender.load(reviews_matrix_5)
    _, errors = predict_rating_list(recommender, reviews_matrix_5)
    dcfr_mean_absolute_error = MeanAbsoluteError.compute_list(errors)
    dcfr_root_mean_square_error = RootMeanSquareError.compute_list(errors)
    print('\nDelta CF Recommender')
    print('Mean Absolute error:', dcfr_mean_absolute_error)
    print('Root mean square error:', dcfr_root_mean_square_error)

    recommender = DeltaRecommender()
    recommender.load(reviews_matrix_5)
    _, errors = predict_rating_list(recommender, reviews_matrix_5)
    dr_mean_absolute_error = MeanAbsoluteError.compute_list(errors)
    dr_root_mean_square_error = RootMeanSquareError.compute_list(errors)
    print('\nDelta Recommender')
    print('Mean Absolute error:', dr_mean_absolute_error)
    print('Root mean square error:', dr_root_mean_square_error)

    recommender = OverallCFRecommender()
    recommender.load(reviews_matrix_5)
    _, errors = predict_rating_list(recommender, reviews_matrix_5)
    ocfr_mean_absolute_error = MeanAbsoluteError.compute_list(errors)
    ocfr_root_mean_square_error = RootMeanSquareError.compute_list(errors)
    print('\nOverall CF Recommender')
    print('Mean Absolute error:', ocfr_mean_absolute_error)
    print('Root mean square error:', ocfr_root_mean_square_error)

    recommender = OverallRecommender()
    recommender.load(reviews_matrix_5)
    _, errors = predict_rating_list(recommender, reviews_matrix_5)
    or_mean_absolute_error = MeanAbsoluteError.compute_list(errors)
    or_root_mean_square_error = RootMeanSquareError.compute_list(errors)
    print('\nOverall Recommender')
    print('Mean Absolute error:', or_mean_absolute_error)
    print('Root mean square error:', or_root_mean_square_error)

    # The dummy recommender needs no training data, so load() is not called
    recommender = DummyRecommender(6.0)
    _, errors = predict_rating_list(recommender, reviews_matrix_5)
    dummy_mean_absolute_error = MeanAbsoluteError.compute_list(errors)
    dummy_root_mean_square_error = RootMeanSquareError.compute_list(errors)
    print('\nDummy Recommender')
    print('Mean Absolute error:', dummy_mean_absolute_error)
    print('Root mean square error:', dummy_root_mean_square_error)
In [14]:
test_compare_against_dummy_recommender()
As can be seen, the results show that for this small dataset the dummy recommender system outperforms four of the other recommender systems and is only beaten by two, both of which are multi-criteria collaborative filtering recommenders. Note that the Adjusted Weighted Sum Recommender and the Weighted Sum Recommender are single-criterion collaborative filtering recommenders.