In this notebook we debug two recommender systems to find out how they behave. To do so, we execute the algorithms step by step, printing the value of each variable at every step.
So that every run works on the same data, we shuffled the reviews just once and stored the result in a file. Here we simply load the file that contains the already shuffled reviews.
In [43]:
import sys
sys.path.append('/Users/fpena/UCC/Thesis/projects/yelp/source/python')
from IPython.display import Math
from etl import ETLUtils
file_path = '/Users/fpena/tmp/filtered_reviews_multi_non_sparse_shuffled.json'
reviews = ETLUtils.load_json_file(file_path)
print(reviews[0])
print(reviews[1])
print(reviews[2])
In [44]:
train, test = ETLUtils.split_train_test(reviews, split=0.8, start=0.2, shuffle_data=False)
print('Training')
print(train[0])
print(train[1])
print(train[2])
print('\nTest')
print(test[0])
print(test[1])
print(test[2])
In the code above, we tell the split_train_test method to split the data 80-20: the parameter split=0.8 sets the size of the training set to 80% of the data. The parameter start=0.2 tells the method that the training data should start 20% of the way into the list. The training data therefore spans from 20% to 100% of the list, and the test data is the remainder, i.e. from 0% to 20%. Graphically this is [OXXXX], where O is the test data and X is the training data.
In the output we can see that the first three records of the test set are the same as the first three records of the whole reviews set, which is something we expected.
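The split described above can be sketched in a few lines. This is only an illustration of the [OXXXX] layout, not the actual ETLUtils.split_train_test implementation; the function name split_train_test_sketch and its wrap-around behaviour are assumptions made for this example.

```python
def split_train_test_sketch(records, split=0.8, start=0.0):
    """Hypothetical illustration of a train/test split with a start offset.

    The training window covers `split` of the list and begins `start` of the
    way into it; everything outside the window is the test data.
    """
    n = len(records)
    train_size = int(round(n * split))
    first = int(round(n * start))
    # Duplicate the list so the training window may wrap around the end
    doubled = records + records
    train = doubled[first:first + train_size]
    test = doubled[first + train_size:first + n]
    return train, test


records = list(range(10))
train, test = split_train_test_sketch(records, split=0.8, start=0.2)
print('Train:', train)  # Train: [2, 3, 4, 5, 6, 7, 8, 9]
print('Test:', test)    # Test: [0, 1]
```

With start=0.2 the test data is the first 20% of the list, which matches the observation that the first test records are also the first records of the whole review set.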
After experimenting with different sets, we manually chose three reviews that illustrate well how the recommender system works. We chose these reviews because their users have more than one neighbour and, in some cases, share more than one item with a neighbour.
The three selected reviews are at positions 1, 2 and 8 of the test data set.
In [45]:
print(test[1])
print(test[2])
print(test[8])
To predict the rating, we first create an instance of the recommender system and load the training data into it. We also set the maximum number of neighbours to 5. The following code does this:
In [46]:
from recommenders.weighted_sum_recommender import WeightedSumRecommender
from recommenders.similarity.single_similarity_matrix_builder import SingleSimilarityMatrixBuilder
from recommenders.adjusted_weighted_sum_recommender import AdjustedWeightedSumRecommender
from recommenders.average_recommender import AverageRecommender
from recommenders.dummy_recommender import DummyRecommender
recommender = WeightedSumRecommender(SingleSimilarityMatrixBuilder('euclidean'))
recommender._num_neighbors = 5
recommender.load(train)
We will also create several recommenders to compare their performance:
In [47]:
ws_recommender = WeightedSumRecommender(SingleSimilarityMatrixBuilder('euclidean'))
ws_recommender._num_neighbors = 5
ws_recommender.load(train)
aws_recommender = AdjustedWeightedSumRecommender(SingleSimilarityMatrixBuilder('euclidean'))
aws_recommender._num_neighbors = 5
aws_recommender.load(train)
avg_recommender = AverageRecommender()
avg_recommender.load(train)
dummy_recommender = DummyRecommender(4.0)
In [48]:
my_review = test[1]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']
print('User ID:', user_id)
print('Item ID:', item_id)
print('Rating:', rating)
In [49]:
# If the user is not in the training set, then we can't predict the rating
if user_id not in recommender.user_ids:
    predicted_rating = None
# We obtain the neighbours of the user
# The neighbourhood is the set of users that have rated the same item as MyUser in the training set
# and that also have rated MyItem
neighbourhood = recommender.get_neighbourhood(user_id, item_id)
For this particular user, the neighbourhood size is:
In [50]:
print('Neighbourhood size:', len(neighbourhood))
And the neighbours are:
In [51]:
for neighbour in neighbourhood:
    print(neighbour)
In [52]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Rating:', recommender.user_dictionary[neighbour].item_ratings[item_id])
In [53]:
from tripadvisor.fourcity import extractor
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Common items:', extractor.get_common_items(recommender.user_dictionary, user_id, neighbour))
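The neighbourhood and common-item lookups used above can be illustrated on plain dictionaries. This is a minimal sketch under assumptions: the function names, the toy ratings and the data layout (user ID mapped to a dictionary of item ratings) are hypothetical and do not reproduce the recommender's or the extractor's actual code.

```python
def get_common_items_sketch(user_ratings, user_a, user_b):
    """Items rated by both users (hypothetical helper)."""
    return sorted(set(user_ratings[user_a]) & set(user_ratings[user_b]))


def get_neighbourhood_sketch(user_ratings, user_id, item_id):
    """Users who rated item_id and share at least one item with user_id."""
    neighbours = []
    for other_id, ratings in user_ratings.items():
        if other_id == user_id:
            continue
        if item_id not in ratings:
            continue  # the candidate must have rated MyItem
        if get_common_items_sketch(user_ratings, user_id, other_id):
            neighbours.append(other_id)
    return neighbours


# Toy training data, invented for this example
toy_ratings = {
    'my_user': {'A': 4.0, 'B': 3.0},
    'u1': {'A': 5.0, 'target': 3.0},
    'u2': {'C': 2.0, 'target': 4.0},            # no item in common with my_user
    'u3': {'A': 4.0, 'B': 2.0},                 # has not rated the target item
    'u4': {'A': 3.0, 'B': 3.0, 'target': 5.0},
}

toy_neighbourhood = get_neighbourhood_sketch(toy_ratings, 'my_user', 'target')
print(toy_neighbourhood)  # ['u1', 'u4']
print(get_common_items_sketch(toy_ratings, 'my_user', 'u4'))  # ['A', 'B']
```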
For each neighbour, we compare the ratings that the neighbour and MyUser gave to the items they have in common, and manually calculate the Euclidean similarity using the formula similarity = 1 / (1 + euclidean_distance)
In [54]:
neighbour_id = '113CB53AD6CF0EEC7427F804A7D0607E'
neughbour_item = 611947
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = 'DCCD38C886A7C72C215AFCCB625AC03C'
neughbour_item = 93352
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = '6562BBD4EA770FE84E579622F68FA181'
neughbour_item = 611947
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
As we can see in the above steps, the similarity values are correct.
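The manual checks above all apply the same formula, so they can be condensed into one helper. The sketch below is an assumption about how such a helper might look (the name euclidean_similarity_sketch and the dictionary-based inputs are not taken from the recommender's code); it handles one or several common items.

```python
import math


def euclidean_similarity_sketch(ratings_a, ratings_b):
    """similarity = 1 / (1 + euclidean_distance) over the common items.

    Both arguments map item IDs to ratings. Returns None when the two
    users have no item in common.
    """
    common_items = set(ratings_a) & set(ratings_b)
    if not common_items:
        return None
    distance = math.sqrt(
        sum((ratings_a[item] - ratings_b[item]) ** 2 for item in common_items))
    return 1.0 / (1.0 + distance)


# One common item, ratings differ by 1 -> distance 1
print(euclidean_similarity_sketch({'x': 4.0}, {'x': 3.0}))  # 0.5
# Two common items, ratings differ by 2 and 0 -> distance 2
print(euclidean_similarity_sketch({'x': 5.0, 'y': 3.0}, {'x': 3.0, 'y': 3.0}))
# Identical ratings -> distance 0
print(euclidean_similarity_sketch({'x': 4.0}, {'x': 4.0}))  # 1.0
```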
In [55]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Item rating:', recommender.user_dictionary[neighbour].item_ratings[item_id],
          'Average rating:', recommender.user_dictionary[neighbour].average_overall_rating)
With this information, we can now calculate the predicted rating using the weighted sum recommender formula:
In [56]:
Math(r"R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)")
Out[56]:
and the adjusted weighted sum recommender formula:
In [57]:
Math(r"R(u,i) = \overline{R(u)} + z \sum_{u' \in N(u)} sim(u,u') \cdot (R(u',i) - \overline{R(u')})")
Out[57]:
where
In [58]:
Math(r"z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}")
Out[58]:
So the predicted rating that MyUser would give to MyItem is:
In [59]:
my_review = test[1]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']
# Weighted sum recommender
denominator = 0.5 + 0.333 + 0.333
ws_numerator = 0.5 * 3.0 + 0.333 * 4.0 + 0.333 * 3.0
ws_predicted_rating = ws_numerator / denominator
print('Manual weighted sum predicted Rating:', ws_predicted_rating)
print('Weighted sum recommender system predicted rating:', ws_recommender.predict_rating(user_id, item_id))
# Adjusted weighted sum recommender
user_average_rating = recommender.user_dictionary[user_id].average_overall_rating
aws_numerator = 0.5 * (3.0 - 4.286) + 0.333 * (4.0 - 3.429) + 0.333 * (3.0 - 3.786)
aws_predicted_rating = user_average_rating + aws_numerator / denominator
print('Manual adjusted weighted sum predicted Rating:', aws_predicted_rating)
print('Adjusted weighted sum recommender system predicted rating:', aws_recommender.predict_rating(user_id, item_id))
print('Average recommender rating:', avg_recommender.predict_rating(user_id, item_id))
print('Dummy recommender rating:', dummy_recommender.predict_rating(user_id, item_id))
actual_rating = my_review['overall_rating']
print('Actual rating:', actual_rating)
ws_error = abs(actual_rating - ws_predicted_rating)
aws_error = abs(actual_rating - aws_predicted_rating)
avg_error = abs(actual_rating - avg_recommender.predict_rating(user_id, item_id))
dummy_error = abs(actual_rating - dummy_recommender.predict_rating(user_id, item_id))
print('Errors:', 'WS:', ws_error, 'AWS:', aws_error, 'AVG:', avg_error, 'Dummy:', dummy_error)
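The two formulas can also be written as small functions instead of hand-typed sums. This is a sketch, not the code of WeightedSumRecommender or AdjustedWeightedSumRecommender; the function names are hypothetical. The similarities, ratings and neighbour averages are the ones used in the manual calculation above, while the user average of 4.0 is only a placeholder, because the real value comes from the recommender.

```python
def weighted_sum_sketch(neighbours):
    """neighbours: list of (similarity, rating) pairs."""
    total_similarity = sum(sim for sim, _ in neighbours)
    if total_similarity == 0:
        return None  # no usable neighbours, no prediction
    z = 1.0 / total_similarity
    return z * sum(sim * rating for sim, rating in neighbours)


def adjusted_weighted_sum_sketch(user_average, neighbours):
    """neighbours: list of (similarity, rating, neighbour_average) triples."""
    total_similarity = sum(sim for sim, _, _ in neighbours)
    if total_similarity == 0:
        return None
    z = 1.0 / total_similarity
    deviation = sum(sim * (rating - avg) for sim, rating, avg in neighbours)
    return user_average + z * deviation


# (similarity, rating, neighbour average) from the manual calculation above
example_neighbours = [(0.5, 3.0, 4.286), (0.333, 4.0, 3.429), (0.333, 3.0, 3.786)]

ws_sketch = weighted_sum_sketch([(sim, r) for sim, r, _ in example_neighbours])
# 4.0 is a placeholder for MyUser's average rating (assumption)
aws_sketch = adjusted_weighted_sum_sketch(4.0, example_neighbours)
print('Weighted sum:', ws_sketch)
print('Adjusted weighted sum:', aws_sketch)
```

Returning None when the similarities sum to zero avoids a division by zero; whether the actual recommenders behave this way is not shown in this notebook.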
In [60]:
my_review = test[2]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']
print('User ID:', user_id)
print('Item ID:', item_id)
print('Rating:', rating)
In [61]:
# If the user is not in the training set, then we can't predict the rating
if user_id not in recommender.user_ids:
    predicted_rating = None
# We obtain the neighbours of the user
# The neighbourhood is the set of users that have rated the same item as MyUser in the training set
# and that also have rated MyItem
neighbourhood = recommender.get_neighbourhood(user_id, item_id)
For this particular user, the neighbourhood size is:
In [62]:
print('Neighbourhood size:', len(neighbourhood))
And the neighbours are:
In [63]:
for neighbour in neighbourhood:
    print(neighbour)
In [64]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Rating:', recommender.user_dictionary[neighbour].item_ratings[item_id])
Now we are going to verify that the similarity values are correct. To do so, we look at the items that the neighbours have in common with MyUser. We do this just for the first 5 neighbours, because we are only considering the 5 nearest neighbours.
In [65]:
from tripadvisor.fourcity import extractor
for neighbour in neighbourhood[:5]:
    print('User ID:', neighbour,
          'Common items:', extractor.get_common_items(recommender.user_dictionary, user_id, neighbour))
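Taking the first 5 entries only makes sense if the neighbourhood is ordered from most to least similar. That ordering is an assumption here, as is the helper below: the name top_k_neighbours_sketch and the toy similarities are invented for illustration and are not part of the recommender.

```python
def top_k_neighbours_sketch(similarities, k=5):
    """Return the k neighbour IDs with the highest similarity.

    similarities maps neighbour ID -> similarity to MyUser. Ties keep their
    original order because Python's sort is stable.
    """
    ranked = sorted(similarities.items(), key=lambda pair: pair[1], reverse=True)
    return [neighbour_id for neighbour_id, _ in ranked[:k]]


# Toy similarities, invented for this example
toy_similarities = {'a': 1.0, 'b': 0.5, 'c': 0.333, 'd': 1.0,
                    'e': 0.25, 'f': 0.4, 'g': 0.9}
toy_top = top_k_neighbours_sketch(toy_similarities, k=5)
print(toy_top)  # ['a', 'd', 'g', 'b', 'f']
```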
For each neighbour, we compare the ratings that the neighbour and MyUser gave to the items they have in common, and manually calculate the Euclidean similarity using the formula similarity = 1 / (1 + euclidean_distance)
In [66]:
neighbour_id = '1609426679198144A65A85C951CEFEF3'
neughbour_item = 102466
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = '997502F65A2DC9C0664EF9990ABDC125'
neughbour_item = 102466
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = '1CE94B8ABEB460F3719B9CE9EE1E476C'
neighbour_item1 = 96688
neighbour_item2 = 102466
my_user_rating1 = recommender.user_dictionary[user_id].item_ratings[neighbour_item1]
my_user_rating2 = recommender.user_dictionary[user_id].item_ratings[neighbour_item2]
neighbour_rating1 = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item1]
neighbour_rating2 = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item2]
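# Two common items: print both rating pairs rather than the stale values of the previous neighbour
print('Neighbour ID:', neighbour_id,
      'MyUser ratings:', (my_user_rating1, my_user_rating2),
      'Neighbour ratings:', (neighbour_rating1, neighbour_rating2),
      'Similarity:', (1/(1 + ((my_user_rating1 - neighbour_rating1)**2 + (my_user_rating2 - neighbour_rating2)**2)**0.5)))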
neighbour_id = '43015504348C59179CDA040A674CFA7E'
neughbour_item = 99352
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = '249063E5108AA2DA3728ACE7D7FCB0AE'
neughbour_item = 81295
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
As we can see in the above steps, the similarity values are correct.
In [67]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Item rating:', recommender.user_dictionary[neighbour].item_ratings[item_id],
          'Average rating:', recommender.user_dictionary[neighbour].average_overall_rating)
With this information, we can now calculate the predicted rating using the weighted sum formula:
In [68]:
Math(r"R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)")
Out[68]:
In [69]:
Math(r"z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}")
Out[69]:
So the predicted rating that MyUser would give to MyItem is:
In [70]:
my_review = test[2]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']
# Weighted sum recommender
denominator = 1.0 + 1.0 + 1.0 + 1.0 + 0.5
ws_numerator = 1.0 * 4.0 + 1.0 * 3.0 + 1.0 * 2.0 + 1.0 * 4.0 + 0.5 * 5.0
ws_predicted_rating = ws_numerator / denominator
print('Manual weighted sum predicted Rating:', ws_predicted_rating)
print('Weighted sum recommender system predicted rating:', ws_recommender.predict_rating(user_id, item_id))
# Adjusted weighted sum recommender
user_average_rating = recommender.user_dictionary[user_id].average_overall_rating
aws_numerator = 1.0 * (4.0 - 4.0) + 1.0 * (3.0 - 3.875) + 1.0 * (2.0 - 3.333) + 1.0 * (4.0 - 4.333) + 0.5 * (5.0 - 3.66)
aws_predicted_rating = user_average_rating + aws_numerator / denominator
print('Manual adjusted weighted sum predicted Rating:', aws_predicted_rating)
print('Adjusted weighted sum recommender system predicted rating:', aws_recommender.predict_rating(user_id, item_id))
print('Average recommender rating:', avg_recommender.predict_rating(user_id, item_id))
print('Dummy recommender rating:', dummy_recommender.predict_rating(user_id, item_id))
actual_rating = my_review['overall_rating']
print('Actual rating:', actual_rating)
ws_error = abs(actual_rating - ws_predicted_rating)
aws_error = abs(actual_rating - aws_predicted_rating)
avg_error = abs(actual_rating - avg_recommender.predict_rating(user_id, item_id))
dummy_error = abs(actual_rating - dummy_recommender.predict_rating(user_id, item_id))
print('Errors:', 'WS:', ws_error, 'AWS:', aws_error, 'AVG:', avg_error, 'Dummy:', dummy_error)
In [71]:
my_review = test[8]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']
print('User ID:', user_id)
print('Item ID:', item_id)
print('Rating:', rating)
In [72]:
# If the user is not in the training set, then we can't predict the rating
if user_id not in recommender.user_ids:
    predicted_rating = None
# We obtain the neighbours of the user
# The neighbourhood is the set of users that have rated the same item as MyUser in the training set
# and that also have rated MyItem
neighbourhood = recommender.get_neighbourhood(user_id, item_id)
For this particular user, the neighbourhood size is:
In [73]:
print('Neighbourhood size:', len(neighbourhood))
And the neighbours are:
In [74]:
for neighbour in neighbourhood:
    print(neighbour)
In [75]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Rating:', recommender.user_dictionary[neighbour].item_ratings[item_id])
In [76]:
from tripadvisor.fourcity import extractor
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Common items:', extractor.get_common_items(recommender.user_dictionary, user_id, neighbour))
For each neighbour, we compare the ratings that the neighbour and MyUser gave to the items they have in common, and manually calculate the Euclidean similarity using the formula similarity = 1 / (1 + euclidean_distance)
In [77]:
neighbour_id = '75D2B2509A733463F97A52877D078D86'
neughbour_item = 84093
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = '1BB22F3C5F3C52E120EAF07B4D924605'
neughbour_item = 84093
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = '48D750137FF4D27ADA7CFE294D2315FB'
neighbour_item1 = 84093
neighbour_item2 = 84087
my_user_rating1 = recommender.user_dictionary[user_id].item_ratings[neighbour_item1]
my_user_rating2 = recommender.user_dictionary[user_id].item_ratings[neighbour_item2]
neighbour_rating1 = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item1]
neighbour_rating2 = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item2]
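# Two common items: print both rating pairs rather than the stale values of the previous neighbour
print('Neighbour ID:', neighbour_id,
      'MyUser ratings:', (my_user_rating1, my_user_rating2),
      'Neighbour ratings:', (neighbour_rating1, neighbour_rating2),
      'Similarity:', (1/(1 + ((my_user_rating1 - neighbour_rating1)**2 + (my_user_rating2 - neighbour_rating2)**2)**0.5)))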
As we can see in the above steps, the similarity values are correct.
In [78]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Item rating:', recommender.user_dictionary[neighbour].item_ratings[item_id],
          'Average rating:', recommender.user_dictionary[neighbour].average_overall_rating)
With this information, we can now calculate the predicted rating using the weighted sum formula:
In [79]:
Math(r"R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)")
Out[79]:
In [80]:
Math(r"z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}")
Out[80]:
So the predicted rating that MyUser would give to MyItem is:
In [81]:
my_review = test[8]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']
# Weighted sum recommender
denominator = 0.5 + 0.333 + 0.309
ws_numerator = 0.5 * 4.0 + 0.333 * 4.0 + 0.309 * 5.0
ws_predicted_rating = ws_numerator / denominator
print('Manual weighted sum predicted Rating:', ws_predicted_rating)
print('Weighted sum recommender system predicted rating:', ws_recommender.predict_rating(user_id, item_id))
# Adjusted weighted sum recommender
user_average_rating = recommender.user_dictionary[user_id].average_overall_rating
aws_numerator = 0.5 * (4.0 - 4.0) + 0.333 * (4.0 - 4.333) + 0.309 * (5.0 - 5.0)
aws_predicted_rating = user_average_rating + aws_numerator / denominator
print('Manual adjusted weighted sum predicted Rating:', aws_predicted_rating)
print('Adjusted weighted sum recommender system predicted rating:', aws_recommender.predict_rating(user_id, item_id))
print('Average recommender rating:', avg_recommender.predict_rating(user_id, item_id))
print('Dummy recommender rating:', dummy_recommender.predict_rating(user_id, item_id))
actual_rating = my_review['overall_rating']
print('Actual rating:', actual_rating)
ws_error = abs(actual_rating - ws_predicted_rating)
aws_error = abs(actual_rating - aws_predicted_rating)
avg_error = abs(actual_rating - avg_recommender.predict_rating(user_id, item_id))
dummy_error = abs(actual_rating - dummy_recommender.predict_rating(user_id, item_id))
print('Errors:', 'WS:', ws_error, 'AWS:', aws_error, 'AVG:', avg_error, 'Dummy:', dummy_error)
In [84]:
my_review = test[67]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']
print('User ID:', user_id)
print('Item ID:', item_id)
print('Rating:', rating)
In [85]:
# If the user is not in the training set, then we can't predict the rating
if user_id not in recommender.user_ids:
    predicted_rating = None
# We obtain the neighbours of the user
# The neighbourhood is the set of users that have rated the same item as MyUser in the training set
# and that also have rated MyItem
neighbourhood = recommender.get_neighbourhood(user_id, item_id)
For this particular user, the neighbourhood size is:
In [86]:
print('Neighbourhood size:', len(neighbourhood))
And the neighbours are:
In [87]:
for neighbour in neighbourhood:
    print(neighbour)
In [90]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Rating:', recommender.user_dictionary[neighbour].item_ratings[item_id])
In [89]:
from tripadvisor.fourcity import extractor
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Common items:', extractor.get_common_items(recommender.user_dictionary, user_id, neighbour))
For each neighbour, we compare the ratings that the neighbour and MyUser gave to the items they have in common, and manually calculate the Euclidean similarity using the formula similarity = 1 / (1 + euclidean_distance)
In [91]:
neighbour_id = '8AE0A5F56DE525C2BCDF87D7E54E9DF0'
neughbour_item = 84093
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = '6C64DCB228E0255547A69C46B138B280'
neughbour_item = 223023
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = '8DCF9BE7F5EAB3DE26B896DB3702D796'
neughbour_item = 223023
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = 'C5907018EBC2EC24FA4C8B9CCD2C8CBD'
neughbour_item = 223023
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = 'A9DBCFE3E77DB2DC0F989BE387A5B87A'
neughbour_item = 84093
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = '15260F7D29FDCE47CBFA7BACD55C5326'
neughbour_item = 223023
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
neighbour_id = 'F6CF03D6B759EA5C2394037470DFBDDB'
neughbour_item = 84093
my_user_rating = recommender.user_dictionary[user_id].item_ratings[neughbour_item]
neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neughbour_item]
print('Neighbour ID:', neighbour_id,
'MyUser rating:', my_user_rating,
'Neighbour rating:', neighbour_rating,
'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))
As we can see in the above steps, the similarity values are correct.
In [92]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Item rating:', recommender.user_dictionary[neighbour].item_ratings[item_id],
          'Average rating:', recommender.user_dictionary[neighbour].average_overall_rating)
With this information, we can now calculate the predicted rating using the weighted sum formula:
In [79]:
Math(r"R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)")
Out[79]:
In [80]:
Math(r"z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}")
Out[80]:
So the predicted rating that MyUser would give to MyItem is:
In [94]:
my_review = test[67]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']
# Weighted sum recommender
denominator = 1.0 + 0.666 + 0.666 + 0.666 + 0.5 + 0.4 + 0.333
ws_numerator = 1.0 * 5.0 + 0.666 * 4.0 + 0.666 * 4.0 + 0.666 * 3.0 + 0.5 * 2.0 + 0.4 * 4.0 + 0.333 * 3.0
ws_predicted_rating = ws_numerator / denominator
print('Manual weighted sum predicted Rating:', ws_predicted_rating)
print('Weighted sum recommender system predicted rating:', ws_recommender.predict_rating(user_id, item_id))
# Adjusted weighted sum recommender
user_average_rating = recommender.user_dictionary[user_id].average_overall_rating
aws_numerator = 1.0 * (5.0 - 5.0) + 0.666 * (4.0 - 4.222) + 0.666 * (4.0 - 4.0) +\
0.666 * (3.0 - 4.0) + 0.5 * (2.0 - 3.0) + 0.4 * (4.0 - 3.833) + 0.333 * (3.0 - 2.833)
aws_predicted_rating = user_average_rating + aws_numerator / denominator
print('Manual adjusted weighted sum predicted Rating:', aws_predicted_rating)
print('Adjusted weighted sum recommender system predicted rating:', aws_recommender.predict_rating(user_id, item_id))
print('Average recommender rating:', avg_recommender.predict_rating(user_id, item_id))
print('Dummy recommender rating:', dummy_recommender.predict_rating(user_id, item_id))
actual_rating = my_review['overall_rating']
print('Actual rating:', actual_rating)
ws_error = abs(actual_rating - ws_predicted_rating)
aws_error = abs(actual_rating - aws_predicted_rating)
avg_error = abs(actual_rating - avg_recommender.predict_rating(user_id, item_id))
dummy_error = abs(actual_rating - dummy_recommender.predict_rating(user_id, item_id))
print('Errors:', 'WS:', ws_error, 'AWS:', aws_error, 'AVG:', avg_error, 'Dummy:', dummy_error)
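Each case above reports one absolute error per recommender. To compare recommenders over several reviews, those errors could be averaged. The sketch below assumes this aggregation is wanted; the function name and the ratings in the example are invented for illustration and are not results from this notebook.

```python
def mean_absolute_error_sketch(actual_ratings, predicted_ratings):
    """Average of |actual - predicted| over paired ratings."""
    if len(actual_ratings) != len(predicted_ratings):
        raise ValueError('Both lists must have the same length')
    if not actual_ratings:
        return None
    errors = [abs(actual - predicted)
              for actual, predicted in zip(actual_ratings, predicted_ratings)]
    return sum(errors) / len(errors)


# Hypothetical ratings, not taken from the notebook's output
toy_actual = [4.0, 3.0, 5.0]
toy_predictions = {
    'WS': [3.5, 3.0, 4.0],
    'Dummy': [4.0, 4.0, 4.0],
}
for name in sorted(toy_predictions):
    print(name, mean_absolute_error_sketch(toy_actual, toy_predictions[name]))
```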
In [83]:
print(test[67])
print(test[75])