Debugging the recommender systems one by one

In this notebook we debug two recommender systems to find out how they behave. To do so, we execute the algorithms step by step, printing the value of each variable at every step. These are the steps we are going to follow:

  1. Load and shuffle the reviews
  2. Split the reviews into training and test set
  3. Select a review from the test set (a user, item, rating triplet). We will call the user MyUser, and the item MyItem.
  4. Predict the rating the user would give to the item
    1. Find out which users have rated common items with MyUser. These will be the neighbours of MyUser.
    2. For each neighbour, find out the similarity value it shares with MyUser.
    3. Look for the items that the neighbour and MyUser have rated in common and assert that the similarity value is correct.
    4. Manually calculate the predicted rating with the obtained values of the similarity and the ratings that the neighbours have given to MyItem

Load and shuffle the reviews

To accomplish this, we load and shuffle the reviews just once and then store the data in a file. Then we just load the file that contains the reviews already shuffled.
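
The one-time shuffle itself is not shown in this notebook. A minimal sketch of how it could be done, assuming the reviews are stored as one JSON object per line. The helper names shuffle_and_store and load_reviews, the seed parameter and the file layout are hypothetical and are not part of ETLUtils:

```python
import json
import random


def shuffle_and_store(reviews, file_path, seed=0):
    """Shuffle a list of review dicts once and write one JSON object per line.

    Hypothetical helper: the real file layout used by ETLUtils may differ.
    """
    shuffled = list(reviews)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    with open(file_path, 'w') as json_file:
        for review in shuffled:
            json_file.write(json.dumps(review) + '\n')
    return shuffled


def load_reviews(file_path):
    """Read back the reviews in the order in which they were stored."""
    with open(file_path) as json_file:
        return [json.loads(line) for line in json_file if line.strip()]
```

Storing the shuffled order means every later run sees the same training and test records, which is what makes the step-by-step comparison below repeatable.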


In [43]:
import sys
sys.path.append('/Users/fpena/UCC/Thesis/projects/yelp/source/python')
from IPython.display import Math
from etl import ETLUtils

file_path = '/Users/fpena/tmp/filtered_reviews_multi_non_sparse_shuffled.json'
reviews = ETLUtils.load_json_file(file_path)

print(reviews[0])
print(reviews[1])
print(reviews[2])


{u'multi_ratings': [4.0, 5.0, 4.0, 4.0, 4.0], u'user_id': u'CATID_', u'offering_id': 93618, u'overall_rating': 5.0}
{u'multi_ratings': [4.0, 4.0, 4.0, 4.0, 4.0], u'user_id': u'CC813BCDB9DA614B728B36E92E27BF9B', u'offering_id': 78046, u'overall_rating': 4.0}
{u'multi_ratings': [5.0, 5.0, 4.0, 5.0, 4.0], u'user_id': u'9FF630DF29C67791978600FC5B2DBFE8', u'offering_id': 1218792, u'overall_rating': 4.0}

From the above output, we can see that the reviews are loaded as expected. Now we proceed to split the dataset into the training set and the test set.

Split the reviews into training and test set

This is straightforward: we just use one of the methods contained in the ETLUtils class.


In [44]:
train, test = ETLUtils.split_train_test(reviews, split=0.8, start=0.2, shuffle_data=False)

print('Training')
print(train[0])
print(train[1])
print(train[2])

print('\nTest')
print(test[0])
print(test[1])
print(test[2])


Training
{u'multi_ratings': [4.0, 5.0, 2.0, 4.0, 3.0], u'user_id': u'44F4B94E47AE70D65C2A98F6EBFAAFF4', u'offering_id': 98452, u'overall_rating': 3.0}
{u'multi_ratings': [2.0, 4.0, 3.0, 2.0, 5.0], u'user_id': u'CATID_', u'offering_id': 78046, u'overall_rating': 3.0}
{u'multi_ratings': [5.0, 5.0, 3.0, 3.0, 4.0], u'user_id': u'3BFADBAEA5E11139E082E02E6388BBAE', u'offering_id': 81461, u'overall_rating': 4.0}

Test
{u'multi_ratings': [4.0, 5.0, 4.0, 4.0, 4.0], u'user_id': u'CATID_', u'offering_id': 93618, u'overall_rating': 5.0}
{u'multi_ratings': [4.0, 4.0, 4.0, 4.0, 4.0], u'user_id': u'CC813BCDB9DA614B728B36E92E27BF9B', u'offering_id': 78046, u'overall_rating': 4.0}
{u'multi_ratings': [5.0, 5.0, 4.0, 5.0, 4.0], u'user_id': u'9FF630DF29C67791978600FC5B2DBFE8', u'offering_id': 1218792, u'overall_rating': 4.0}

In the above code, we tell the split_train_test method to split the data 80-20: the parameter split=0.8 indicates that the training data should contain 80% of the reviews. The parameter start=0.2 indicates that the training data should start after the first 20% of the list. This means that the training data goes from 20% to 100% of the list, and the test data is the remainder, i.e. from 0% to 20%. A graphical way to express this is [OXXXX], where O is the test data and X is the training data.

In the output we can see that the first three records of the test set are the same as the first three records of the whole reviews set, which is something we expected.
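
The [OXXXX] layout can be reproduced with a small stand-alone function. This is only a sketch of the behaviour described above, not the actual ETLUtils.split_train_test implementation; in particular, the wrap-around of the training window is an assumption:

```python
def split_train_test(records, split=0.8, start=0.0):
    """Return (train, test). The training window begins at `start`.

    Sketch only: assumes the training window wraps around the end of the list.
    """
    num_records = len(records)
    train_size = int(round(num_records * split))
    start_index = int(round(num_records * start))
    # Rotate the list so that the training window begins at position 0
    rotated = records[start_index:] + records[:start_index]
    return rotated[:train_size], rotated[train_size:]


records = list(range(10))
train, test = split_train_test(records, split=0.8, start=0.2)
print(train)  # records 2 to 9
print(test)   # records 0 and 1
```

With start=0.2 the test set is the first fifth of the list, which matches the observation that the first test records are the first records of the whole set.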

Select a review from the test set

After playing around with different sets, we have manually chosen three reviews that illustrate very well how the recommender system works. We have chosen these reviews because the users in them have more than one neighbour, and in some cases the neighbours share more than one item in common.

The three selected reviews are in the positions 1, 2 and 8 of the test data set.


In [45]:
print(test[1])
print(test[2])
print(test[8])


{u'multi_ratings': [4.0, 4.0, 4.0, 4.0, 4.0], u'user_id': u'CC813BCDB9DA614B728B36E92E27BF9B', u'offering_id': 78046, u'overall_rating': 4.0}
{u'multi_ratings': [5.0, 5.0, 4.0, 5.0, 4.0], u'user_id': u'9FF630DF29C67791978600FC5B2DBFE8', u'offering_id': 1218792, u'overall_rating': 4.0}
{u'multi_ratings': [4.0, 5.0, 4.0, 5.0, 5.0], u'user_id': u'02860EA0ED535DE587635F1F8FC7D0C0', u'offering_id': 84079, u'overall_rating': 5.0}

Predict the rating MyUser would give to MyItem

In order to predict the rating, we first create an instance of the recommender system and load the training data into it. We also set the maximum number of neighbours to 5. We can achieve that with the following code:


In [46]:
from recommenders.weighted_sum_recommender import WeightedSumRecommender
from recommenders.similarity.single_similarity_matrix_builder import SingleSimilarityMatrixBuilder
from recommenders.adjusted_weighted_sum_recommender import AdjustedWeightedSumRecommender
from recommenders.average_recommender import AverageRecommender
from recommenders.dummy_recommender import DummyRecommender

recommender = WeightedSumRecommender(SingleSimilarityMatrixBuilder('euclidean'))
recommender._num_neighbors = 5
recommender.load(train)

We will also create several recommenders to compare their performance:


In [47]:
ws_recommender = WeightedSumRecommender(SingleSimilarityMatrixBuilder('euclidean'))
ws_recommender._num_neighbors = 5
ws_recommender.load(train)

aws_recommender = AdjustedWeightedSumRecommender(SingleSimilarityMatrixBuilder('euclidean'))
aws_recommender._num_neighbors = 5
aws_recommender.load(train)

avg_recommender = AverageRecommender()
avg_recommender.load(train)

dummy_recommender = DummyRecommender(4.0)


('Mean rating:', 3.913245873889124)
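
The two baselines are simple enough to sketch. The classes below are hypothetical stand-ins, not the project's AverageRecommender and DummyRecommender; they assume that the former predicts the mean overall rating of the training set (which is consistent with the mean rating printed above) and that the latter always returns the constant it was created with:

```python
class GlobalMeanBaseline(object):
    """Hypothetical stand-in: predicts the mean overall rating of the training set."""

    def __init__(self):
        self.mean_rating = None

    def load(self, reviews):
        ratings = [review['overall_rating'] for review in reviews]
        self.mean_rating = sum(ratings) / float(len(ratings))

    def predict_rating(self, user_id, item_id):
        # The prediction ignores both the user and the item
        return self.mean_rating


class ConstantBaseline(object):
    """Hypothetical stand-in: always predicts the same rating."""

    def __init__(self, rating):
        self.rating = rating

    def predict_rating(self, user_id, item_id):
        return self.rating
```

Baselines like these are useful as a reference point: a neighbourhood-based recommender that cannot beat them on average is not extracting much information from the neighbours.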

Review 1

For this review we have the following information


In [48]:
my_review = test[1]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']

print('User ID:', user_id)
print('Item ID:', item_id)
print('Rating:', rating)


('User ID:', u'CC813BCDB9DA614B728B36E92E27BF9B')
('Item ID:', 78046)
('Rating:', 4.0)

Find out which users have rated common items with MyUser


In [49]:
# If the user is not in the training set, then we can't predict the rating
if user_id not in recommender.user_ids:
    predicted_rating = None

# We obtain the neighbours of the user
# The neighbourhood is the set of users that have rated the same item as MyUser in the training set
# and that also have rated MyItem
neighbourhood = recommender.get_neighbourhood(user_id, item_id)

For this particular user, the neighbourhood size is:


In [50]:
print('Neighbourhood size:', len(neighbourhood))


('Neighbourhood size:', 3)

And the neighbours are:


In [51]:
for neighbour in neighbourhood:
    print(neighbour)


113CB53AD6CF0EEC7427F804A7D0607E
DCCD38C886A7C72C215AFCCB625AC03C
6562BBD4EA770FE84E579622F68FA181

For each neighbour, find out the similarity value it shares with MyUser

For each neighbour, we are going to print its user_id, the similarity it has with MyUser, and the rating it has given to item_id


In [52]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Rating:', recommender.user_dictionary[neighbour].item_ratings[item_id])


('User ID:', u'113CB53AD6CF0EEC7427F804A7D0607E', 'Similarity', 0.5, 'Rating:', 3.0)
('User ID:', u'DCCD38C886A7C72C215AFCCB625AC03C', 'Similarity', 0.33333333333333331, 'Rating:', 4.0)
('User ID:', u'6562BBD4EA770FE84E579622F68FA181', 'Similarity', 0.33333333333333331, 'Rating:', 3.0)

Look for the items that the neighbour and MyUser have rated in common and assert that the similarity value is correct.

Now we are going to verify that the similarity values are correct. To do so, we are going to see the items that the neighbours have in common with MyUser


In [53]:
from tripadvisor.fourcity import extractor

for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Common items:', extractor.get_common_items(recommender.user_dictionary, user_id, neighbour))


('User ID:', u'113CB53AD6CF0EEC7427F804A7D0607E', 'Common items:', set([611947]))
('User ID:', u'DCCD38C886A7C72C215AFCCB625AC03C', 'Common items:', set([93352]))
('User ID:', u'6562BBD4EA770FE84E579622F68FA181', 'Common items:', set([611947]))

For each neighbour we are going to compare the rating they have given to the common item with MyUser's rating, and then manually calculate the Euclidean similarity using the formula similarity = 1 / (1 + euclidean_distance)


In [54]:
# Each pair holds a neighbour and the single item it has in common with MyUser
common_items = [
    ('113CB53AD6CF0EEC7427F804A7D0607E', 611947),
    ('DCCD38C886A7C72C215AFCCB625AC03C', 93352),
    ('6562BBD4EA770FE84E579622F68FA181', 611947),
]

for neighbour_id, neighbour_item in common_items:
    my_user_rating = recommender.user_dictionary[user_id].item_ratings[neighbour_item]
    neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item]
    print('Neighbour ID:', neighbour_id,
          'MyUser rating:', my_user_rating,
          'Neighbour rating:', neighbour_rating,
          'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))


('Neighbour ID:', '113CB53AD6CF0EEC7427F804A7D0607E', 'MyUser rating:', 4.0, 'Neighbour rating:', 5.0, 'Similarity:', 0.5)
('Neighbour ID:', 'DCCD38C886A7C72C215AFCCB625AC03C', 'MyUser rating:', 5.0, 'Neighbour rating:', 3.0, 'Similarity:', 0.33333333333333331)
('Neighbour ID:', '6562BBD4EA770FE84E579622F68FA181', 'MyUser rating:', 4.0, 'Neighbour rating:', 2.0, 'Similarity:', 0.33333333333333331)

As we can see in the above steps, the similarity values are correct.
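
The manual check generalises to any number of common items. A minimal sketch, assuming each user's ratings are available as a dictionary from item ID to rating; the function name euclidean_similarity is ours and is not taken from SingleSimilarityMatrixBuilder:

```python
import math


def euclidean_similarity(ratings_a, ratings_b):
    """Return 1 / (1 + Euclidean distance) over the items both users rated.

    Returns None when the users have no item in common (an assumption of
    this sketch; the real recommender may handle that case differently).
    """
    common_items = set(ratings_a) & set(ratings_b)
    if not common_items:
        return None
    squared_differences = sum(
        (ratings_a[item] - ratings_b[item]) ** 2 for item in common_items)
    return 1.0 / (1.0 + math.sqrt(squared_differences))


# Ratings copied from the output above for the first two neighbours
print(euclidean_similarity({611947: 4.0}, {611947: 5.0}))
print(euclidean_similarity({93352: 5.0}, {93352: 3.0}))
```

With a single common item the distance is just the absolute difference of the two ratings, so a difference of 1 gives 0.5 and a difference of 2 gives one third.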

Manually calculate the predicted rating with the obtained values of the similarity and the ratings that the neighbours have given to MyItem

For this step we are going to use some information we have calculated above. To make it clearer, we print it again, this time together with each neighbour's average rating:


In [55]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Item rating:', recommender.user_dictionary[neighbour].item_ratings[item_id],
          'Average rating:', recommender.user_dictionary[neighbour].average_overall_rating)


('User ID:', u'113CB53AD6CF0EEC7427F804A7D0607E', 'Similarity', 0.5, 'Item rating:', 3.0, 'Average rating:', 4.285714285714286)
('User ID:', u'DCCD38C886A7C72C215AFCCB625AC03C', 'Similarity', 0.33333333333333331, 'Item rating:', 4.0, 'Average rating:', 3.4285714285714284)
('User ID:', u'6562BBD4EA770FE84E579622F68FA181', 'Similarity', 0.33333333333333331, 'Item rating:', 3.0, 'Average rating:', 3.7857142857142856)

With this information, we can now calculate the predicted rating using the weighted sum recommender formula:


In [56]:
Math(r"R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)")


Out[56]:
$$R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)$$

and the adjusted weighted sum recommender formula:


In [57]:
Math(r"R(u,i) = \overline{R(u)} + z \sum_{u' \in N(u)} sim(u,u') \cdot (R(u',i) - \overline{R(u')})")


Out[57]:
$$R(u,i) = \overline{R(u)} + z \sum_{u' \in N(u)} sim(u,u') \cdot (R(u',i) - \overline{R(u')})$$

where


In [58]:
Math(r"z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}")


Out[58]:
$$z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}$$
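
Both formulas translate almost directly into code. The functions below are a sketch written for this walkthrough, not the code of WeightedSumRecommender or AdjustedWeightedSumRecommender; the tuple layouts are assumptions, and the neighbour list is assumed to be non-empty with a positive similarity total:

```python
def weighted_sum(neighbours):
    """neighbours: list of (similarity, rating_for_item) pairs."""
    z = 1.0 / sum(similarity for similarity, _ in neighbours)
    return z * sum(similarity * rating for similarity, rating in neighbours)


def adjusted_weighted_sum(user_mean, neighbours):
    """neighbours: list of (similarity, rating_for_item, neighbour_mean) triples."""
    z = 1.0 / sum(similarity for similarity, _, _ in neighbours)
    deviation = sum(similarity * (rating - neighbour_mean)
                    for similarity, rating, neighbour_mean in neighbours)
    return user_mean + z * deviation


# Toy values, only to show how the functions are called
print(weighted_sum([(1.0, 4.0), (1.0, 2.0)]))
print(adjusted_weighted_sum(4.0, [(0.5, 5.0, 4.0)]))
```

The adjusted version replaces each neighbour's raw rating by its deviation from that neighbour's own average, so a neighbour who rates everything high pulls the prediction up less than in the plain weighted sum.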

So the predicted rating that MyUser would give to MyItem is:


In [59]:
my_review = test[1]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']

# Weighted sum recommender
denominator = 0.5 + 0.333 + 0.333
ws_numerator = 0.5 * 3.0 + 0.333 * 4.0 + 0.333 * 3.0
ws_predicted_rating = ws_numerator / denominator
print('Manual weighted sum predicted Rating:', ws_predicted_rating)
print('Weighted sum recommender system predicted rating:', ws_recommender.predict_rating(user_id, item_id))

# Adjusted weighted sum recommender

user_average_rating = recommender.user_dictionary[user_id].average_overall_rating
aws_numerator = 0.5 * (3.0 - 4.286) + 0.333 * (4.0 - 3.429) + 0.333 * (3.0 - 3.786)
aws_predicted_rating = user_average_rating + aws_numerator / denominator
print('Manual adjusted weighted sum predicted Rating:', aws_predicted_rating)
print('Adjusted weighted sum recommender system predicted rating:', aws_recommender.predict_rating(user_id, item_id))
print('Average recommender rating:', avg_recommender.predict_rating(user_id, item_id))
print('Dummy recommender rating:', dummy_recommender.predict_rating(user_id, item_id))

actual_rating = my_review['overall_rating']
print('Actual rating:', actual_rating)

ws_error = abs(actual_rating -  ws_predicted_rating)
aws_error = abs(actual_rating -  aws_predicted_rating)
avg_error = abs(actual_rating -  avg_recommender.predict_rating(user_id, item_id))
dummy_error = abs(actual_rating -  dummy_recommender.predict_rating(user_id, item_id))
print('Errors:', 'WS:', ws_error, 'AWS:', aws_error, 'AVG:', avg_error, 'Dummy:', dummy_error)


('Manual weighted sum predicted Rating:', 3.2855917667238423)
('Weighted sum recommender system predicted rating:', 3.285714285714286)
('Manual adjusted weighted sum predicted Rating:', 3.387139794168096)
('Neighbourhood: ', 3, u'CC813BCDB9DA614B728B36E92E27BF9B', 78046)
('Adjusted weighted sum recommender system predicted rating:', 3.3877551020408161)
('Average recommender rating:', 3.913245873889124)
('Dummy recommender rating:', 4.0)
('Actual rating:', 4.0)
('Errors:', 'WS:', 0.7144082332761577, 'AWS:', 0.6128602058319039, 'AVG:', 0.086754126110875962, 'Dummy:', 0.0)
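
The small gap between the manual weighted sum and the recommender's value comes from rounding the similarities to 0.333 in the manual calculation. A short check, using the similarities and ratings printed above:

```python
# Similarities and ratings for MyItem, copied from the output above
similarities = [0.5, 1.0 / 3.0, 1.0 / 3.0]
ratings = [3.0, 4.0, 3.0]

# Exact similarities, as stored in the similarity matrix
exact_prediction = (sum(s * r for s, r in zip(similarities, ratings))
                    / sum(similarities))

# Similarities rounded to three decimals, as in the manual calculation
rounded_prediction = ((0.5 * 3.0 + 0.333 * 4.0 + 0.333 * 3.0)
                      / (0.5 + 0.333 + 0.333))

print('Exact:', exact_prediction)
print('Rounded:', rounded_prediction)
print('Difference:', abs(exact_prediction - rounded_prediction))
```

With the exact value of one third the manual result agrees with the recommender's 3.2857..., so the difference in the fourth decimal is not a sign of a bug.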

Review 2

For this review we have the following information


In [60]:
my_review = test[2]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']

print('User ID:', user_id)
print('Item ID:', item_id)
print('Rating:', rating)


('User ID:', u'9FF630DF29C67791978600FC5B2DBFE8')
('Item ID:', 1218792)
('Rating:', 4.0)

Find out which users have rated common items with MyUser


In [61]:
# If the user is not in the training set, then we can't predict the rating
if user_id not in recommender.user_ids:
    predicted_rating = None

# We obtain the neighbours of the user
# The neighbourhood is the set of users that have rated the same item as MyUser in the training set
# and that also have rated MyItem
neighbourhood = recommender.get_neighbourhood(user_id, item_id)

For this particular user, the neighbourhood size is:


In [62]:
print('Neighbourhood size:', len(neighbourhood))


('Neighbourhood size:', 9)

And the neighbours are:


In [63]:
for neighbour in neighbourhood:
    print(neighbour)


1609426679198144A65A85C951CEFEF3
997502F65A2DC9C0664EF9990ABDC125
1CE94B8ABEB460F3719B9CE9EE1E476C
43015504348C59179CDA040A674CFA7E
249063E5108AA2DA3728ACE7D7FCB0AE
EE8824C794ADC33CFEAEEF77EAD5F330
FE60DF6C1DAF87EE95521FFF608F70C9
CFC4C5EF9E8FCE021B149F0A9819CD64
996B94C522AFCBA60A57F86D7019C7A7

For each neighbour, find out the similarity value it shares with MyUser

For each neighbour, we are going to print its user_id, the similarity it has with MyUser, and the rating it has given to item_id


In [64]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Rating:', recommender.user_dictionary[neighbour].item_ratings[item_id])


('User ID:', u'1609426679198144A65A85C951CEFEF3', 'Similarity', 1.0, 'Rating:', 4.0)
('User ID:', u'997502F65A2DC9C0664EF9990ABDC125', 'Similarity', 1.0, 'Rating:', 3.0)
('User ID:', u'1CE94B8ABEB460F3719B9CE9EE1E476C', 'Similarity', 1.0, 'Rating:', 2.0)
('User ID:', u'43015504348C59179CDA040A674CFA7E', 'Similarity', 1.0, 'Rating:', 4.0)
('User ID:', u'249063E5108AA2DA3728ACE7D7FCB0AE', 'Similarity', 0.5, 'Rating:', 5.0)
('User ID:', u'EE8824C794ADC33CFEAEEF77EAD5F330', 'Similarity', 0.5, 'Rating:', 5.0)
('User ID:', u'FE60DF6C1DAF87EE95521FFF608F70C9', 'Similarity', 0.5, 'Rating:', 4.0)
('User ID:', u'CFC4C5EF9E8FCE021B149F0A9819CD64', 'Similarity', 0.5, 'Rating:', 4.0)
('User ID:', u'996B94C522AFCBA60A57F86D7019C7A7', 'Similarity', 0.5, 'Rating:', 5.0)

Look for the items that the neighbour and MyUser have rated in common and assert that the similarity value is correct.

Now we are going to verify that the similarity values are correct. To do so, we are going to see the items that the neighbours have in common with MyUser. We will do this just for the first 5 neighbours, because the recommender only considers a maximum of 5 neighbours.


In [65]:
from tripadvisor.fourcity import extractor

for neighbour in neighbourhood[:5]:
    print('User ID:', neighbour,
          'Common items:', extractor.get_common_items(recommender.user_dictionary, user_id, neighbour))


('User ID:', u'1609426679198144A65A85C951CEFEF3', 'Common items:', set([102466]))
('User ID:', u'997502F65A2DC9C0664EF9990ABDC125', 'Common items:', set([102466]))
('User ID:', u'1CE94B8ABEB460F3719B9CE9EE1E476C', 'Common items:', set([96688, 102466]))
('User ID:', u'43015504348C59179CDA040A674CFA7E', 'Common items:', set([99352]))
('User ID:', u'249063E5108AA2DA3728ACE7D7FCB0AE', 'Common items:', set([81295]))

For each neighbour we are going to compare the ratings they have given to the common items with MyUser's ratings, and then manually calculate the Euclidean similarity using the formula similarity = 1 / (1 + euclidean_distance)


In [66]:
# Each pair holds a neighbour and the single item it has in common with MyUser
single_item_neighbours = [
    ('1609426679198144A65A85C951CEFEF3', 102466),
    ('997502F65A2DC9C0664EF9990ABDC125', 102466),
]

for neighbour_id, neighbour_item in single_item_neighbours:
    my_user_rating = recommender.user_dictionary[user_id].item_ratings[neighbour_item]
    neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item]
    print('Neighbour ID:', neighbour_id,
          'MyUser rating:', my_user_rating,
          'Neighbour rating:', neighbour_rating,
          'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))

neighbour_id = '1CE94B8ABEB460F3719B9CE9EE1E476C'
neighbour_item1 = 96688
neighbour_item2 = 102466
my_user_rating1 = recommender.user_dictionary[user_id].item_ratings[neighbour_item1]
my_user_rating2 = recommender.user_dictionary[user_id].item_ratings[neighbour_item2]
neighbour_rating1 = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item1]
neighbour_rating2 = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item2]
# This neighbour shares two items with MyUser, so the distance is computed over both.
# Note: my_user_rating and neighbour_rating printed below still hold the values of the
# previous neighbour; only the similarity uses the ratings of this neighbour.
print('Neighbour ID:', neighbour_id,
      'MyUser rating:', my_user_rating,
      'Neighbour rating:', neighbour_rating,
      'Similarity:', (1/(1 + ((my_user_rating1 - neighbour_rating1)**2 + (my_user_rating2 - neighbour_rating2)**2)**0.5)))

# Each pair holds a neighbour and the single item it has in common with MyUser
single_item_neighbours = [
    ('43015504348C59179CDA040A674CFA7E', 99352),
    ('249063E5108AA2DA3728ACE7D7FCB0AE', 81295),
]

for neighbour_id, neighbour_item in single_item_neighbours:
    my_user_rating = recommender.user_dictionary[user_id].item_ratings[neighbour_item]
    neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item]
    print('Neighbour ID:', neighbour_id,
          'MyUser rating:', my_user_rating,
          'Neighbour rating:', neighbour_rating,
          'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))


('Neighbour ID:', '1609426679198144A65A85C951CEFEF3', 'MyUser rating:', 4.0, 'Neighbour rating:', 4.0, 'Similarity:', 1.0)
('Neighbour ID:', '997502F65A2DC9C0664EF9990ABDC125', 'MyUser rating:', 4.0, 'Neighbour rating:', 4.0, 'Similarity:', 1.0)
('Neighbour ID:', '1CE94B8ABEB460F3719B9CE9EE1E476C', 'MyUser rating:', 4.0, 'Neighbour rating:', 4.0, 'Similarity:', 1.0)
('Neighbour ID:', '43015504348C59179CDA040A674CFA7E', 'MyUser rating:', 5.0, 'Neighbour rating:', 5.0, 'Similarity:', 1.0)
('Neighbour ID:', '249063E5108AA2DA3728ACE7D7FCB0AE', 'MyUser rating:', 4.0, 'Neighbour rating:', 3.0, 'Similarity:', 0.5)

As we can see in the above steps, the similarity values are correct.

Manually calculate the predicted rating with the obtained values of the similarity and the ratings that the neighbours have given to MyItem

For this step we are going to use some information we have calculated above. To make it clearer, we print it again, this time together with each neighbour's average rating:


In [67]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Item rating:', recommender.user_dictionary[neighbour].item_ratings[item_id],
          'Average rating:', recommender.user_dictionary[neighbour].average_overall_rating)


('User ID:', u'1609426679198144A65A85C951CEFEF3', 'Similarity', 1.0, 'Item rating:', 4.0, 'Average rating:', 4.0)
('User ID:', u'997502F65A2DC9C0664EF9990ABDC125', 'Similarity', 1.0, 'Item rating:', 3.0, 'Average rating:', 3.875)
('User ID:', u'1CE94B8ABEB460F3719B9CE9EE1E476C', 'Similarity', 1.0, 'Item rating:', 2.0, 'Average rating:', 3.3333333333333335)
('User ID:', u'43015504348C59179CDA040A674CFA7E', 'Similarity', 1.0, 'Item rating:', 4.0, 'Average rating:', 4.333333333333333)
('User ID:', u'249063E5108AA2DA3728ACE7D7FCB0AE', 'Similarity', 0.5, 'Item rating:', 5.0, 'Average rating:', 3.6666666666666665)
('User ID:', u'EE8824C794ADC33CFEAEEF77EAD5F330', 'Similarity', 0.5, 'Item rating:', 5.0, 'Average rating:', 4.0)
('User ID:', u'FE60DF6C1DAF87EE95521FFF608F70C9', 'Similarity', 0.5, 'Item rating:', 4.0, 'Average rating:', 4.0)
('User ID:', u'CFC4C5EF9E8FCE021B149F0A9819CD64', 'Similarity', 0.5, 'Item rating:', 4.0, 'Average rating:', 4.0)
('User ID:', u'996B94C522AFCBA60A57F86D7019C7A7', 'Similarity', 0.5, 'Item rating:', 5.0, 'Average rating:', 4.6)

With this information, we can now calculate the predicted rating using the formula


In [68]:
Math(r"R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)")


Out[68]:
$$R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)$$

In [69]:
Math(r"z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}")


Out[69]:
$$z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}$$

So the predicted rating that MyUser would give to MyItem is:


In [70]:
my_review = test[2]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']

# Weighted sum recommender
denominator = 1.0 + 1.0 + 1.0 + 1.0 + 0.5
ws_numerator = 1.0 * 4.0 + 1.0 * 3.0 + 1.0 * 2.0 + 1.0 * 4.0 + 0.5 * 5.0
ws_predicted_rating = ws_numerator / denominator
print('Manual weighted sum predicted Rating:', ws_predicted_rating)
print('Weighted sum recommender system predicted rating:', ws_recommender.predict_rating(user_id, item_id))

# Adjusted weighted sum recommender

user_average_rating = recommender.user_dictionary[user_id].average_overall_rating
aws_numerator = 1.0 * (4.0 - 4.0) + 1.0 * (3.0 - 3.875) + 1.0 * (2.0 - 3.333) + 1.0 * (4.0 - 4.333) + 0.5 * (5.0 - 3.66)
aws_predicted_rating = user_average_rating + aws_numerator / denominator
print('Manual adjusted weighted sum predicted Rating:', aws_predicted_rating)
print('Adjusted weighted sum recommender system predicted rating:', aws_recommender.predict_rating(user_id, item_id))
print('Average recommender rating:', avg_recommender.predict_rating(user_id, item_id))
print('Dummy recommender rating:', dummy_recommender.predict_rating(user_id, item_id))

actual_rating = my_review['overall_rating']
print('Actual rating:', actual_rating)

ws_error = abs(actual_rating -  ws_predicted_rating)
aws_error = abs(actual_rating -  aws_predicted_rating)
avg_error = abs(actual_rating -  avg_recommender.predict_rating(user_id, item_id))
dummy_error = abs(actual_rating -  dummy_recommender.predict_rating(user_id, item_id))
print('Errors:', 'WS:', ws_error, 'AWS:', aws_error, 'AVG:', avg_error, 'Dummy:', dummy_error)


('Manual weighted sum predicted Rating:', 3.4444444444444446)
('Weighted sum recommender system predicted rating:', 3.4444444444444446)
('Manual adjusted weighted sum predicted Rating:', 3.9842222222222223)
('Neighbourhood: ', 9, u'9FF630DF29C67791978600FC5B2DBFE8', 1218792)
('Adjusted weighted sum recommender system predicted rating:', 3.9833333333333338)
('Average recommender rating:', 3.913245873889124)
('Dummy recommender rating:', 4.0)
('Actual rating:', 4.0)
('Errors:', 'WS:', 0.5555555555555554, 'AWS:', 0.01577777777777767, 'AVG:', 0.086754126110875962, 'Dummy:', 0.0)
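
In this review the neighbourhood has 9 users but only 5 enter the prediction. The sketch below shows the effect of that cut-off on the weighted sum, using the similarities and ratings printed above. How the recommender breaks ties between equally similar neighbours is not shown in this notebook; here we assume the order of the neighbourhood list is kept, which reproduces the manual calculation:

```python
def predict_with_top_k(neighbours, k):
    """Weighted sum over the k most similar (similarity, rating) pairs."""
    # sorted() is stable, so neighbours with equal similarity keep their list order
    top_k = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:k]
    weight_total = sum(similarity for similarity, _ in top_k)
    return sum(similarity * rating for similarity, rating in top_k) / weight_total


# (similarity, rating for MyItem) pairs copied from the output above
neighbours = [
    (1.0, 4.0), (1.0, 3.0), (1.0, 2.0), (1.0, 4.0),
    (0.5, 5.0), (0.5, 5.0), (0.5, 4.0), (0.5, 4.0), (0.5, 5.0),
]

print('Top 5 neighbours:', predict_with_top_k(neighbours, 5))
print('All 9 neighbours:', predict_with_top_k(neighbours, len(neighbours)))
```

Since five neighbours are tied at a similarity of 0.5 and their ratings range from 4.0 to 5.0, the choice of which one fills the fifth slot changes the prediction slightly.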

Review 3

For this review we have the following information


In [71]:
my_review = test[8]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']

print('User ID:', user_id)
print('Item ID:', item_id)
print('Rating:', rating)


('User ID:', u'02860EA0ED535DE587635F1F8FC7D0C0')
('Item ID:', 84079)
('Rating:', 5.0)

Find out which users have rated common items with MyUser


In [72]:
# If the user is not in the training set, then we can't predict the rating
if user_id not in recommender.user_ids:
    predicted_rating = None

# We obtain the neighbours of the user
# The neighbourhood is the set of users that have rated the same item as MyUser in the training set
# and that also have rated MyItem
neighbourhood = recommender.get_neighbourhood(user_id, item_id)

For this particular user, the neighbourhood size is:


In [73]:
print('Neighbourhood size:', len(neighbourhood))


('Neighbourhood size:', 3)

And the neighbours are:


In [74]:
for neighbour in neighbourhood:
    print(neighbour)


75D2B2509A733463F97A52877D078D86
1BB22F3C5F3C52E120EAF07B4D924605
48D750137FF4D27ADA7CFE294D2315FB

For each neighbour, find out the similarity value it shares with MyUser

For each neighbour, we are going to print its user_id, the similarity it has with MyUser, and the rating it has given to item_id


In [75]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Rating:', recommender.user_dictionary[neighbour].item_ratings[item_id])


('User ID:', u'75D2B2509A733463F97A52877D078D86', 'Similarity', 0.5, 'Rating:', 4.0)
('User ID:', u'1BB22F3C5F3C52E120EAF07B4D924605', 'Similarity', 0.33333333333333331, 'Rating:', 4.0)
('User ID:', u'48D750137FF4D27ADA7CFE294D2315FB', 'Similarity', 0.3090169943749474, 'Rating:', 5.0)

Look for the items that the neighbour and MyUser have rated in common and assert that the similarity value is correct.

Now we are going to verify that the similarity values are correct. To do so, we are going to see the items that the neighbours have in common with MyUser


In [76]:
from tripadvisor.fourcity import extractor

for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Common items:', extractor.get_common_items(recommender.user_dictionary, user_id, neighbour))


('User ID:', u'75D2B2509A733463F97A52877D078D86', 'Common items:', set([84093]))
('User ID:', u'1BB22F3C5F3C52E120EAF07B4D924605', 'Common items:', set([84093]))
('User ID:', u'48D750137FF4D27ADA7CFE294D2315FB', 'Common items:', set([84093, 84087]))

For each neighbour we are going to compare the ratings they have given to the common items with MyUser's ratings, and then manually calculate the Euclidean similarity using the formula similarity = 1 / (1 + euclidean_distance)


In [77]:
# Each pair holds a neighbour and the single item it has in common with MyUser
single_item_neighbours = [
    ('75D2B2509A733463F97A52877D078D86', 84093),
    ('1BB22F3C5F3C52E120EAF07B4D924605', 84093),
]

for neighbour_id, neighbour_item in single_item_neighbours:
    my_user_rating = recommender.user_dictionary[user_id].item_ratings[neighbour_item]
    neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item]
    print('Neighbour ID:', neighbour_id,
          'MyUser rating:', my_user_rating,
          'Neighbour rating:', neighbour_rating,
          'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))

neighbour_id = '48D750137FF4D27ADA7CFE294D2315FB'
neighbour_item1 = 84093
neighbour_item2 = 84087
my_user_rating1 = recommender.user_dictionary[user_id].item_ratings[neighbour_item1]
my_user_rating2 = recommender.user_dictionary[user_id].item_ratings[neighbour_item2]
neighbour_rating1 = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item1]
neighbour_rating2 = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item2]
# This neighbour shares two items with MyUser, so the distance is computed over both.
# Note: my_user_rating and neighbour_rating printed below still hold the values of the
# previous neighbour; only the similarity uses the ratings of this neighbour.
print('Neighbour ID:', neighbour_id,
      'MyUser rating:', my_user_rating,
      'Neighbour rating:', neighbour_rating,
      'Similarity:', (1/(1 + ((my_user_rating1 - neighbour_rating1)**2 + (my_user_rating2 - neighbour_rating2)**2)**0.5)))


('Neighbour ID:', '75D2B2509A733463F97A52877D078D86', 'MyUser rating:', 3.0, 'Neighbour rating:', 4.0, 'Similarity:', 0.5)
('Neighbour ID:', '1BB22F3C5F3C52E120EAF07B4D924605', 'MyUser rating:', 3.0, 'Neighbour rating:', 5.0, 'Similarity:', 0.33333333333333331)
('Neighbour ID:', '48D750137FF4D27ADA7CFE294D2315FB', 'MyUser rating:', 3.0, 'Neighbour rating:', 5.0, 'Similarity:', 0.3090169943749474)

As we can see in the above steps, the similarity values are correct.
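
The third similarity, 0.3090169943749474, is the only one that is not of the form 1 / (1 + whole number), because that neighbour shares two items with MyUser. As a sanity check, the sketch below searches for the pairs of rating differences that produce this value, assuming that ratings are whole numbers between 1 and 5 (so each difference is between 0 and 4):

```python
import itertools
import math

observed_similarity = 0.3090169943749474

matches = set()
for difference1, difference2 in itertools.product(range(5), repeat=2):
    distance = math.sqrt(difference1 ** 2 + difference2 ** 2)
    similarity = 1.0 / (1.0 + distance)
    if abs(similarity - observed_similarity) < 1e-12:
        # Store the pair in sorted order so (1, 2) and (2, 1) count as one match
        matches.add(tuple(sorted((difference1, difference2))))

print(matches)
```

Under that assumption the only match is a difference of 1 on one common item and 2 on the other, i.e. a distance of the square root of 5.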

Manually calculate the predicted rating with the obtained values of the similarity and the ratings that the neighbours have given to MyItem

For this step we are going to use some information we have calculated above. To make it clearer, we print it again, this time together with each neighbour's average rating:


In [78]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Item rating:', recommender.user_dictionary[neighbour].item_ratings[item_id],
          'Average rating:', recommender.user_dictionary[neighbour].average_overall_rating)


('User ID:', u'75D2B2509A733463F97A52877D078D86', 'Similarity', 0.5, 'Item rating:', 4.0, 'Average rating:', 4.0)
('User ID:', u'1BB22F3C5F3C52E120EAF07B4D924605', 'Similarity', 0.33333333333333331, 'Item rating:', 4.0, 'Average rating:', 4.333333333333333)
('User ID:', u'48D750137FF4D27ADA7CFE294D2315FB', 'Similarity', 0.3090169943749474, 'Item rating:', 5.0, 'Average rating:', 5.0)

With this information, we can now calculate the predicted rating using the following formula, where z is a normalising factor:


In [79]:
Math(r"R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)")


Out[79]:
$$R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)$$

In [80]:
Math(r"z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}")


Out[80]:
$$z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}$$
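
The two formulas can be evaluated with a few lines of code. The following is only a sketch: the function name `weighted_sum_prediction` is hypothetical, and the (similarity, item rating) pairs are copied from the neighbour output printed above rather than read from the recommender.

```python
def weighted_sum_prediction(neighbours):
    """neighbours: list of (similarity, rating given to the item) pairs."""
    similarity_total = sum(similarity for similarity, _ in neighbours)
    if similarity_total == 0:
        return None  # no usable neighbours (an assumption of this sketch)
    z = 1.0 / similarity_total
    return z * sum(similarity * rating for similarity, rating in neighbours)


# (similarity, item rating) pairs copied from the output above
neighbours = [
    (0.5, 4.0),
    (0.33333333333333331, 4.0),
    (0.3090169943749474, 5.0),
]
prediction = weighted_sum_prediction(neighbours)
print(prediction)
```

Using the full-precision similarities instead of values truncated to three decimals keeps the manual result as close as possible to what the recommender computes.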

So the predicted rating that MyUser would give to MyItem is:


In [81]:
my_review = test[8]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']

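# Weighted sum recommender
# The similarities are truncated to three decimals, so the manual result
# differs slightly from the value the recommender returns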
denominator = 0.5 + 0.333 + 0.309
ws_numerator = 0.5 * 4.0 + 0.333 * 4.0 + 0.309 * 5.0
ws_predicted_rating = ws_numerator / denominator
print('Manual weighted sum predicted Rating:', ws_predicted_rating)
print('Weighted sum recommender system predicted rating:', ws_recommender.predict_rating(user_id, item_id))

# Adjusted weighted sum recommender

user_average_rating = recommender.user_dictionary[user_id].average_overall_rating
aws_numerator = 0.5 * (4.0 - 4.0) + 0.333 * (4.0 - 4.333) + 0.309 * (5.0 - 5.0)
aws_predicted_rating = user_average_rating + aws_numerator / denominator
print('Manual adjusted weighted sum predicted Rating:', aws_predicted_rating)
print('Adjusted weighted sum recommender system predicted rating:', aws_recommender.predict_rating(user_id, item_id))
print('Average recommender rating:', avg_recommender.predict_rating(user_id, item_id))
print('Dummy recommender rating:', dummy_recommender.predict_rating(user_id, item_id))

actual_rating = my_review['overall_rating']
print('Actual rating:', actual_rating)

ws_error = abs(actual_rating - ws_predicted_rating)
aws_error = abs(actual_rating - aws_predicted_rating)
avg_error = abs(actual_rating - avg_recommender.predict_rating(user_id, item_id))
dummy_error = abs(actual_rating - dummy_recommender.predict_rating(user_id, item_id))
print('Errors:', 'WS:', ws_error, 'AWS:', aws_error, 'AVG:', avg_error, 'Dummy:', dummy_error)


('Manual weighted sum predicted Rating:', 4.270577933450087)
('Weighted sum recommender system predicted rating:', 4.2705098312484226)
('Manual adjusted weighted sum predicted Rating:', 3.4028992994746057)
('Neighbourhood: ', 3, u'02860EA0ED535DE587635F1F8FC7D0C0', 84079)
('Adjusted weighted sum recommender system predicted rating:', 3.4027346441664563)
('Average recommender rating:', 3.913245873889124)
('Dummy recommender rating:', 4.0)
('Actual rating:', 5.0)
('Errors:', 'WS:', 0.7294220665499127, 'AWS:', 1.5971007005253943, 'AVG:', 1.086754126110876, 'Dummy:', 1.0)
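
The adjusted weighted sum computed in the cell above can be sketched as a small function. This is an illustration under stated assumptions, not the recommender's implementation: the function name is hypothetical, the neighbour values are copied from the printed output, and MyUser's average rating is not printed anywhere in this section, so the value 3.5 used here is an assumption.

```python
def adjusted_weighted_sum_prediction(user_average, neighbours):
    """neighbours: list of (similarity, item rating, neighbour average) triplets."""
    similarity_total = sum(similarity for similarity, _, _ in neighbours)
    if similarity_total == 0:
        return user_average  # fallback chosen for this sketch only
    deviation = sum(similarity * (rating - average)
                    for similarity, rating, average in neighbours)
    return user_average + deviation / similarity_total


# (similarity, item rating, average rating) copied from the neighbour output
neighbours = [
    (0.5, 4.0, 4.0),
    (0.33333333333333331, 4.0, 4.333333333333333),
    (0.3090169943749474, 5.0, 5.0),
]
# MyUser's average is not printed in the notebook; 3.5 is an assumed value
prediction = adjusted_weighted_sum_prediction(3.5, neighbours)
print(prediction)
```

Only the second neighbour rated MyItem differently from its own average, so it is the only one that moves the prediction away from MyUser's average.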

Review 4

For this review we have the following information:


In [84]:
my_review = test[67]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']

print('User ID:', user_id)
print('Item ID:', item_id)
print('Rating:', rating)


('User ID:', u'970BF90FE4738BB9A7BD78B01E20810C')
('Item ID:', 93450)
('Rating:', 2.0)

Find out which users have rated common items with MyUser


In [85]:
# If the user is not in the training set, we cannot predict the rating
if user_id not in recommender.user_ids:
    predicted_rating = None

# Obtain the neighbours of the user.
# The neighbourhood is the set of users who have rated at least one item in common
# with MyUser in the training set and who have also rated MyItem
neighbourhood = recommender.get_neighbourhood(user_id, item_id)

For this particular user, the neighbourhood size is:


In [86]:
print('Neighbourhood size:', len(neighbourhood))


('Neighbourhood size:', 7)

And the neighbours are:


In [87]:
for neighbour in neighbourhood:
    print(neighbour)


8AE0A5F56DE525C2BCDF87D7E54E9DF0
6C64DCB228E0255547A69C46B138B280
8DCF9BE7F5EAB3DE26B896DB3702D796
C5907018EBC2EC24FA4C8B9CCD2C8CBD
A9DBCFE3E77DB2DC0F989BE387A5B87A
15260F7D29FDCE47CBFA7BACD55C5326
F6CF03D6B759EA5C2394037470DFBDDB
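
To make the neighbourhood definition concrete, here is a minimal sketch on toy data. It assumes the training ratings are held in a dict of dicts; the function below is a hypothetical stand-in and is not the code behind `recommender.get_neighbourhood`, and all user and item IDs are invented.

```python
def get_neighbourhood(user_ratings, user_id, item_id):
    """Return users who rated item_id and share at least one item with user_id.

    user_ratings: dict mapping user ID -> dict of item ID -> rating.
    """
    my_items = set(user_ratings[user_id])
    neighbourhood = []
    for other_id, ratings in user_ratings.items():
        if other_id == user_id:
            continue
        if item_id in ratings and my_items & set(ratings):
            neighbourhood.append(other_id)
    return sorted(neighbourhood)


# Toy training data with hypothetical user and item IDs
toy_ratings = {
    'my_user': {1: 5.0, 2: 2.5},
    'user_a': {1: 4.0, 9: 3.0},   # shares item 1 and rated item 9
    'user_b': {2: 3.0},           # shares item 2 but did not rate item 9
    'user_c': {7: 4.0, 9: 5.0},   # rated item 9 but shares nothing
}
toy_neighbourhood = get_neighbourhood(toy_ratings, 'my_user', 9)
print(toy_neighbourhood)
```

Both conditions are needed: a neighbour without a common item has no similarity with MyUser, and a neighbour who has not rated MyItem contributes nothing to the prediction.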

For each neighbour, find out the similarity value it shares with MyUser

For each neighbour, we are going to print its user ID, the similarity it has with MyUser, and the rating it has given to MyItem.


In [90]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Rating:', recommender.user_dictionary[neighbour].item_ratings[item_id])


('User ID:', u'8AE0A5F56DE525C2BCDF87D7E54E9DF0', 'Similarity', 1.0, 'Rating:', 5.0)
('User ID:', u'6C64DCB228E0255547A69C46B138B280', 'Similarity', 0.66666666666666663, 'Rating:', 4.0)
('User ID:', u'8DCF9BE7F5EAB3DE26B896DB3702D796', 'Similarity', 0.66666666666666663, 'Rating:', 4.0)
('User ID:', u'C5907018EBC2EC24FA4C8B9CCD2C8CBD', 'Similarity', 0.66666666666666663, 'Rating:', 3.0)
('User ID:', u'A9DBCFE3E77DB2DC0F989BE387A5B87A', 'Similarity', 0.5, 'Rating:', 2.0)
('User ID:', u'15260F7D29FDCE47CBFA7BACD55C5326', 'Similarity', 0.40000000000000002, 'Rating:', 4.0)
('User ID:', u'F6CF03D6B759EA5C2394037470DFBDDB', 'Similarity', 0.33333333333333331, 'Rating:', 3.0)

Look for the items that the neighbour and MyUser have rated in common and assert that the similarity value is correct.

Now we are going to verify that the similarity values are correct. To do so, we list the items that each neighbour has in common with MyUser.


In [89]:
from tripadvisor.fourcity import extractor

for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Common items:', extractor.get_common_items(recommender.user_dictionary, user_id, neighbour))


('User ID:', u'8AE0A5F56DE525C2BCDF87D7E54E9DF0', 'Common items:', set([84093]))
('User ID:', u'6C64DCB228E0255547A69C46B138B280', 'Common items:', set([223023]))
('User ID:', u'8DCF9BE7F5EAB3DE26B896DB3702D796', 'Common items:', set([223023]))
('User ID:', u'C5907018EBC2EC24FA4C8B9CCD2C8CBD', 'Common items:', set([223023]))
('User ID:', u'A9DBCFE3E77DB2DC0F989BE387A5B87A', 'Common items:', set([84093]))
('User ID:', u'15260F7D29FDCE47CBFA7BACD55C5326', 'Common items:', set([223023]))
('User ID:', u'F6CF03D6B759EA5C2394037470DFBDDB', 'Common items:', set([84093]))

Each neighbour shares exactly one item with MyUser, so the euclidean distance reduces to the absolute difference between the two ratings for that item. For each neighbour we are going to compare its rating for the common item with MyUser's rating, and then manually calculate the euclidean similarity using the formula similarity = 1 / (1 + euclidean_distance).


In [91]:
# (neighbour ID, item rated in common with MyUser), taken from the output above
common_items = [
    ('8AE0A5F56DE525C2BCDF87D7E54E9DF0', 84093),
    ('6C64DCB228E0255547A69C46B138B280', 223023),
    ('8DCF9BE7F5EAB3DE26B896DB3702D796', 223023),
    ('C5907018EBC2EC24FA4C8B9CCD2C8CBD', 223023),
    ('A9DBCFE3E77DB2DC0F989BE387A5B87A', 84093),
    ('15260F7D29FDCE47CBFA7BACD55C5326', 223023),
    ('F6CF03D6B759EA5C2394037470DFBDDB', 84093),
]

for neighbour_id, neighbour_item in common_items:
    my_user_rating = recommender.user_dictionary[user_id].item_ratings[neighbour_item]
    neighbour_rating = recommender.user_dictionary[neighbour_id].item_ratings[neighbour_item]
    # With a single common item, the euclidean distance is the absolute rating difference
    print('Neighbour ID:', neighbour_id,
          'MyUser rating:', my_user_rating,
          'Neighbour rating:', neighbour_rating,
          'Similarity:', (1/(1 + ((my_user_rating - neighbour_rating) ** 2)**0.5)))


('Neighbour ID:', '8AE0A5F56DE525C2BCDF87D7E54E9DF0', 'MyUser rating:', 5.0, 'Neighbour rating:', 5.0, 'Similarity:', 1.0)
('Neighbour ID:', '6C64DCB228E0255547A69C46B138B280', 'MyUser rating:', 2.5, 'Neighbour rating:', 3.0, 'Similarity:', 0.66666666666666663)
('Neighbour ID:', '8DCF9BE7F5EAB3DE26B896DB3702D796', 'MyUser rating:', 2.5, 'Neighbour rating:', 3.0, 'Similarity:', 0.66666666666666663)
('Neighbour ID:', 'C5907018EBC2EC24FA4C8B9CCD2C8CBD', 'MyUser rating:', 2.5, 'Neighbour rating:', 2.0, 'Similarity:', 0.66666666666666663)
('Neighbour ID:', 'A9DBCFE3E77DB2DC0F989BE387A5B87A', 'MyUser rating:', 5.0, 'Neighbour rating:', 4.0, 'Similarity:', 0.5)
('Neighbour ID:', '15260F7D29FDCE47CBFA7BACD55C5326', 'MyUser rating:', 2.5, 'Neighbour rating:', 4.0, 'Similarity:', 0.40000000000000002)
('Neighbour ID:', 'F6CF03D6B759EA5C2394037470DFBDDB', 'MyUser rating:', 5.0, 'Neighbour rating:', 3.0, 'Similarity:', 0.33333333333333331)

As we can see in the above steps, the similarity values are correct.
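
Instead of comparing the printed values by eye, the check can be automated. The sketch below assumes a single common item per neighbour, as is the case for this review; the helper name `single_item_similarity` is hypothetical, and the triplets are copied from the outputs above.

```python
def single_item_similarity(my_rating, neighbour_rating):
    """Euclidean similarity when the two users share exactly one item."""
    return 1.0 / (1.0 + abs(my_rating - neighbour_rating))


# (MyUser rating, neighbour rating, stored similarity) copied from the outputs above
checks = [
    (5.0, 5.0, 1.0),
    (2.5, 3.0, 0.66666666666666663),
    (2.5, 3.0, 0.66666666666666663),
    (2.5, 2.0, 0.66666666666666663),
    (5.0, 4.0, 0.5),
    (2.5, 4.0, 0.40000000000000002),
    (5.0, 3.0, 0.33333333333333331),
]

# Collect every triplet whose recomputed similarity disagrees with the stored one
mismatches = [
    (mine, theirs, stored)
    for mine, theirs, stored in checks
    if abs(single_item_similarity(mine, theirs) - stored) > 1e-9
]
print('Mismatches:', mismatches)
```

An empty list of mismatches means every stored similarity agrees with the manual formula within the chosen tolerance.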

Manually calculate the predicted rating with the obtained values of the similarity and the ratings that the neighbours have given to MyItem

This step reuses information we have calculated above. To make it clearer, we run the same code again:


In [92]:
for neighbour in neighbourhood:
    print('User ID:', neighbour,
          'Similarity', recommender.user_similarity_matrix[neighbour][user_id],
          'Item rating:', recommender.user_dictionary[neighbour].item_ratings[item_id],
          'Average rating:', recommender.user_dictionary[neighbour].average_overall_rating)


('User ID:', u'8AE0A5F56DE525C2BCDF87D7E54E9DF0', 'Similarity', 1.0, 'Item rating:', 5.0, 'Average rating:', 5.0)
('User ID:', u'6C64DCB228E0255547A69C46B138B280', 'Similarity', 0.66666666666666663, 'Item rating:', 4.0, 'Average rating:', 4.222222222222222)
('User ID:', u'8DCF9BE7F5EAB3DE26B896DB3702D796', 'Similarity', 0.66666666666666663, 'Item rating:', 4.0, 'Average rating:', 4.0)
('User ID:', u'C5907018EBC2EC24FA4C8B9CCD2C8CBD', 'Similarity', 0.66666666666666663, 'Item rating:', 3.0, 'Average rating:', 3.0)
('User ID:', u'A9DBCFE3E77DB2DC0F989BE387A5B87A', 'Similarity', 0.5, 'Item rating:', 2.0, 'Average rating:', 3.8333333333333335)
('User ID:', u'15260F7D29FDCE47CBFA7BACD55C5326', 'Similarity', 0.40000000000000002, 'Item rating:', 4.0, 'Average rating:', 4.5)
('User ID:', u'F6CF03D6B759EA5C2394037470DFBDDB', 'Similarity', 0.33333333333333331, 'Item rating:', 3.0, 'Average rating:', 2.8333333333333335)

With this information, we can now calculate the predicted rating using the following formula, where z is a normalising factor:


In [79]:
Math(r"R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)")


Out[79]:
$$R(u,i) = z \sum_{u' \in N(u)} sim(u,u') \cdot R(u',i)$$

In [80]:
Math(r"z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}")


Out[80]:
$$z = \frac{1}{\sum_{u' \in N(u)} sim(u,u')}$$
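
The formula above covers the weighted sum recommender only. Judging from how `aws_predicted_rating` is computed in the manual calculation, the adjusted weighted sum appears to correspond to the expression below. The notation $\bar{R}(u)$ for a user's average overall rating is an assumption introduced here, $u'$ ranges over the neighbours in $N(u)$, and z is the same normalising factor as above.

```latex
R(u,i) = \bar{R}(u) + z \sum_{u' \in N(u)} sim(u,u') \cdot \left( R(u',i) - \bar{R}(u') \right)
```

Each neighbour contributes the deviation of its rating for MyItem from its own average, so a neighbour who rates everything high does not inflate the prediction.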

So the predicted rating that MyUser would give to MyItem is:


In [94]:
my_review = test[67]
user_id = my_review['user_id']
item_id = my_review['offering_id']
rating = my_review['overall_rating']

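# Weighted sum recommender
# The similarities are truncated to three decimals, so the manual result
# differs slightly from what full-precision similarities would give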
denominator = 1.0 + 0.666 + 0.666 + 0.666 + 0.5 + 0.4 + 0.333
ws_numerator = 1.0 * 5.0 + 0.666 * 4.0 + 0.666 * 4.0 + 0.666 * 3.0 + 0.5 * 2.0 + 0.4 * 4.0 + 0.333 * 3.0
ws_predicted_rating = ws_numerator / denominator
print('Manual weighted sum predicted Rating:', ws_predicted_rating)
print('Weighted sum recommender system predicted rating:', ws_recommender.predict_rating(user_id, item_id))

# Adjusted weighted sum recommender

user_average_rating = recommender.user_dictionary[user_id].average_overall_rating
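# NOTE: the neighbour averages in the fourth, fifth and sixth terms (4.0, 3.0, 3.833)
# differ from the averages printed above for those neighbours (3.0, 3.833, 4.5)
aws_numerator = 1.0 * (5.0 - 5.0) + 0.666 * (4.0 - 4.222) + 0.666 * (4.0 - 4.0) +\
                0.666 * (3.0 - 4.0) + 0.5 * (2.0 - 3.0) + 0.4 * (4.0 - 3.833) + 0.333 * (3.0 - 2.833)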
aws_predicted_rating = user_average_rating + aws_numerator / denominator
print('Manual adjusted weighted sum predicted Rating:', aws_predicted_rating)
print('Adjusted weighted sum recommender system predicted rating:', aws_recommender.predict_rating(user_id, item_id))
print('Average recommender rating:', avg_recommender.predict_rating(user_id, item_id))
print('Dummy recommender rating:', dummy_recommender.predict_rating(user_id, item_id))

actual_rating = my_review['overall_rating']
print('Actual rating:', actual_rating)

ws_error = abs(actual_rating - ws_predicted_rating)
aws_error = abs(actual_rating - aws_predicted_rating)
avg_error = abs(actual_rating - avg_recommender.predict_rating(user_id, item_id))
dummy_error = abs(actual_rating - dummy_recommender.predict_rating(user_id, item_id))
print('Errors:', 'WS:', ws_error, 'AWS:', aws_error, 'AVG:', avg_error, 'Dummy:', dummy_error)


('Manual weighted sum predicted Rating:', 3.76388560623966)
('Weighted sum recommender system predicted rating:', 3.8095238095238098)
('Manual adjusted weighted sum predicted Rating:', 3.051735365949736)
('Neighbourhood: ', 7, u'970BF90FE4738BB9A7BD78B01E20810C', 93450)
('Adjusted weighted sum recommender system predicted rating:', 3.0291005291005293)
('Average recommender rating:', 3.913245873889124)
('Dummy recommender rating:', 4.0)
('Actual rating:', 2.0)
('Errors:', 'WS:', 1.76388560623966, 'AWS:', 1.0517353659497362, 'AVG:', 1.913245873889124, 'Dummy:', 2.0)
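
To see how much of the gap between the manual and the recommender's weighted sum comes from truncating the similarities, the calculation can be repeated with both sets of values. This is a sketch only: the helper name `weighted_sum` is hypothetical, the full-precision pairs are copied from the neighbour output for this review, and the truncated pairs are the ones used in the manual cell.

```python
def weighted_sum(pairs):
    """pairs: list of (similarity, rating given to the item) tuples."""
    return sum(s * r for s, r in pairs) / sum(s for s, _ in pairs)


# Full-precision similarities as printed for the seven neighbours
full_precision = [
    (1.0, 5.0), (0.66666666666666663, 4.0), (0.66666666666666663, 4.0),
    (0.66666666666666663, 3.0), (0.5, 2.0), (0.40000000000000002, 4.0),
    (0.33333333333333331, 3.0),
]
# The same similarities truncated to three decimals, as in the manual cell
truncated = [
    (1.0, 5.0), (0.666, 4.0), (0.666, 4.0),
    (0.666, 3.0), (0.5, 2.0), (0.4, 4.0), (0.333, 3.0),
]

full_value = weighted_sum(full_precision)
truncated_value = weighted_sum(truncated)
print('Full precision:', full_value)
print('Truncated:', truncated_value)
print('Difference:', abs(full_value - truncated_value))
```

The two results differ by roughly 0.0001, while the recommender printed 3.8095, so truncation alone does not seem to account for the gap seen in this review. The sketch does not attempt to explain the remainder.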

Review 4

For this review we have the following information

('Neighbourhood: ', 6, u'665F054DF2C3802CCF4DBC5194E8F73A', 93352)


In [83]:
print(test[67])

print(test[75])


{u'multi_ratings': [2.0, 2.0, 3.0, 2.0, 2.0], u'user_id': u'970BF90FE4738BB9A7BD78B01E20810C', u'offering_id': 93450, u'overall_rating': 2.0}
{u'multi_ratings': [5.0, 5.0, 5.0, 5.0, 5.0], u'user_id': u'5DFE96EC85C67F248DEFFA8B84891A6A', u'offering_id': 236299, u'overall_rating': 5.0}