Using the recommendation library


In [1]:
import os
os.chdir('..')

In [2]:
# Import all the packages we need to generate recommendations
import numpy as np
import pandas as pd
import src.utils as utils
import src.recommenders as recommenders
import src.similarity as similarity

# imports necessary for plotting
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline  

# Enable logging on Jupyter notebook
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

In [3]:
# loads dataset 
dataset_folder = os.path.join(os.getcwd(), 'data')
dataset_folder_ready = utils.load_dataset(dataset_folder)

# adds personal ratings to original dataset ratings file.
ratings_file = os.path.join(dataset_folder, 'ml-latest-small','ratings-merged.csv')
ratings, my_customer_number = utils.merge_datasets(dataset_folder_ready, ratings_file)


INFO:root:dataset was already downloaded
INFO:root:dataset stored in: /Users/hcorona/github/recsys-101-workshop/data/ml-latest-small
INFO:root:loaded 44 personal ratings
INFO:root:loaded 9125 movies
INFO:root:loaded 100048 ratings in total

In [4]:
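# the data is stored in a long-format pandas DataFrame (one row per rating)
# pivot it into a [customer x movie] matrix, then transpose it to [movie x customer]
# so that each row holds the ratings given to one movie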
ratings_matrix = ratings.pivot_table(index='customer', columns='movie', values='rating', fill_value=0)
ratings_matrix = ratings_matrix.transpose()
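
To see what the pivot and transpose do, here is a minimal sketch on a toy DataFrame. The column names mirror the ones used above; the values and the name toy_ratings are made up for illustration only.

```python
import pandas as pd

# Toy long-format ratings (hypothetical values, same column names as above).
toy_ratings = pd.DataFrame({
    'customer': [1, 1, 2, 3],
    'movie': ['A', 'B', 'A', 'C'],
    'rating': [5.0, 3.0, 4.0, 2.0],
})

# Pivot to a [customer x movie] matrix; pairs without a rating become 0.
toy_matrix = toy_ratings.pivot_table(
    index='customer', columns='movie', values='rating', fill_value=0)

# Transpose to [movie x customer]: each row is now one movie's rating vector.
toy_matrix = toy_matrix.transpose()
print(toy_matrix)
```

Note that fill_value=0 makes "not rated" indistinguishable from a rating of 0, which can influence the similarity scores computed later.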

Understanding Movie Similarity

  1. Try with different movies
  2. Try with different types of similarity metrics (look in /src/similarity.py)
  3. Which similarity metric works the best?
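
As a reference point when comparing metrics, here is a minimal NumPy sketch of cosine and Pearson similarity between two rating vectors. It assumes the textbook definitions; the functions in /src/similarity.py may be implemented differently (for example, in how they treat unrated entries).

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def pearson_similarity(a, b):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return cosine_similarity(a - a.mean(), b - b.mean())

# Two vectors with the same shape but a different offset:
x = [1, 2, 3]
y = [11, 12, 13]
print(cosine_similarity(x, y))   # below 1: the offset changes the angle
print(pearson_similarity(x, y))  # close to 1: centring removes the offset
```

Because Pearson centres each vector first, it ignores whether one vector is rated systematically higher than the other, while cosine does not.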

In [5]:
# find similar movies 
# try with different movie titles and see what happens 
movie_title = 'Star Wars: Episode VI - Return of the Jedi (1983)'
similarity_type = "cosine"
logger.info('top-10 movies similar to %s, using %s similarity', movie_title, similarity_type)
print(similarity.compute_nearest_neighbours(movie_title, ratings_matrix, similarity_type)[0:10])


INFO:root:top-10 movies similar to Star Wars: Episode VI - Return of the Jedi (1983), using cosine similarity
                                                   item  similarity
7490  Star Wars: Episode VI - Return of the Jedi (1983)    1.000000
7489  Star Wars: Episode V - The Empire Strikes Back...    0.785080
7488          Star Wars: Episode IV - A New Hope (1977)    0.762233
6460  Raiders of the Lost Ark (Indiana Jones and the...    0.656852
4030          Indiana Jones and the Last Crusade (1989)    0.647390
657                           Back to the Future (1985)    0.631344
5164                   Men in Black (a.k.a. MIB) (1997)    0.627694
7860                             Terminator, The (1984)    0.609394
5102                                 Matrix, The (1999)    0.607405
7485   Star Wars: Episode I - The Phantom Menace (1999)    0.598890

In [6]:
# find similar movies 
# try with different movie titles and see what happens 
movie_title = 'All About My Mother (Todo sobre mi madre) (1999)'
similarity_type = "pearson"
logger.info('top-10 movies similar to: %s, using %s similarity', movie_title, similarity_type)
print(similarity.compute_nearest_neighbours(movie_title, ratings_matrix, similarity_type)[0:10])


INFO:root:top-10 movies similar to: All About My Mother (Todo sobre mi madre) (1999), using pearson similarity
                                                   item  similarity
307    All About My Mother (Todo sobre mi madre) (1999)    1.000000
672            Bad Education (La mala educación) (2004)    0.504208
7791                Talk to Her (Hable con Ella) (2002)    0.464245
2368  Dreamlife of Angels, The (Vie rêvée des anges,...    0.449805
1511         Central Station (Central do Brasil) (1998)    0.448065
5761                                     Nowhere (1997)    0.441745
1224                          Breaking the Waves (1996)    0.438522
3941                     Idiots, The (Idioterne) (1998)    0.436944
772   Battle of Algiers, The (La battaglia di Algeri...    0.432585
9015                  Your Friends and Neighbors (1998)    0.425468

Creating recommendations for your personal ratings

  1. Try with different similarity metrics (look in /src/similarity.py)
  2. Try with different values of K (K is the number of neighbours to consider when generating the recommendations)
  3. Which combination of K and similarity metric works best? Discuss it with others.
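
To build intuition for the role of K, here is a minimal sketch of a user-based nearest-neighbour prediction on a toy matrix. It assumes cosine similarity and a similarity-weighted average over the K most similar users who rated the item; the function predict_uknn and the toy data are hypothetical, and the code in src/recommenders.py may work differently.

```python
import numpy as np

def predict_uknn(ratings, user, item, k):
    """Predict a rating from the k most similar users who rated the item.

    ratings: 2-D array [user x item], where 0 means "not rated".
    """
    ratings = np.asarray(ratings, dtype=float)
    target = ratings[user]

    # Cosine similarity between the target user and every user.
    norms = np.linalg.norm(ratings, axis=1) * np.linalg.norm(target)
    sims = np.divide(ratings @ target, norms,
                     out=np.zeros(len(ratings)), where=norms > 0)

    # Candidate neighbours: other users who actually rated the item.
    candidates = np.where(ratings[:, item] > 0)[0]
    candidates = candidates[candidates != user]

    # Keep the k candidates with the highest similarity.
    top = candidates[np.argsort(sims[candidates])[::-1][:k]]
    weights = sims[top]
    if weights.sum() <= 0:
        return 0.0

    # Similarity-weighted average of the neighbours' ratings.
    return float(weights @ ratings[top, item] / weights.sum())

toy = [[5, 4, 0],   # user 0 has not rated item 2
       [5, 4, 2],   # user 1 rates like user 0
       [1, 1, 5]]   # user 2 has different taste
print(predict_uknn(toy, user=0, item=2, k=1))
print(predict_uknn(toy, user=0, item=2, k=2))
```

With K=1 only the closest user contributes; with K=2 the less similar user pulls the prediction towards their own rating, which is the trade-off to keep in mind when tuning K.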

In [7]:
# get recommendations for a single user with user-based kNN
recommendations = recommenders.recommend_uknn(ratings, my_customer_number, K=200, similarity_metric='cosine', N=10)
recommendations


INFO:root:computed nearest neighbours using cosine
Out[7]:
     rating  movie
0  3.231480  Inception (2010)
1  2.727677  Matrix, The (1999)
2  2.606161  Fight Club (1999)
3  2.596321  Dark Knight, The (2008)
4  2.595773  Forrest Gump (1994)
5  2.547936  Shawshank Redemption, The (1994)
6  2.525448  Pulp Fiction (1994)
7  2.367299  Lord of the Rings: The Fellowship of the Ring,...
8  2.112106  Lord of the Rings: The Return of the King, The...
9  2.088432  Back to the Future (1985)

In [8]:
# get recommendations for a single user with item-based kNN
recommendations = recommenders.recommend_iknn(ratings, my_customer_number, K=100, similarity_metric='cosine')
recommendations


Out[8]:
     rating  movie
0  4.682608  Grand Budapest Hotel, The (2014)
1  4.641638  Dallas Buyers Club (2013)
2  4.583980  Hugo (2011)
3  4.570574  Harry Potter and the Deathly Hallows: Part 1 (...
4  4.563340  The Imitation Game (2014)
5  4.561054  Gravity (2013)
6  4.560607  Way, Way Back, The (2013)
7  4.559606  Star Trek (2009)
8  4.555776  WALL·E (2008)
9  4.553129  28 Weeks Later (2007)