In [1]:
import graphlab as gl
The following code snippet will parse the books data provided at the training.
In [2]:
import os

# Reuse the cached binary SFrames if they exist; otherwise parse the CSVs and cache them.
if os.path.exists('books/ratings'):
    ratings = gl.SFrame('books/ratings')
    items = gl.SFrame('books/items')
    users = gl.SFrame('books/users')
else:
    ratings = gl.SFrame.read_csv('books/book-ratings.csv')
    ratings.save('books/ratings')
    items = gl.SFrame.read_csv('books/book-data.csv')
    items.save('books/items')
    users = gl.SFrame.read_csv('books/user-data.csv')
    users.save('books/users')
Visually explore the above data using GraphLab Canvas.
In this section we will build a model that can be used to recommend new books to users.
Use gl.recommender.create() to create a model that can be used to recommend books to each user.
In [ ]:
Print a summary of the model by simply entering the name of the object.
In [ ]:
Get all unique users from the first 10000 observations and save them as a variable called users.
In [ ]:
Get 20 recommendations for each user in your list of users. Save these as a new SFrame called recs.
In [ ]:
Get an SFrame of the 20 most similar items for each observed item.
In [ ]:
This dataset has multiple rows corresponding to the same book, e.g., in situations where reprintings were done by different publishers in different years.
For each unique value of 'book' in the items SFrame, select one of the available values for author, publisher, and year. Hint: Try using SFrame.groupby and gl.aggregate.SELECT_ONE.
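To make the goal of this step concrete, here is a minimal plain-Python sketch of "keep one row per book". It is not GraphLab code and is not the exercise solution: the helper name select_one_per_book, the publisher names, and the years are made up for illustration, and it assumes that any one of the duplicate rows is an acceptable choice.

```python
def select_one_per_book(rows, key='book'):
    """Keep a single row for each distinct value of `key`."""
    chosen = {}
    for row in rows:
        # setdefault only stores the row if this book has not been seen yet,
        # so the first row encountered for each book is the one that is kept.
        chosen.setdefault(row[key], row)
    return list(chosen.values())


# Illustrative rows only; publishers and years are invented for this sketch.
rows = [
    {'book': 'Animal Farm', 'author': 'George Orwell', 'publisher': 'Publisher A', 'year': 1990},
    {'book': 'Animal Farm', 'author': 'George Orwell', 'publisher': 'Publisher B', 'year': 2000},
    {'book': 'Book X', 'author': 'Author X', 'publisher': 'Publisher C', 'year': 1985},
]

unique_rows = select_one_per_book(rows)
for row in unique_rows:
    print(row)
```

The result has one row per book, with author, publisher, and year all taken from the same original row.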
In [ ]:
Compute the number of times each book was rated, and add a column containing these counts to the items SFrame using SFrame.join.
In [ ]:
Print the first few books, sorted by the number of times they have been rated. Do these values make sense?
In [ ]:
Now print the most similar items per item, sorted by the most common books. Hint: Join the two SFrames you created above.
In [ ]:
Create a dataset called implicit that contains only ratings data where rating was 4 or greater.
In [ ]:
Create a train/test split of the implicit data created above. Hint: Use gl.recommender.util.random_split_by_user.
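The idea behind a per-user split is that some of each user's observations are held out for testing while the rest stay in the training set, so the model can be asked to recover the held-out items for users it has already seen. The plain-Python sketch below illustrates that idea only. The function split_by_user and its parameters are hypothetical, and the real random_split_by_user may differ in details such as which users it samples and how it rounds.

```python
import random
from collections import defaultdict


def split_by_user(observations, test_fraction=0.2, seed=0):
    """Split (user, item) pairs so each user's items are divided between train and test."""
    rng = random.Random(seed)

    # Group the observed items by user.
    by_user = defaultdict(list)
    for user, item in observations:
        by_user[user].append(item)

    train, test = [], []
    for user in sorted(by_user):
        items = by_user[user]
        rng.shuffle(items)
        # Hold out a fixed fraction of this user's items (rounded down).
        n_test = int(len(items) * test_fraction)
        test.extend((user, item) for item in items[:n_test])
        train.extend((user, item) for item in items[n_test:])
    return train, test


# Toy data: user u1 has 10 observations, user u2 has 5.
observations = [('u1', 'b%d' % i) for i in range(10)]
observations += [('u2', 'b%d' % i) for i in range(5)]

train, test = split_by_user(observations, test_fraction=0.2)
print(len(train), len(test))
```

Because the split is done within each user, every user that appears in the test set also has observations in the training set.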
In [ ]:
Print the first 5 rows of the training set.
In [ ]:
Create a ranking_factorization_recommender model using just the training set and 20 factors.
In [ ]:
Evaluate how well this model recommends items that were seen in the test set you created above. Hint: Check out m.evaluate_precision_recall().
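When reading the evaluation results, it helps to know what precision and recall at a cutoff k mean for a single user. The sketch below uses the usual definitions in plain Python. The function precision_recall_at_k and the toy item labels are hypothetical, and GraphLab's own implementation may handle edge cases differently.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision, recall) for the top-k items of a ranked recommendation list."""
    top_k = recommended[:k]
    relevant = set(relevant)
    # A "hit" is a recommended item that the user actually has in the test set.
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / float(k) if k else 0.0
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall


# Toy example: five ranked recommendations, three held-out test items.
recommended = ['A', 'B', 'C', 'D', 'E']
held_out = ['B', 'E', 'Z']

p5, r5 = precision_recall_at_k(recommended, held_out, k=5)
p2, r2 = precision_recall_at_k(recommended, held_out, k=2)
print(p5, r5)
print(p2, r2)
```

Precision is the fraction of the top k recommendations that appear in the user's test items; recall is the fraction of the user's test items that appear in the top k. Increasing k tends to raise recall and lower precision.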
In [ ]:
Create an SFrame containing only one observation, where 'Billy Bob' has rated 'Animal Farm' with score 5.0.
In [ ]:
Use this data when querying for recommendations for the user 'Billy Bob'.
In [ ]: