In [1]:
import graphlab as gl

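The following code snippet loads the books data provided at the training. On the first run it parses the CSV files and saves the results as SFrames; on later runs it loads the saved SFrames directly.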


In [2]:
import os
if os.path.exists('books/ratings'):
    ratings = gl.SFrame('books/ratings')
    items = gl.SFrame('books/items')
    users = gl.SFrame('books/users')
else:
    ratings = gl.SFrame.read_csv('books/book-ratings.csv')
    ratings.save('books/ratings')
    items = gl.SFrame.read_csv('books/book-data.csv')
    items.save('books/items')
    users = gl.SFrame.read_csv('books/user-data.csv')
    users.save('books/users')


[INFO] This commercial license of GraphLab Create is assigned to engr@turi.com.

[INFO] Start server at: ipc:///tmp/graphlab_server-41686 - Server binary: /Users/chris/miniconda/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1443482376.log
[INFO] GraphLab Server Version: 1.6.1

Visually explore the above data using GraphLab Canvas.

Recommendation systems

In this section we will build a model that can be used to recommend new books to users.

Creating a Model

Use gl.recommender.create() to create a model that can be used to recommend books to each user.


In [ ]:

Print a summary of the model by simply entering the name of the object.


In [ ]:

Get all unique users from the first 10000 observations and save them as a variable called users.


In [ ]:

Get 20 recommendations for each user in your list of users. Save these as a new SFrame called recs.


In [ ]:

Inspecting your model

Get an SFrame of the 20 most similar items for each observed item.
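To build intuition for what "most similar items" means, here is a minimal plain-Python sketch. It assumes Jaccard similarity over the sets of users who rated each item, which is one common choice; the model created by gl.recommender.create() may well use a different measure. The (user, item) pairs and the function name are made up for illustration.

```python
from collections import defaultdict

def most_similar_items(pairs, k):
    """For each item, return up to k other items ranked by Jaccard similarity.

    Jaccard similarity here is |users of both| / |users of either|.
    This is an assumed measure, not necessarily the one your model uses.
    """
    users_by_item = defaultdict(set)
    for user, item in pairs:
        users_by_item[item].add(user)

    result = {}
    for item, users in users_by_item.items():
        scores = []
        for other, other_users in users_by_item.items():
            if other == item:
                continue
            overlap = len(users & other_users)
            if overlap == 0:
                continue  # skip items with no users in common
            score = overlap / float(len(users | other_users))
            scores.append((other, score))
        # Highest score first; ties broken by item name for determinism.
        scores.sort(key=lambda pair: (-pair[1], pair[0]))
        result[item] = scores[:k]
    return result

# Hypothetical (user, item) observations.
pairs = [('u1', 'A'), ('u1', 'B'),
         ('u2', 'A'), ('u2', 'B'), ('u2', 'C'),
         ('u3', 'C')]
similar = most_similar_items(pairs, k=20)
```

In this toy data, items A and B were rated by exactly the same users, so they are each other's closest neighbour.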


In [ ]:

This dataset has multiple rows corresponding to the same book, e.g., in situations where reprintings were done by different publishers in different years.

For each unique value of 'book' in the items SFrame, select one of the available values for author, publisher, and year. Hint: Try using SFrame.groupby and gl.aggregate.SELECT_ONE.
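The idea behind a "select one per group" aggregation can be sketched in plain Python. This is not the GraphLab call itself, only an illustration of the result you should expect: one row per distinct 'book'. The rows are made up, and the sketch keeps the first row it sees, whereas SELECT_ONE may pick any of the available values.

```python
from collections import OrderedDict

# Hypothetical rows: the same book appears twice with different publishers.
rows = [
    {'book': 'Animal Farm', 'author': 'George Orwell',
     'publisher': 'Publisher A', 'year': 1990},
    {'book': 'Animal Farm', 'author': 'George Orwell',
     'publisher': 'Publisher B', 'year': 2000},
    {'book': 'Book B', 'author': 'Author B',
     'publisher': 'Publisher C', 'year': 1995},
]

def select_one_per_book(rows):
    """Keep the first row seen for each distinct 'book' value."""
    chosen = OrderedDict()
    for row in rows:
        if row['book'] not in chosen:
            chosen[row['book']] = row
    return list(chosen.values())

unique_items = select_one_per_book(rows)
```

After this step each book title appears exactly once, which is what makes the later join on 'book' produce one row per book.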


In [ ]:

Compute the number of times each book was rated, and add a column containing these counts to the items SFrame using SFrame.join.
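As a rough picture of the count-then-join step, here is a plain-Python sketch. The data, the 'user' and 'rating' column names, and the new 'num_ratings' column name are all assumptions for illustration. The sketch drops books that have no ratings, as an inner join would; check how SFrame.join behaves with your chosen arguments.

```python
from collections import Counter

# Hypothetical ratings and items tables.
ratings_rows = [
    {'user': 'u1', 'book': 'A', 'rating': 5},
    {'user': 'u2', 'book': 'A', 'rating': 3},
    {'user': 'u2', 'book': 'B', 'rating': 4},
]
item_rows = [
    {'book': 'A', 'author': 'Author A'},
    {'book': 'B', 'author': 'Author B'},
    {'book': 'C', 'author': 'Author C'},  # never rated
]

# Step 1: count how many times each book was rated.
counts = Counter(row['book'] for row in ratings_rows)

# Step 2: attach the count to each item row that has a match
# (inner-join behaviour: unrated books are left out).
items_with_counts = [
    dict(item, num_ratings=counts[item['book']])
    for item in item_rows
    if item['book'] in counts
]
```

Sorting items_with_counts by 'num_ratings' in descending order then gives the most frequently rated books first.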


In [ ]:

Print the first few books, sorted by the number of times they have been rated. Do these values make sense?


In [ ]:

Now print the most similar items per item, sorted by the most common books. Hint: Join the two SFrames you created above.


In [ ]:

Experimenting with other models

Create a dataset called implicit that contains only ratings data where rating was 4 or greater.


In [ ]:

Create a train/test split of the implicit data created above. Hint: Use random_split_by_user.
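The reason for splitting by user, rather than splitting rows at random, is that each test user should still have some observations in the training set. The sketch below illustrates that idea in plain Python. It is not GraphLab's implementation of random_split_by_user; the function name, the 'user' column name, and the parameters are assumptions.

```python
import random
from collections import defaultdict

def split_by_user(rows, test_fraction=0.2, seed=0):
    """Hold out a fraction of each user's rows for testing.

    Every user keeps at least one row in the training set because
    int(n * test_fraction) < n whenever test_fraction < 1.
    """
    rng = random.Random(seed)
    rows_by_user = defaultdict(list)
    for row in rows:
        rows_by_user[row['user']].append(row)

    train, test = [], []
    for user_rows in rows_by_user.values():
        shuffled = list(user_rows)
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test

# Hypothetical implicit data: users with 4, 2, and 1 observations.
implicit_rows = [
    {'user': 'u1', 'book': 'A'}, {'user': 'u1', 'book': 'B'},
    {'user': 'u1', 'book': 'C'}, {'user': 'u1', 'book': 'D'},
    {'user': 'u2', 'book': 'A'}, {'user': 'u2', 'book': 'B'},
    {'user': 'u3', 'book': 'A'},
]
train_rows, test_rows = split_by_user(implicit_rows, test_fraction=0.5)
```

With test_fraction=0.5, the user with a single observation contributes nothing to the test set, so the model is never asked to predict for a user it has not seen.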


In [ ]:

Print the first 5 rows of the training set.


In [ ]:

Create a ranking_factorization_recommender model using just the training set and 20 factors.


In [ ]:

Evaluate how well this model recommends items that were seen in the test set you created above. Hint: Check out m.evaluate_precision_recall().
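When reading the evaluation output, it helps to recall how precision and recall at a cutoff k are usually defined for a single user. The sketch below uses the standard definitions; the function name and the example lists are made up, and the exact output format of m.evaluate_precision_recall() is not shown here.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user.

    precision = hits / number of items recommended (at most k)
    recall    = hits / number of relevant (held-out) items
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / float(len(top_k)) if top_k else 0.0
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

# Hypothetical ranked recommendations and held-out test items for one user.
recommended = ['A', 'B', 'C', 'D']
relevant = ['B', 'D', 'E']

precision, recall = precision_recall_at_k(recommended, relevant, k=2)
```

Here only 'B' appears in the top two recommendations, so one of two recommendations is a hit and one of three held-out items is recovered. Raising k typically raises recall and lowers precision.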


In [ ]:

Create an SFrame containing only one observation, where 'Billy Bob' has rated 'Animal Farm' with score 5.0.


In [ ]:

Use this data when querying for recommendations for the user 'Billy Bob'.


In [ ]: