The Five-Line Recommender, Explained

Building a recommender system is easy with GraphLab Create. Simply import graphlab, load data, create a recommender model, and start making recommendations. Let's walk through this line by line.

Step 1: Import GraphLab into Python



In [1]:

import graphlab

Step 2: Load the dataset

The data is sitting on an AWS S3 bucket as a csv file. We can load it into a GraphLab SFrame with read_csv(), specifying the "rating" column to be loaded as integers. For other ways of creating an SFrame and doing data munging, see the SFrame tutorial.



In [2]:

data = graphlab.SFrame.read_csv("http://s3.amazonaws.com/dato-datasets/movie_ratings/training_data.csv", column_type_hints={"rating":int})
data.head()

PROGRESS: Downloading http://s3.amazonaws.com/dato-datasets/movie_ratings/training_data.csv to /var/tmp/graphlab-zach/39733/000000.csv
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/movie_ratings/training_data.csv
PROGRESS: Parsing completed. Parsed 100 lines in 0.109583 secs.
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/movie_ratings/training_data.csv
PROGRESS: Parsing completed. Parsed 82068 lines in 0.101536 secs.






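Out[2]:

+-------------+----------------------------------------+--------+
| user        | movie                                  | rating |
+-------------+----------------------------------------+--------+
| Jacob Smith | Flirting with Disaster                 | 4      |
| Jacob Smith | Indecent Proposal                      | 3      |
| Jacob Smith | Runaway Bride                          | 2      |
| Jacob Smith | Swiss Family Robinson                  | 1      |
| Jacob Smith | The Mexican                            | 2      |
| Jacob Smith | Maid in Manhattan                      | 4      |
| Jacob Smith | A Charlie Brown Thanksgiving / The ... | 3      |
| Jacob Smith | Brazil                                 | 1      |
| Jacob Smith | Forrest Gump                           | 3      |
| Jacob Smith | It Happened One Night                  | 4      |
+-------------+----------------------------------------+--------+

[10 rows x 3 columns]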
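
To make the effect of column_type_hints more concrete, here is a minimal standard-library sketch. It is not how SFrame.read_csv() is implemented; it only shows the idea of casting one named column while leaving the others as strings. The helper name read_csv_with_hints is hypothetical, and the inline sample assumes a comma-separated file with a user,movie,rating header, using three rows copied from the output above.

```python
import csv
import io


def read_csv_with_hints(text, column_type_hints):
    """Parse CSV text into a list of dicts, casting the hinted columns.

    Illustrative stand-in only; this is not GraphLab's parser.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row = dict(row)
        for column, cast in column_type_hints.items():
            # Unhinted columns stay as strings; hinted ones are converted.
            row[column] = cast(row[column])
        rows.append(row)
    return rows


# Assumed file layout: header line followed by user,movie,rating records.
sample = (
    "user,movie,rating\n"
    "Jacob Smith,Flirting with Disaster,4\n"
    "Jacob Smith,Indecent Proposal,3\n"
    "Jacob Smith,Runaway Bride,2\n"
)

rows = read_csv_with_hints(sample, {"rating": int})
print(rows[0])
```

Without the hint, every value in this sketch would be a string; with it, the rating column holds integers that can be used directly as a numeric target.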

Step 3: Build a model

We have the data. It's time to build a model. Now, have you ever tried typing "item similarity, matrix factorization, factorization machine, popularity" ten times in a row? We have, and we don't recommend it. (Harhar.) There are many good models for making recommendations, but sometimes even knowing the right names can be a challenge, much less typing them time after time.

This is why GraphLab Create provides a default recommender called ... recommender. You can build a default recommender with recommender.create(). It requires a dataset to use for training the model, as well as the names of the columns that contain the user IDs, the item IDs, and the ratings (if present).



In [3]:

# Build a default recommender (here, a matrix factorization model).
# The data must contain a user column and an item column; a rating column is optional.
# All other columns in the dataset are ignored by the default recommender.
model = graphlab.recommender.create(data, user_id="user", item_id="movie", target="rating")

PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS:     Data has 82068 observations with 334 users and 7714 items.
PROGRESS:     Data prepared in: 0.204213s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter                      | Description                                      | Value    |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors                    | Factor Dimension                                 | 32       |
PROGRESS: | regularization                 | L2 Regularization on Factors                     | 1e-09    |
PROGRESS: | solver                         | Solver used for training                         | sgd      |
PROGRESS: | linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
PROGRESS: | ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |
PROGRESS: | max_iterations                 | Maximum Number of Iterations                     | 25       |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS:   Optimizing model using SGD; tuning step size.
PROGRESS:   Using 10258 / 82068 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value                |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0       | 25                | Not Viable                               |
PROGRESS: | 1       | 6.25              | Not Viable                               |
PROGRESS: | 2       | 1.5625            | Not Viable                               |
PROGRESS: | 3       | 0.390625          | Not Viable                               |
PROGRESS: | 4       | 0.0976562         | 1.61865                                  |
PROGRESS: | 5       | 0.0488281         | 1.66185                                  |
PROGRESS: | 6       | 0.0244141         | 1.72837                                  |
PROGRESS: | 7       | 0.012207          | 1.80785                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final   | 0.0976562         | 1.61865                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+
PROGRESS: | Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+
PROGRESS: | Initial | 339us        | 2.40069           | 1.10654               |             |
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+
PROGRESS: | 1       | 121.209ms    | 2.01682           | 1.13516               | 0.0976562   |
PROGRESS: | 2       | 236.228ms    | 1.76884           | 1.06319               | 0.0580668   |
PROGRESS: | 3       | 344.786ms    | 1.55833           | 0.983183              | 0.042841    |
PROGRESS: | 4       | 455.018ms    | 1.36545           | 0.906899              | 0.0345267   |
PROGRESS: | 5       | 558.047ms    | 1.19205           | 0.832933              | 0.029206    |
PROGRESS: | 6       | 688.083ms    | 1.04541           | 0.765651              | 0.0254734   |
PROGRESS: | 10      | 1.14s        | 0.718906          | 0.605205              | 0.017366    |
PROGRESS: | 11      | 1.23s        | 0.674778          | 0.583149              | 0.016168    |
PROGRESS: | 20      | 2.22s        | 0.518888          | 0.497694              | 0.0103259   |
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+
PROGRESS: Optimization Complete: Maximum number of passes through the data reached.
PROGRESS: Computing final objective value and training RMSE.
PROGRESS:        Final objective value: 0.448549
PROGRESS:        Final training RMSE: 0.417582
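
The log above tracks an approximate objective and training RMSE across SGD passes. As a rough picture of what such a loop does, the following toy example trains a small matrix factorization model with plain SGD on squared error, in pure Python. It is a sketch under simplifying assumptions and not GraphLab's ranking_factorization_recommender: there is no ranking term and no step-size tuning, and the function name train_mf, the toy ratings, and the hyperparameter values are all made up for illustration.

```python
import math
import random


def train_mf(ratings, num_factors=2, step_size=0.05, regularization=1e-3,
             max_iterations=200, seed=0):
    """Fit rating ~ mean + user_bias + item_bias + dot(user_vec, item_vec).

    ratings is a list of (user, item, rating) triples. Returns a predict
    function and the training RMSE recorded before training and after each pass.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mean = sum(r for _, _, r in ratings) / len(ratings)
    user_bias = {u: 0.0 for u in users}
    item_bias = {i: 0.0 for i in items}
    user_vec = {u: [rng.gauss(0, 0.1) for _ in range(num_factors)] for u in users}
    item_vec = {i: [rng.gauss(0, 0.1) for _ in range(num_factors)] for i in items}

    def predict(u, i):
        dot = sum(a * b for a, b in zip(user_vec[u], item_vec[i]))
        return mean + user_bias[u] + item_bias[i] + dot

    def rmse():
        squared = sum((r - predict(u, i)) ** 2 for u, i, r in ratings)
        return math.sqrt(squared / len(ratings))

    history = [rmse()]
    order = list(ratings)
    for _ in range(max_iterations):
        rng.shuffle(order)  # visit observations in random order each pass
        for u, i, r in order:
            err = r - predict(u, i)
            user_bias[u] += step_size * (err - regularization * user_bias[u])
            item_bias[i] += step_size * (err - regularization * item_bias[i])
            for f in range(num_factors):
                pu, qi = user_vec[u][f], item_vec[i][f]
                user_vec[u][f] += step_size * (err * qi - regularization * pu)
                item_vec[i][f] += step_size * (err * pu - regularization * qi)
        history.append(rmse())
    return predict, history


# Made-up ratings, not taken from the movie dataset.
toy_ratings = [
    ("alice", "A", 5), ("alice", "B", 3), ("alice", "C", 1),
    ("bob", "A", 4), ("bob", "C", 2), ("bob", "D", 5),
    ("carol", "B", 4), ("carol", "C", 1), ("carol", "D", 4),
]
predict, history = train_mf(toy_ratings)
print("training RMSE before: %.3f, after: %.3f" % (history[0], history[-1]))
```

As in the log, the training RMSE is a measure of fit to the data the model has already seen, so a low value says little about how well the model ranks unseen items.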

Details (and the small devil therein):

Under the hood, the type of recommender is chosen based on the provided data and on whether the desired task is ranking (the default) or rating prediction. For this type of data and the default ranking task, the default recommender is a matrix factorization model, implemented on top of the disk-backed SFrame data structure. The default solver is stochastic gradient descent, and the recommender model used is the RankingFactorizationModel (reported as ranking_factorization_recommender in the training log above), which balances rating prediction with a ranking objective. The default create() function does not allow changes to the default parameters of a specific model, but it is just as easy to build a specific recommender with your own parameters using the appropriate model-specific create() function.
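
As a hedged illustration of that last point, the sketch below calls a model-specific create() with a few non-default parameters. It assumes a GraphLab Create release in which graphlab.ranking_factorization_recommender.create() is available and accepts the parameter names shown in the training log (num_factors, regularization, max_iterations). The wrapper name build_custom_recommender and the chosen values are arbitrary; check the API docs for your installed version before relying on them.

```python
def build_custom_recommender(data):
    """Train a ranking factorization recommender with hand-picked parameters.

    data is the SFrame loaded in Step 2. The import sits inside the function
    so that defining it does not require GraphLab to be installed.
    """
    import graphlab

    # Assumption: this module path and these keyword names match your version.
    return graphlab.ranking_factorization_recommender.create(
        data,
        user_id="user",
        item_id="movie",
        target="rating",
        num_factors=16,        # the log shows a default of 32
        regularization=1e-6,   # the log shows a default of 1e-09
        max_iterations=50,     # the log shows a default of 25
    )
```

Calling build_custom_recommender(data) in place of the default create() call would then produce a model that can be used in the remaining steps in the same way.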

Step 4: Make recommendations

The trained model can now make recommendations of new items for users. To do so, call recommend() with an SArray of user ids. If users is set to None, then recommend() will make recommendations for all the users seen during training, automatically excluding the items that are observed for each user. In other words, if data contains a row "Alice, The Great Gatsby", then recommend() will not recommend "The Great Gatsby" for user "Alice". It will return at most k new items for each user, sorted by their rank. It will return fewer than k items if there are not enough items that the user has not already rated or seen.

The score column of the output contains the unnormalized prediction scores for each user-item pair. The semantic meanings of these scores may differ between models. For the linear regression model, for instance, a higher average score for a user means that the model thinks that this user is generally more enthusiastic than others. See the Basic Recommender Functionalities tutorial for more details.



In [4]:

# You can now make recommendations for all the users you've just trained on
results = model.recommend(users=None, k=5)
results.head()

Out[4]:

+-------------+----------------------------------------+---------------+------+
| user        | movie                                  | score         | rank |
+-------------+----------------------------------------+---------------+------+
| Jacob Smith | Sex and the City: Season 2 ...         | 5.11639140642 | 1    |
| Jacob Smith | Sex and the City: Season 1 ...         | 5.02684734857 | 2    |
| Jacob Smith | Sex and the City: Season 6: Part 2 ... | 4.85449193514 | 3    |
| Jacob Smith | Sex and the City: Season 3 ...         | 4.67698882616 | 4    |
| Jacob Smith | Doctor Zhivago                         | 4.6545002321  | 5    |
| Mason Smith | Mulholland Drive                       | 6.07918380297 | 1    |
| Mason Smith | Rushmore                               | 5.84550069368 | 2    |
| Mason Smith | The Sound of Music                     | 5.73360227144 | 3    |
| Mason Smith | Napoleon Dynamite                      | 5.48369811571 | 4    |
| Mason Smith | Six Feet Under: Season 1               | 5.38490270174 | 5    |
+-------------+----------------------------------------+---------------+------+

[10 rows x 4 columns]
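
The exclusion and top-k behaviour described in this step can be mimicked in a few lines of plain Python. This is a sketch of the idea, not the library's implementation: the function recommend_top_k and its inputs are hypothetical, and the scores are made-up numbers rather than model output.

```python
def recommend_top_k(scores, observed, k=5):
    """Return (user, item, score, rank) rows, skipping items a user has seen.

    scores:   {user: {item: predicted score}}
    observed: set of (user, item) pairs present in the training data
    """
    rows = []
    for user in sorted(scores):
        candidates = [
            (item, score)
            for item, score in scores[user].items()
            if (user, item) not in observed
        ]
        # Highest score first; ties are broken by item name for determinism.
        candidates.sort(key=lambda pair: (-pair[1], pair[0]))
        for rank, (item, score) in enumerate(candidates[:k], start=1):
            rows.append((user, item, score, rank))
    return rows


# Made-up scores; Alice has already rated "The Great Gatsby".
scores = {"Alice": {"The Great Gatsby": 4.9, "Brazil": 4.1, "Rushmore": 3.2}}
observed = {("Alice", "The Great Gatsby")}
results = recommend_top_k(scores, observed, k=5)
print(results)
```

Here only two unseen items remain for Alice, so two rows come back even though k is 5, which mirrors the "fewer than k items" case described above.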

Step 5: Save the model

The model can be saved for later use, either on the local machine or in an AWS S3 bucket. The saved model sits in its own directory, and can be loaded back in later to make more predictions.



In [5]:

# Save the model for later use
model.save("my_model")
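
For completeness, here is a sketch of loading the saved model back and asking it for recommendations. It assumes that graphlab.load_model() is the loader in your GraphLab Create version; the wrapper name recommend_from_saved and its arguments are illustrative only.

```python
def recommend_from_saved(path, user_ids, k=5):
    """Load a model saved with model.save(path) and recommend for some users.

    The import is lazy so that defining this helper does not require
    GraphLab; it is needed only when the function is called.
    """
    import graphlab

    # Assumption: load_model() restores the directory written by save().
    model = graphlab.load_model(path)
    return model.recommend(users=graphlab.SArray(user_ids), k=k)
```

With the model saved as above, recommend_from_saved("my_model", ["Jacob Smith"], k=3) would request three recommendations for a single user.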

Et voilà! You've just built your first recommender with GraphLab Create. Congratulations!

(Looking for more details about the modules and functions? Check out the API docs.)
