In [25]:
import graphlab as gl
gl.canvas.set_target("ipynb")
In [26]:
# Load the pre-saved SFrames from disk (paths relative to the notebook's
# working directory). Judging by the names: `implicit` holds implicit-feedback
# interactions, `explicit` holds explicit ratings, `items` holds item
# metadata, and `ratings` is only visualized below.
# NOTE(review): `explicit` is loaded but never used in the visible cells —
# confirm it is needed, otherwise drop the load.
implicit = gl.SFrame('implicit')
explicit = gl.SFrame('explicit')
items = gl.SFrame('items')
ratings = gl.SFrame('ratings')
In [5]:
# Visualize the ratings data in GraphLab Canvas (output target was set to
# "ipynb" in the first cell).
# NOTE(review): execution count In[5] is out of order relative to the
# surrounding In[26]/In[27] cells — restart the kernel and run top-to-bottom
# so the notebook is reproducible.
ratings.show()
Split the data into training and validation sets by holding out observations for a sample of users. This allows us to evaluate generalization ability on held-out data.
In [27]:
# Hold out a validation set: sample a subset of users and split off a portion
# of each sampled user's observations.
# NOTE(review): no explicit random seed is passed — confirm the split is
# reproducible across re-runs if results must be comparable between sessions.
train, valid = gl.recommender.util.random_split_by_user(implicit)
Compute the number of times each item has been rated.
In [28]:
num_ratings_per_item = train.groupby('item_id', {'num_users': gl.aggregate.COUNT})
items = items.join(num_ratings_per_item, on='item_id')
Transform the count into a categorical variable using the feature_engineering module.
In [29]:
binner = gl.feature_engineering.FeatureBinner(features=['num_users'], strategy='logarithmic', num_bins=5)
items = binner.fit_transform(items)
Convert each genre element into a dictionary and each year to an integer.
In [30]:
items['genres'] = items['genres'].apply(lambda x: {k:1 for k in x})
items['year'] = items['year'].astype(int)
In [31]:
# Inspect the transformed item metadata; the bare trailing expression renders
# the SFrame inline.
items
Out[31]:
In [32]:
# Baseline: item-item similarity recommender on the interactions alone,
# default options.
m0 = gl.item_similarity_recommender.create(train)
In [33]:
# Ranking factorization model on interactions only.
m1 = gl.ranking_factorization_recommender.create(train, max_iterations=10)
In [34]:
# Same model plus item side data: release year only.
m2 = gl.ranking_factorization_recommender.create(train,
item_data=items[['item_id', 'year']],
max_iterations=10)
In [35]:
# Same model plus item side data: release year and the genre indicator
# dictionaries built above.
m3 = gl.ranking_factorization_recommender.create(train,
item_data=items[['item_id', 'year', 'genres']],
max_iterations=10)
Create a nearest neighbor model that uses the genres in common and the year of the movie.
In [36]:
dist = [[['genres'], 'jaccard', 1.0],
[['year'], 'euclidean', 1.0]]
nn_model = gl.nearest_neighbors.create(items, 'item_id', distance=dist)
In [37]:
gl.nearest_neighbors.create?
Compute a nearest neighbor graph.
In [38]:
similar = nn_model.query(items, 'item_id', k=100)\
.rename({'query_label': 'item_id', 'reference_label': 'similar', 'distance': 'score'})\
.join(items[['item_id', 'title']], on='item_id')\
.join(items[['item_id', 'title']], on={'similar': 'item_id'})
similar['score'] = 1 - similar['score']
similar.print_rows(100, max_row_width=200)
Use this similarity data as the basis for a recommender.
In [39]:
# Item-similarity recommender seeded with the precomputed content-based
# neighbor table (`nearest_items=similar`) rather than similarities learned
# from the interaction data.
m5 = gl.item_similarity_recommender.create(train, nearest_items=similar)
Create a precision/recall plot to compare the recommendation quality of the above models given our heldout data.
In [40]:
# Evaluate all five models on the held-out data, sampling 30% of the
# validation users to keep the comparison fast.
model_comparison = gl.compare(valid, [m0, m1, m2, m3, m5], user_sample=.3)
In [24]:
# Render the precision/recall comparison plot for the five models.
# NOTE(review): execution count In[24] predates the In[40] cell above — the
# saved notebook was not run top-to-bottom; restart the kernel and Run All
# before sharing so every output reflects the code shown.
gl.show_comparison(model_comparison, [m0, m1, m2, m3, m5])
In [ ]: