In [1]:
import graphlab
graphlab.canvas.set_target("ipynb")
rating_sf = graphlab.SFrame('ratings')
users = graphlab.SFrame('users')
items = graphlab.SFrame('items')


2016-04-14 01:35:40,341 [INFO] graphlab.cython.cy_server, 176: GraphLab Create v1.8.5 started. Logging: /tmp/graphlab_server_1460568932.log

In [2]:
rating_sf.show()



In [4]:
dir(graphlab.recommender)


Out[4]:
['__all__',
 '__builtins__',
 '__doc__',
 '__file__',
 '__name__',
 '__package__',
 '__path__',
 'create',
 'factorization_recommender',
 'item_similarity_recommender',
 'popularity_recommender',
 'ranking_factorization_recommender',
 'util']

In [6]:
# Hold out part of the ratings, split per user, as a test set.
train, test = graphlab.recommender.util.random_split_by_user(rating_sf, 'user_id', 'movie_id')
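
A per-user split keeps some of each user's ratings for training and holds out the rest for testing. The following is a rough pure-Python sketch of that idea, not GraphLab's implementation; the function name `split_by_user`, the `test_fraction` parameter, and the toy rows are all hypothetical.

```python
import random
from collections import defaultdict

def split_by_user(rows, test_fraction=0.2, seed=0):
    """Hold out a fraction of each user's rows for testing.

    rows: list of (user_id, item_id) pairs. Illustrative only.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    train_rows, test_rows = [], []
    for user_rows in by_user.values():
        shuffled = user_rows[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test_rows.extend(shuffled[:n_test])   # held-out part
        train_rows.extend(shuffled[n_test:])  # remaining part
    return train_rows, test_rows

# Toy data: 3 users with 10 items each (hypothetical ids).
rows = [(u, i) for u in range(3) for i in range(10)]
train_rows, test_rows = split_by_user(rows)
print("%d train rows, %d test rows" % (len(train_rows), len(test_rows)))
```

Every user in the test rows also appears in the training rows, so a model trained on the first part can be asked for recommendations for exactly those users.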

In [9]:
from graphlab import item_similarity_recommender
# Train on ratings above 4 only, treating them as implicit "liked" signals (no target column).
itemcf = item_similarity_recommender.create(train[train['rating'] > 4], 'user_id', 'movie_id')


Recsys training: model = item_similarity
Warning: Ignoring columns rating, timestamp;
    To use one of these as a target column, set target = 
    and use a method that allows the use of a target.
Preparing data set.
    Data has 218621 observations with 6012 users and 3224 items.
    Data prepared in: 0.195331s
Computing item similarity statistics:
Computing most similar items for 3224 items:
+-----------------+-----------------+
| Number of items | Elapsed Time    |
+-----------------+-----------------+
| 1000            | 1.08126         |
| 2000            | 1.10032         |
| 3000            | 1.12497         |
+-----------------+-----------------+
Finished training in 1.20171s
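
An item-similarity model scores pairs of items by how much the sets of users who interacted with them overlap. A minimal sketch using Jaccard similarity follows. It assumes Jaccard is the measure in use (the log above does not say which one GraphLab picked), and the `jaccard` helper and toy data are hypothetical.

```python
def jaccard(users_a, users_b):
    """Jaccard similarity between two sets of user ids."""
    union = users_a | users_b
    if not union:
        return 0.0
    return float(len(users_a & users_b)) / len(union)

# Toy data: which users liked each movie (hypothetical ids).
liked_by = {
    "movie_a": {1, 2, 3, 4},
    "movie_b": {3, 4, 5},
    "movie_c": {6},
}

sim_ab = jaccard(liked_by["movie_a"], liked_by["movie_b"])  # 2 shared of 5 total
sim_ac = jaccard(liked_by["movie_a"], liked_by["movie_c"])  # no shared users
print("%.2f %.2f" % (sim_ab, sim_ac))
```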

In [11]:
# Baseline: recommend the most frequently liked movies to everyone.
pop = graphlab.popularity_recommender.create(train[train['rating'] > 4], 'user_id', 'movie_id')


Recsys training: model = popularity
Warning: Ignoring columns rating, timestamp;
    To use one of these as a target column, set target = 
    and use a method that allows the use of a target.
Preparing data set.
    Data has 218621 observations with 6012 users and 3224 items.
    Data prepared in: 0.237904s
218621 observations to process; with 3224 unique items.
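
With no target column, a popularity baseline can simply rank items by how often they appear. The sketch below illustrates that counting step in plain Python. It is an assumption about the general technique rather than a description of GraphLab's internals, and `most_popular` and the toy interactions are hypothetical.

```python
from collections import Counter

def most_popular(interactions, k=2):
    """Return the k items that occur most often in (user, item) pairs."""
    counts = Counter(item for _user, item in interactions)
    return [item for item, _n in counts.most_common(k)]

# Toy (user_id, movie_id) pairs: "a" appears 3 times, "b" twice, "c" once.
interactions = [(1, "a"), (2, "a"), (3, "a"), (1, "b"), (2, "b"), (3, "c")]
top_items = most_popular(interactions, k=2)
print(top_items)
```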

In [12]:
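# No model type is specified, so GraphLab picks one; the log below shows ranking_factorization_recommender.
m = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating')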


Recsys training: model = ranking_factorization_recommender
Preparing data set.
    Data has 965508 observations with 6040 users and 3706 items.
    Data prepared in: 0.907655s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter                      | Description                                      | Value    |
+--------------------------------+--------------------------------------------------+----------+
| num_factors                    | Factor Dimension                                 | 32       |
| regularization                 | L2 Regularization on Factors                     | 1e-09    |
| solver                         | Solver used for training                         | adagrad  |
| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
| ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |
| max_iterations                 | Maximum Number of Iterations                     | 25       |
+--------------------------------+--------------------------------------------------+----------+
  Optimizing model using SGD; tuning step size.
  Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value                |
+---------+-------------------+------------------------------------------+
| 0       | 16.6667           | Not Viable                               |
| 1       | 4.16667           | Not Viable                               |
| 2       | 1.04167           | Not Viable                               |
| 3       | 0.260417          | 1.80479                                  |
| 4       | 0.130208          | 1.8322                                   |
| 5       | 0.0651042         | 1.8873                                   |
| 6       | 0.0325521         | 1.88706                                  |
+---------+-------------------+------------------------------------------+
| Final   | 0.260417          | 1.80479                                  |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 101us        | 2.4462            | 1.11698               |             |
+---------+--------------+-------------------+-----------------------+-------------+
| 1       | 1.43s        | DIVERGED          | DIVERGED              | 0.260417    |
| RESET   | 1.91s        | 2.44619           | 1.11697               |             |
| 1       | 3.24s        | DIVERGED          | DIVERGED              | 0.130208    |
| RESET   | 3.75s        | 2.44619           | 1.11697               |             |
| 1       | 4.84s        | 2.10443           | 1.14093               | 0.0651042   |
| 2       | 5.81s        | 1.82027           | 1.04353               | 0.0651042   |
| 3       | 6.89s        | 1.75645           | 1.02196               | 0.0651042   |
| 4       | 7.90s        | 1.7206            | 1.01294               | 0.0651042   |
| 5       | 8.99s        | 1.69207           | 1.00488               | 0.0651042   |
| 6       | 10.03s       | 1.66916           | 0.998471              | 0.0651042   |
| 7       | 11.11s       | 1.64975           | 0.992687              | 0.0651042   |
| 8       | 12.27s       | 1.63331           | 0.987803              | 0.0651042   |
| 9       | 13.53s       | 1.6203            | 0.984347              | 0.0651042   |
| 10      | 14.68s       | 1.60869           | 0.981751              | 0.0651042   |
| 11      | 15.79s       | 1.59758           | 0.977906              | 0.0651042   |
| 12      | 16.82s       | 1.58984           | 0.976171              | 0.0651042   |
| 13      | 17.96s       | 1.58036           | 0.973489              | 0.0651042   |
| 14      | 19.24s       | 1.57243           | 0.971477              | 0.0651042   |
| 15      | 20.37s       | 1.56503           | 0.969302              | 0.0651042   |
| 16      | 21.50s       | 1.55807           | 0.967444              | 0.0651042   |
| 17      | 22.63s       | 1.55118           | 0.965764              | 0.0651042   |
| 18      | 23.79s       | 1.54509           | 0.963793              | 0.0651042   |
| 19      | 24.93s       | 1.53942           | 0.961991              | 0.0651042   |
| 20      | 26.23s       | 1.53433           | 0.960398              | 0.0651042   |
| 21      | 27.37s       | 1.52844           | 0.959103              | 0.0651042   |
| 22      | 28.46s       | 1.52382           | 0.958025              | 0.0651042   |
| 23      | 29.55s       | 1.51829           | 0.956181              | 0.0651042   |
| 24      | 30.81s       | 1.51352           | 0.955045              | 0.0651042   |
| 25      | 32.13s       | 1.50902           | 0.953533              | 0.0651042   |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
       Final objective value: 1.53876
       Final training RMSE: 0.948071
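
The log above tries several initial step sizes, marks the ones that blow up as "Not Viable", and keeps the one with the lowest estimated objective. A toy sketch of that selection idea on a one-dimensional quadratic follows. It is not GraphLab's tuning procedure; the objective, the divergence threshold, and the names `run_gd` and `tune_step` are hypothetical.

```python
def run_gd(step, w0=5.0, iters=20):
    """Gradient descent on f(w) = w**2.

    Returns the final objective, or None if the iterate blows up.
    """
    w = w0
    for _ in range(iters):
        w -= step * 2.0 * w  # gradient of w**2 is 2w
        if abs(w) > 1e6:
            return None  # treat as "not viable"
    return w * w

def tune_step(candidates):
    """Try each candidate step size and keep the lowest final objective."""
    results = {s: run_gd(s) for s in candidates}
    viable = {s: obj for s, obj in results.items() if obj is not None}
    best = min(viable, key=viable.get)
    return best, results

best_step, results = tune_step([16.0, 4.0, 1.0, 0.25, 0.0625])
print("best step: %s" % best_step)
```

In this toy problem the two largest steps diverge, a step of 1.0 oscillates without improving, and 0.25 ends with the lowest objective.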

In [13]:
m


Out[13]:
Class                           : RankingFactorizationRecommender

Schema
------
User ID                         : user_id
Item ID                         : movie_id
Target                          : rating
Additional observation features : 1
Number of user side features    : 0
Number of item side features    : 0

Statistics
----------
Number of observations          : 965508
Number of users                 : 6040
Number of items                 : 3706

Training summary
----------------
Training time                   : 36.9965

Model Parameters
----------------
Model class                     : RankingFactorizationRecommender
num_factors                     : 32
binary_target                   : 0
side_data_factorization         : 1
solver                          : auto
nmf                             : 0
max_iterations                  : 25

Regularization Settings
-----------------------
regularization                  : 0.0
regularization_type             : normal
linear_regularization           : 0.0
ranking_regularization          : 0.25
unobserved_rating_value         : -1.79769313486e+308
num_sampled_negative_examples   : 4
ials_confidence_scaling_type    : auto
ials_confidence_scaling_factor  : 1

Optimization Settings
---------------------
init_random_sigma               : 0.01
sgd_convergence_interval        : 4
sgd_convergence_threshold       : 0.0
sgd_max_trial_iterations        : 5
sgd_sampling_block_size         : 131072
sgd_step_adjustment_interval    : 4
sgd_step_size                   : 0.0
sgd_trial_sample_minimum_size   : 10000
sgd_trial_sample_proportion     : 0.125
step_size_decrease_rate         : 0.75
additional_iterations_if_unhealthy: 5
adagrad_momentum_weighting      : 0.9
num_tempering_iterations        : 4
tempering_regularization_start_value: 0.0
track_exact_loss                : 0

In [14]:
m['coefficients']


Out[14]:
{'intercept': 3.5821495005738013, 'movie_id': Columns:
 	movie_id	int
 	linear_terms	float
 	factors	array
 
 Rows: 3706
 
 Data:
 +----------+------------------+-------------------------------+
 | movie_id |   linear_terms   |            factors            |
 +----------+------------------+-------------------------------+
 |   1193   |  1.06781125069   | [-0.119829073548, -0.02245... |
 |   661    | -0.0261590108275 | [-0.727257788181, 0.016146... |
 |   914    |  0.324085891247  | [-0.859803378582, 0.056376... |
 |   3408   |  0.565778970718  | [0.334619760513, -0.014206... |
 |   2355   |  0.648248255253  | [-0.248598009348, 0.103843... |
 |   1197   |  1.12024652958   | [-0.100379563868, 0.085359... |
 |   1287   |  0.345532894135  | [-0.247123196721, 0.024613... |
 |   2804   |  0.894821941853  | [-0.272583067417, 0.046351... |
 |   594    |  0.311594575644  | [-0.974369823933, 0.054282... |
 |   919    |  0.97704321146   | [-0.598346889019, 0.085630... |
 +----------+------------------+-------------------------------+
 [3706 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'side_data': Columns:
 	feature	str
 	index	str
 	linear_terms	float
 	factors	array
 
 Rows: 1
 
 Data:
 +-----------+-------+-----------------+-------------------------------+
 |  feature  | index |   linear_terms  |            factors            |
 +-----------+-------+-----------------+-------------------------------+
 | timestamp |   0   | -0.116745471954 | [-0.564183712006, 1.267165... |
 +-----------+-------+-----------------+-------------------------------+
 [1 rows x 4 columns], 'user_id': Columns:
 	user_id	int
 	linear_terms	float
 	factors	array
 
 Rows: 6040
 
 Data:
 +---------+------------------+-------------------------------+
 | user_id |   linear_terms   |            factors            |
 +---------+------------------+-------------------------------+
 |    1    | -0.027785371989  | [-0.0942558199167, 0.00739... |
 |    2    | -0.0234720371664 | [0.015922004357, -0.033992... |
 |    3    | -0.0345229320228 | [0.176564618945, -0.050576... |
 |    4    | -0.0198582224548 | [-0.0773911848664, -0.0500... |
 |    5    | -0.0562275871634 | [-0.0598151274025, -0.0059... |
 |    6    | -0.0401206016541 | [0.0565584115684, 0.030123... |
 |    7    | -0.0433877147734 | [0.205288589001, -0.060017... |
 |    8    | -0.0184100158513 | [0.169030055404, -0.043373... |
 |    9    | -0.0512112490833 | [0.163330376148, -0.060946... |
 |    10   | -0.0407416447997 | [-0.420519113541, 0.110337... |
 +---------+------------------+-------------------------------+
 [6040 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
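
The coefficients printed above are the ingredients of the model's score. The sketch below assumes the usual factorization form: intercept plus the user and item linear terms plus the dot product of the two factor vectors, ignoring the `timestamp` side feature. The intercept and linear terms are copied from the output for user 1 and movie 1193. The three-entry factor vectors are made up for illustration, since the real ones have 32 entries and are truncated in the printout.

```python
def score(intercept, user_linear, item_linear, user_factors, item_factors):
    """Factorization-model score under the assumed form described above."""
    dot = sum(u * v for u, v in zip(user_factors, item_factors))
    return intercept + user_linear + item_linear + dot

intercept = 3.5821495005738013   # from m['coefficients']['intercept']
user_linear = -0.027785371989    # user_id 1
item_linear = 1.06781125069      # movie_id 1193

# Hypothetical factor vectors, NOT the model's real factors.
user_factors = [0.1, -0.2, 0.3]
item_factors = [0.5, 0.5, 0.5]

predicted = score(intercept, user_linear, item_linear, user_factors, item_factors)
print("%.4f" % predicted)
```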

In [16]:
# Evaluate on highly rated test items only, using a 20% sample of the test users.
graphlab.recommender.util.compare_models(test[test['rating'] > 4],
                                         [pop, itemcf, m],
                                         user_sample=0.2,
                                         metric='precision_recall')


compare_models: using 183 users to estimate model performance
PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.109289617486 | 0.0154169412131 |
|   2    |  0.114754098361 | 0.0315571827129 |
|   3    |  0.103825136612 | 0.0393550677194 |
|   4    | 0.0983606557377 | 0.0488860172488 |
|   5    | 0.0983606557377 |  0.057530354299 |
|   6    | 0.0983606557377 |  0.06952814808  |
|   7    | 0.0967993754879 | 0.0776744105871 |
|   8    | 0.0949453551913 | 0.0871933441083 |
|   9    | 0.0910746812386 | 0.0970583805009 |
|   10   | 0.0890710382514 |  0.105731522781 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.218579234973 | 0.0336356996991 |
|   2    | 0.185792349727 | 0.0491081612808 |
|   3    | 0.182149362477 | 0.0721856847862 |
|   4    | 0.172131147541 |  0.086814432767 |
|   5    | 0.165027322404 | 0.0983099175722 |
|   6    | 0.152094717668 |  0.110996252299 |
|   7    | 0.148321623731 |  0.13067735829  |
|   8    | 0.148907103825 |  0.150453968213 |
|   9    | 0.142076502732 |  0.158699171088 |
|   10   | 0.134426229508 |  0.166857542043 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M2

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.27868852459  | 0.0355923139267 |
|   2    | 0.226775956284 | 0.0540712203094 |
|   3    | 0.213114754098 | 0.0716753913564 |
|   4    | 0.198087431694 | 0.0898091945474 |
|   5    | 0.183606557377 |  0.100699809919 |
|   6    | 0.182149362477 |  0.11362028645  |
|   7    | 0.185011709602 |  0.137198290932 |
|   8    | 0.177595628415 |  0.147966304582 |
|   9    | 0.169398907104 |  0.156916738229 |
|   10   | 0.160655737705 |  0.171367047623 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

Out[16]:
[{'precision_recall_by_user': Columns:
  	user_id	int
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 3294
  
  Data:
  +---------+--------+----------------+----------------+-------+
  | user_id | cutoff |   precision    |     recall     | count |
  +---------+--------+----------------+----------------+-------+
  |    42   |   1    |      0.0       |      0.0       |   7   |
  |    42   |   2    |      0.5       | 0.142857142857 |   7   |
  |    42   |   3    | 0.333333333333 | 0.142857142857 |   7   |
  |    42   |   4    |      0.25      | 0.142857142857 |   7   |
  |    42   |   5    |      0.2       | 0.142857142857 |   7   |
  |    42   |   6    | 0.166666666667 | 0.142857142857 |   7   |
  |    42   |   7    | 0.142857142857 | 0.142857142857 |   7   |
  |    42   |   8    |     0.125      | 0.142857142857 |   7   |
  |    42   |   9    | 0.111111111111 | 0.142857142857 |   7   |
  |    42   |   10   |      0.1       | 0.142857142857 |   7   |
  +---------+--------+----------------+----------------+-------+
  [3294 rows x 5 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
  	cutoff	int
  	precision	float
  	recall	float
  
  Rows: 18
  
  Data:
  +--------+-----------------+-----------------+
  | cutoff |    precision    |      recall     |
  +--------+-----------------+-----------------+
  |   1    |  0.109289617486 | 0.0154169412131 |
  |   2    |  0.114754098361 | 0.0315571827129 |
  |   3    |  0.103825136612 | 0.0393550677194 |
  |   4    | 0.0983606557377 | 0.0488860172488 |
  |   5    | 0.0983606557377 |  0.057530354299 |
  |   6    | 0.0983606557377 |  0.06952814808  |
  |   7    | 0.0967993754879 | 0.0776744105871 |
  |   8    | 0.0949453551913 | 0.0871933441083 |
  |   9    | 0.0910746812386 | 0.0970583805009 |
  |   10   | 0.0890710382514 |  0.105731522781 |
  +--------+-----------------+-----------------+
  [18 rows x 3 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
 {'precision_recall_by_user': Columns:
  	user_id	int
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 3294
  
  Data:
  +---------+--------+----------------+----------------+-------+
  | user_id | cutoff |   precision    |     recall     | count |
  +---------+--------+----------------+----------------+-------+
  |    42   |   1    |      1.0       | 0.142857142857 |   7   |
  |    42   |   2    |      0.5       | 0.142857142857 |   7   |
  |    42   |   3    | 0.333333333333 | 0.142857142857 |   7   |
  |    42   |   4    |      0.25      | 0.142857142857 |   7   |
  |    42   |   5    |      0.4       | 0.285714285714 |   7   |
  |    42   |   6    | 0.333333333333 | 0.285714285714 |   7   |
  |    42   |   7    | 0.285714285714 | 0.285714285714 |   7   |
  |    42   |   8    |      0.25      | 0.285714285714 |   7   |
  |    42   |   9    | 0.222222222222 | 0.285714285714 |   7   |
  |    42   |   10   |      0.2       | 0.285714285714 |   7   |
  +---------+--------+----------------+----------------+-------+
  [3294 rows x 5 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
  	cutoff	int
  	precision	float
  	recall	float
  
  Rows: 18
  
  Data:
  +--------+----------------+-----------------+
  | cutoff |   precision    |      recall     |
  +--------+----------------+-----------------+
  |   1    | 0.218579234973 | 0.0336356996991 |
  |   2    | 0.185792349727 | 0.0491081612808 |
  |   3    | 0.182149362477 | 0.0721856847862 |
  |   4    | 0.172131147541 |  0.086814432767 |
  |   5    | 0.165027322404 | 0.0983099175722 |
  |   6    | 0.152094717668 |  0.110996252299 |
  |   7    | 0.148321623731 |  0.13067735829  |
  |   8    | 0.148907103825 |  0.150453968213 |
  |   9    | 0.142076502732 |  0.158699171088 |
  |   10   | 0.134426229508 |  0.166857542043 |
  +--------+----------------+-----------------+
  [18 rows x 3 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
 {'precision_recall_by_user': Columns:
  	user_id	int
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 3294
  
  Data:
  +---------+--------+----------------+----------------+-------+
  | user_id | cutoff |   precision    |     recall     | count |
  +---------+--------+----------------+----------------+-------+
  |    42   |   1    |      1.0       | 0.142857142857 |   7   |
  |    42   |   2    |      0.5       | 0.142857142857 |   7   |
  |    42   |   3    | 0.333333333333 | 0.142857142857 |   7   |
  |    42   |   4    |      0.25      | 0.142857142857 |   7   |
  |    42   |   5    |      0.2       | 0.142857142857 |   7   |
  |    42   |   6    | 0.166666666667 | 0.142857142857 |   7   |
  |    42   |   7    | 0.142857142857 | 0.142857142857 |   7   |
  |    42   |   8    |     0.125      | 0.142857142857 |   7   |
  |    42   |   9    | 0.111111111111 | 0.142857142857 |   7   |
  |    42   |   10   |      0.2       | 0.285714285714 |   7   |
  +---------+--------+----------------+----------------+-------+
  [3294 rows x 5 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
  	cutoff	int
  	precision	float
  	recall	float
  
  Rows: 18
  
  Data:
  +--------+----------------+-----------------+
  | cutoff |   precision    |      recall     |
  +--------+----------------+-----------------+
  |   1    | 0.27868852459  | 0.0355923139267 |
  |   2    | 0.226775956284 | 0.0540712203094 |
  |   3    | 0.213114754098 | 0.0716753913564 |
  |   4    | 0.198087431694 | 0.0898091945474 |
  |   5    | 0.183606557377 |  0.100699809919 |
  |   6    | 0.182149362477 |  0.11362028645  |
  |   7    | 0.185011709602 |  0.137198290932 |
  |   8    | 0.177595628415 |  0.147966304582 |
  |   9    | 0.169398907104 |  0.156916738229 |
  |   10   | 0.160655737705 |  0.171367047623 |
  +--------+----------------+-----------------+
  [18 rows x 3 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}]
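
The per-user rows above can be reproduced by hand. Precision at cutoff k is the share of the top k recommendations that appear in the user's held-out items, and recall is the share of held-out items that were recovered. The sketch below mirrors the first model's rows for user 42 (7 held-out items, first hit at rank 2). The item ids and the helper `precision_recall_at_k` are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = float(hits) / k
    recall = float(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# 7 held-out items for one user (hypothetical ids).
relevant = {"a", "b", "c", "d", "e", "f", "g"}
# Ranked recommendations: only the second one is a hit.
recommended = ["x", "a", "y", "z", "w"]

p1, r1 = precision_recall_at_k(recommended, relevant, 1)
p2, r2 = precision_recall_at_k(recommended, relevant, 2)
p5, r5 = precision_recall_at_k(recommended, relevant, 5)
print("%.3f %.3f" % (p2, r2))
```

The `mean_precision` and `mean_recall` columns in the summary tables are these per-user values averaged over the sampled users.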

Optimizing for ranking


In [20]:
# Treat unobserved (user, movie) pairs as if they had a rating of 3 when optimizing for ranking.
m_rank = graphlab.recommender.ranking_factorization_recommender.create(
    train, 'user_id', 'movie_id', 'rating',
    unobserved_rating_value=3)


Recsys training: model = ranking_factorization_recommender
Preparing data set.
    Data has 965508 observations with 6040 users and 3706 items.
    Data prepared in: 0.910656s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter                      | Description                                      | Value    |
+--------------------------------+--------------------------------------------------+----------+
| num_factors                    | Factor Dimension                                 | 32       |
| regularization                 | L2 Regularization on Factors                     | 1e-09    |
| solver                         | Solver used for training                         | adagrad  |
| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
| ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |
| unobserved_rating_value        | Ranking Target Rating for Unobserved Interacti...| 3        |
| max_iterations                 | Maximum Number of Iterations                     | 25       |
+--------------------------------+--------------------------------------------------+----------+
  Optimizing model using SGD; tuning step size.
  Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value                |
+---------+-------------------+------------------------------------------+
| 0       | 16.6667           | Not Viable                               |
| 1       | 4.16667           | Not Viable                               |
| 2       | 1.04167           | Not Viable                               |
| 3       | 0.260417          | Not Viable                               |
| 4       | 0.0651042         | 0.998804                                 |
| 5       | 0.0325521         | 0.953073                                 |
| 6       | 0.016276          | 1.00856                                  |
| 7       | 0.00813802        | 1.0661                                   |
| 8       | 0.00406901        | 1.23488                                  |
+---------+-------------------+------------------------------------------+
| Final   | 0.0325521         | 0.953073                                 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 107us        | 1.33247           | 1.11699               |             |
+---------+--------------+-------------------+-----------------------+-------------+
| 1       | 885.253ms    | 1.10069           | 1.00012               | 0.0325521   |
| 2       | 1.90s        | 1.01103           | 0.956979              | 0.0325521   |
| 3       | 3.00s        | 0.974312          | 0.937847              | 0.0325521   |
| 4       | 4.07s        | 0.960712          | 0.931004              | 0.0325521   |
| 5       | 5.12s        | 0.949761          | 0.9254                | 0.0325521   |
| 6       | 6.07s        | 0.942225          | 0.921526              | 0.0325521   |
| 7       | 7.06s        | 0.935704          | 0.918205              | 0.0325521   |
| 8       | 8.02s        | 0.930567          | 0.915684              | 0.0325521   |
| 9       | 9.05s        | 0.925405          | 0.912967              | 0.0325521   |
| 10      | 10.33s       | 0.920952          | 0.910727              | 0.0325521   |
| 11      | 11.58s       | 0.916647          | 0.908743              | 0.0325521   |
| 12      | 12.62s       | 0.913399          | 0.907017              | 0.0325521   |
| 13      | 13.58s       | 0.909575          | 0.904969              | 0.0325521   |
| 14      | 14.67s       | 0.906824          | 0.90367               | 0.0325521   |
| 15      | 15.77s       | 0.904054          | 0.902198              | 0.0325521   |
| 16      | 16.86s       | 0.901294          | 0.90096               | 0.0325521   |
| 17      | 17.85s       | 0.898579          | 0.899525              | 0.0325521   |
| 18      | 18.86s       | 0.896474          | 0.898482              | 0.0325521   |
| 19      | 20.07s       | 0.894312          | 0.897331              | 0.0325521   |
| 20      | 21.10s       | 0.892068          | 0.896046              | 0.0325521   |
| 21      | 22.12s       | 0.88988           | 0.894963              | 0.0325521   |
| 22      | 23.33s       | 0.887669          | 0.893956              | 0.0325521   |
| 23      | 24.51s       | 0.885674          | 0.892851              | 0.0325521   |
| 24      | 25.58s       | 0.884228          | 0.892176              | 0.0325521   |
| 25      | 26.59s       | 0.882557          | 0.891299              | 0.0325521   |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
       Final objective value: 0.882406
       Final training RMSE: 0.886832

In [21]:
results = graphlab.recommender.util.compare_models(test[test['rating'] > 4],
                                                   [pop, itemcf, m, m_rank],
                                                   user_sample=0.2,
                                                   metric='precision_recall')


compare_models: using 183 users to estimate model performance
PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.103825136612 | 0.0116730652166 |
|   2    |  0.101092896175 | 0.0282821265133 |
|   3    | 0.0819672131148 | 0.0387227556041 |
|   4    | 0.0833333333333 |  0.054260741295 |
|   5    | 0.0786885245902 | 0.0652009814042 |
|   6    | 0.0765027322404 | 0.0758533021658 |
|   7    | 0.0772833723653 |  0.087870157321 |
|   8    | 0.0785519125683 |  0.10363872715  |
|   9    | 0.0740740740741 |  0.108125993351 |
|   10   | 0.0743169398907 |  0.125169969412 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.16393442623  |  0.033918559494 |
|   2    | 0.166666666667 | 0.0559035280067 |
|   3    | 0.158469945355 | 0.0802610557096 |
|   4    | 0.147540983607 | 0.0993935937697 |
|   5    | 0.138797814208 |  0.116006405262 |
|   6    | 0.128415300546 |  0.125824450712 |
|   7    | 0.128805620609 |  0.148368313836 |
|   8    |     0.125      |  0.162294248876 |
|   9    | 0.12204007286  |  0.173015991344 |
|   10   | 0.117486338798 |  0.18606052953  |
+--------+----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M2

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.24043715847  |  0.032990686752 |
|   2    | 0.196721311475 | 0.0593716586723 |
|   3    | 0.187613843352 | 0.0783908312908 |
|   4    | 0.180327868852 |  0.102234190817 |
|   5    | 0.16393442623  |  0.120610724355 |
|   6    | 0.159380692168 |  0.140119556509 |
|   7    | 0.152224824356 |  0.157806327365 |
|   8    | 0.142759562842 |  0.166809497193 |
|   9    | 0.137826350941 |  0.175280850522 |
|   10   | 0.134972677596 |  0.186140806231 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M3

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.114754098361 | 0.0220448212961 |
|   2    | 0.120218579235 | 0.0351999242985 |
|   3    | 0.111111111111 | 0.0445415535372 |
|   4    | 0.106557377049 |  0.052975921608 |
|   5    | 0.110382513661 |  0.076614241225 |
|   6    | 0.111111111111 | 0.0956858236539 |
|   7    | 0.110850897736 |  0.113159656127 |
|   8    | 0.106557377049 |  0.121421484491 |
|   9    | 0.102610807529 |  0.136444329342 |
|   10   | 0.101092896175 |  0.14669712098  |
+--------+----------------+-----------------+
[10 rows x 3 columns]


In [28]:
results[3]['precision_recall_overall']


Out[28]:
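+--------+----------------+-----------------+
| cutoff |   precision    |      recall     |
+--------+----------------+-----------------+
|   1    | 0.114754098361 | 0.0220448212961 |
|   2    | 0.120218579235 | 0.0351999242985 |
|   3    | 0.111111111111 | 0.0445415535372 |
|   4    | 0.106557377049 |  0.052975921608 |
|   5    | 0.110382513661 |  0.076614241225 |
|   6    | 0.111111111111 | 0.0956858236539 |
|   7    | 0.110850897736 |  0.113159656127 |
|   8    | 0.106557377049 |  0.121421484491 |
|   9    | 0.102610807529 |  0.136444329342 |
|   10   | 0.101092896175 |  0.14669712098  |
+--------+----------------+-----------------+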
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
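
To compare the four models at a single cutoff, the precision values at cutoff 10 can be pulled from the In [21] output and sorted. The sketch below hard-codes those numbers and assumes that M0 to M3 in the log correspond to `pop`, `itemcf`, `m`, and `m_rank`, in the order they were passed to `compare_models`.

```python
# Mean precision at cutoff 10, copied from the In [21] output.
precision_at_10 = {
    "pop": 0.0743169398907,
    "itemcf": 0.117486338798,
    "m": 0.134972677596,
    "m_rank": 0.101092896175,
}

# Sort model names from best to worst precision.
ranking = sorted(precision_at_10, key=precision_at_10.get, reverse=True)
for name in ranking:
    print("%-7s %.4f" % (name, precision_at_10[name]))
```

On this user sample the default model `m` comes out ahead, and `m_rank` with `unobserved_rating_value=3` falls between the popularity baseline and the item-similarity model.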

Experimenting with side features


In [29]:
user_sf = graphlab.SFrame('users')
item_sf = graphlab.SFrame('items')

In [30]:
# Train three variants: user side features only, item side features only, and both.
m_user = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     user_data=user_sf)
m_item = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     item_data=item_sf)
m_both = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     user_data=user_sf, item_data=item_sf)


Recsys training: model = ranking_factorization_recommender
Preparing data set.
    Data has 965508 observations with 6040 users and 3706 items.
    Data prepared in: 0.872319s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter                      | Description                                      | Value    |
+--------------------------------+--------------------------------------------------+----------+
| num_factors                    | Factor Dimension                                 | 32       |
| regularization                 | L2 Regularization on Factors                     | 1e-09    |
| solver                         | Solver used for training                         | adagrad  |
| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
| ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |
| side_data_factorization        | Assign Factors for Side Data                     | True     |
| max_iterations                 | Maximum Number of Iterations                     | 25       |
+--------------------------------+--------------------------------------------------+----------+
  Optimizing model using SGD; tuning step size.
  Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value                |
+---------+-------------------+------------------------------------------+
| 0       | 7.14286           | Not Viable                               |
| 1       | 1.78571           | Not Viable                               |
| 2       | 0.446429          | 1.52387                                  |
| 3       | 0.223214          | Not Viable                               |
| 4       | 0.0558036         | 1.79945                                  |
| 5       | 0.0279018         | 1.78969                                  |
+---------+-------------------+------------------------------------------+
| Final   | 0.446429          | 1.52387                                  |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 63us         | 2.44592           | 1.11697               |             |
+---------+--------------+-------------------+-----------------------+-------------+
| 1       | 2.13s        | DIVERGED          | DIVERGED              | 0.446429    |
| RESET   | 2.99s        | 2.44647           | 1.11698               |             |
| 1       | 5.10s        | 1.83529           | 1.10141               | 0.223214    |
| 2       | 6.90s        | 1.47109           | 0.95455               | 0.223214    |
| 3       | 8.80s        | 1.36577           | 0.917865              | 0.223214    |
| 4       | 10.62s       | 1.31007           | 0.897384              | 0.223214    |
| 5       | 12.44s       | 1.27125           | 0.883491              | 0.223214    |
| 6       | 14.67s       | 1.24635           | 0.873351              | 0.223214    |
| 7       | 16.70s       | 1.22622           | 0.865748              | 0.223214    |
| 8       | 18.63s       | 1.21028           | 0.859754              | 0.223214    |
| 9       | 20.50s       | 1.19796           | 0.854523              | 0.223214    |
| 10      | 22.46s       | 1.18689           | 0.850478              | 0.223214    |
| 11      | 24.31s       | 1.1783            | 0.846785              | 0.223214    |
| 12      | 26.18s       | 1.17084           | 0.84352               | 0.223214    |
| 13      | 28.07s       | 1.16355           | 0.840745              | 0.223214    |
| 14      | 29.97s       | 1.15711           | 0.838398              | 0.223214    |
| 15      | 32.09s       | 1.15247           | 0.836416              | 0.223214    |
| 16      | 34.04s       | 1.14785           | 0.83443               | 0.223214    |
| 17      | 36.25s       | 1.14331           | 0.832546              | 0.223214    |
| 18      | 38.41s       | 1.13848           | 0.830724              | 0.223214    |
| 19      | 40.34s       | 1.13683           | 0.82959               | 0.223214    |
| 20      | 42.39s       | 1.13266           | 0.828052              | 0.223214    |
| 21      | 44.75s       | 1.13008           | 0.827049              | 0.223214    |
| 22      | 47.05s       | 1.12695           | 0.82589               | 0.223214    |
| 23      | 49.58s       | 1.12374           | 0.824564              | 0.223214    |
| 24      | 51.93s       | 1.12202           | 0.823962              | 0.223214    |
| 25      | 54.30s       | 1.12015           | 0.822855              | 0.223214    |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
       Final objective value: 1.12099
       Final training RMSE: 0.797943
Recsys training: model = ranking_factorization_recommender
Preparing data set.
    Data has 965508 observations with 6040 users and 3883 items.
    Data prepared in: 1.16364s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter                      | Description                                      | Value    |
+--------------------------------+--------------------------------------------------+----------+
| num_factors                    | Factor Dimension                                 | 32       |
| regularization                 | L2 Regularization on Factors                     | 1e-09    |
| solver                         | Solver used for training                         | adagrad  |
| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
| ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |
| side_data_factorization        | Assign Factors for Side Data                     | True     |
| max_iterations                 | Maximum Number of Iterations                     | 25       |
+--------------------------------+--------------------------------------------------+----------+
  Optimizing model using SGD; tuning step size.
  Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value                |
+---------+-------------------+------------------------------------------+
| 0       | 10                | Not Viable                               |
| 1       | 2.5               | Not Viable                               |
| 2       | 0.625             | Not Viable                               |
| 3       | 0.15625           | 1.08443                                  |
| 4       | 0.078125          | 1.72974                                  |
| 5       | 0.0390625         | 1.84654                                  |
| 6       | 0.0195312         | 1.74856                                  |
+---------+-------------------+------------------------------------------+
| Final   | 0.15625           | 1.08443                                  |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 70us         | 2.44637           | 1.11698               |             |
+---------+--------------+-------------------+-----------------------+-------------+
| 1       | 2.33s        | DIVERGED          | DIVERGED              | 0.15625     |
| RESET   | 3.11s        | 2.44644           | 1.11697               |             |
| 1       | 4.92s        | 1.75622           | 1.06161               | 0.078125    |
| 2       | 6.57s        | 1.50614           | 0.966049              | 0.078125    |
| 3       | 8.08s        | 1.4167            | 0.933697              | 0.078125    |
| 4       | 9.53s        | 1.37077           | 0.917417              | 0.078125    |
| 5       | 10.99s       | 1.34379           | 0.908108              | 0.078125    |
| 6       | 12.65s       | 1.32151           | 0.899986              | 0.078125    |
| 7       | 14.14s       | 1.30427           | 0.89374               | 0.078125    |
| 8       | 15.59s       | 1.2895            | 0.888268              | 0.078125    |
| 9       | 17.03s       | 1.27727           | 0.884087              | 0.078125    |
| 10      | 18.47s       | 1.2662            | 0.879697              | 0.078125    |
| 11      | 19.91s       | 1.25785           | 0.876548              | 0.078125    |
| 12      | 21.33s       | 1.24893           | 0.873322              | 0.078125    |
| 13      | 22.73s       | 1.24222           | 0.870908              | 0.078125    |
| 14      | 24.17s       | 1.23693           | 0.868724              | 0.078125    |
| 15      | 25.64s       | 1.23104           | 0.866697              | 0.078125    |
| 16      | 27.07s       | 1.22657           | 0.865066              | 0.078125    |
| 17      | 28.47s       | 1.22185           | 0.86311               | 0.078125    |
| 18      | 29.90s       | 1.21603           | 0.860832              | 0.078125    |
| 19      | 31.34s       | 1.21214           | 0.859841              | 0.078125    |
| 20      | 32.75s       | 1.20866           | 0.858349              | 0.078125    |
| 21      | 34.17s       | 1.20588           | 0.857265              | 0.078125    |
| 22      | 35.59s       | 1.2013            | 0.855384              | 0.078125    |
| 23      | 36.98s       | 1.19868           | 0.854415              | 0.078125    |
| 24      | 38.40s       | 1.19618           | 0.8536                | 0.078125    |
| 25      | 39.82s       | 1.19373           | 0.852524              | 0.078125    |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
       Final objective value: 1.22299
       Final training RMSE: 0.842974
Recsys training: model = ranking_factorization_recommender
Preparing data set.
    Data has 965508 observations with 6040 users and 3883 items.
    Data prepared in: 0.897359s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter                      | Description                                      | Value    |
+--------------------------------+--------------------------------------------------+----------+
| num_factors                    | Factor Dimension                                 | 32       |
| regularization                 | L2 Regularization on Factors                     | 1e-09    |
| solver                         | Solver used for training                         | adagrad  |
| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
| ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |
| side_data_factorization        | Assign Factors for Side Data                     | True     |
| max_iterations                 | Maximum Number of Iterations                     | 25       |
+--------------------------------+--------------------------------------------------+----------+
  Optimizing model using SGD; tuning step size.
  Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value                |
+---------+-------------------+------------------------------------------+
| 0       | 5.55556           | Not Viable                               |
| 1       | 1.38889           | Not Viable                               |
| 2       | 0.347222          | 1.41163                                  |
| 3       | 0.173611          | 1.06475                                  |
| 4       | 0.0868056         | 1.03146                                  |
| 5       | 0.0434028         | 1.1885                                   |
| 6       | 0.0217014         | 1.54004                                  |
| 7       | 0.0108507         | 1.77557                                  |
+---------+-------------------+------------------------------------------+
| Final   | 0.0868056         | 1.03146                                  |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 70us         | 2.4467            | 1.117                 |             |
+---------+--------------+-------------------+-----------------------+-------------+
| 1       | 3.02s        | DIVERGED          | DIVERGED              | 0.0868056   |
| RESET   | 3.97s        | 2.4468            | 1.11698               |             |
| 1       | 6.22s        | 1.56747           | 1.00211               | 0.0434028   |
| 2       | 8.40s        | 1.4123            | 0.936364              | 0.0434028   |
| 3       | 10.61s       | 1.34465           | 0.91251               | 0.0434028   |
| 4       | 13.00s       | 1.30528           | 0.898029              | 0.0434028   |
| 5       | 15.21s       | 1.27777           | 0.887646              | 0.0434028   |
| 6       | 18.09s       | 1.25523           | 0.879139              | 0.0434028   |
| 7       | 20.54s       | 1.23908           | 0.872961              | 0.0434028   |
| 8       | 22.79s       | 1.22569           | 0.867368              | 0.0434028   |
| 9       | 25.25s       | 1.21337           | 0.862705              | 0.0434028   |
| 10      | 28.35s       | 1.20338           | 0.858906              | 0.0434028   |
| 11      | 31.26s       | 1.19501           | 0.855631              | 0.0434028   |
| 12      | 33.70s       | 1.18678           | 0.852674              | 0.0434028   |
| 13      | 35.86s       | 1.18128           | 0.850198              | 0.0434028   |
| 14      | 38.05s       | 1.17569           | 0.847772              | 0.0434028   |
| 15      | 40.25s       | 1.16953           | 0.845739              | 0.0434028   |
| 16      | 42.45s       | 1.16479           | 0.843867              | 0.0434028   |
| 17      | 44.62s       | 1.16132           | 0.842166              | 0.0434028   |
| 18      | 46.80s       | 1.15646           | 0.840656              | 0.0434028   |
| 19      | 48.95s       | 1.15327           | 0.838994              | 0.0434028   |
| 20      | 51.21s       | 1.15068           | 0.837869              | 0.0434028   |
| 21      | 53.40s       | 1.14713           | 0.836618              | 0.0434028   |
| 22      | 55.60s       | 1.14393           | 0.835341              | 0.0434028   |
| 23      | 57.78s       | 1.14105           | 0.834355              | 0.0434028   |
| 24      | 59.92s       | 1.13905           | 0.833327              | 0.0434028   |
| 25      | 1m 2s        | 1.13617           | 0.832308              | 0.0434028   |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
       Final objective value: 1.16266
       Final training RMSE: 0.822791

In [31]:
m_both


Out[31]:
Class                           : RankingFactorizationRecommender

Schema
------
User ID                         : user_id
Item ID                         : movie_id
Target                          : rating
Additional observation features : 1
Number of user side features    : 5
Number of item side features    : 3

Statistics
----------
Number of observations          : 965508
Number of users                 : 6040
Number of items                 : 3883

Training summary
----------------
Training time                   : 74.1749

Model Parameters
----------------
Model class                     : RankingFactorizationRecommender
num_factors                     : 32
binary_target                   : 0
side_data_factorization         : 1
solver                          : auto
nmf                             : 0
max_iterations                  : 25

Regularization Settings
-----------------------
regularization                  : 0.0
regularization_type             : normal
linear_regularization           : 0.0
ranking_regularization          : 0.25
unobserved_rating_value         : -1.79769313486e+308
num_sampled_negative_examples   : 4
ials_confidence_scaling_type    : auto
ials_confidence_scaling_factor  : 1

Optimization Settings
---------------------
init_random_sigma               : 0.01
sgd_convergence_interval        : 4
sgd_convergence_threshold       : 0.0
sgd_max_trial_iterations        : 5
sgd_sampling_block_size         : 131072
sgd_step_adjustment_interval    : 4
sgd_step_size                   : 0.0
sgd_trial_sample_minimum_size   : 10000
sgd_trial_sample_proportion     : 0.125
step_size_decrease_rate         : 0.75
additional_iterations_if_unhealthy: 5
adagrad_momentum_weighting      : 0.9
num_tempering_iterations        : 4
tempering_regularization_start_value: 0.0
track_exact_loss                : 0

In [32]:
results = graphlab.recommender.util.compare_models(test, [m, m_user, m_item, m_both], user_sample=0.2)


compare_models: using 200 users to estimate model performance
PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    |     0.305      | 0.0104911692762 |
|   2    |     0.3075     | 0.0218598413174 |
|   3    | 0.298333333333 | 0.0312188865716 |
|   4    |    0.28375     | 0.0416981002021 |
|   5    |     0.267      | 0.0490821420076 |
|   6    |     0.255      | 0.0566601620839 |
|   7    | 0.244285714286 | 0.0634873210823 |
|   8    |     0.2325     | 0.0671075588864 |
|   9    | 0.227222222222 | 0.0732309783629 |
|   10   |      0.22      | 0.0778109177835 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 0.9976486246840836)

Per User RMSE (best)
+---------+-------+----------------+
| user_id | count |      rmse      |
+---------+-------+----------------+
|   4259  |   4   | 0.482071967797 |
+---------+-------+----------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count |      rmse     |
+---------+-------+---------------+
|   3275  |   4   | 1.92700170805 |
+---------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count |       rmse       |
+----------+-------+------------------+
|   163    |   1   | 0.00169078452152 |
+----------+-------+------------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count |      rmse     |
+----------+-------+---------------+
|   3196   |   1   | 4.01836458102 |
+----------+-------+---------------+
[1 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    |     0.375      | 0.0136968745991 |
|   2    |     0.3575     |  0.023755002765 |
|   3    |     0.335      | 0.0362605125879 |
|   4    |     0.3075     | 0.0423008826942 |
|   5    |     0.293      | 0.0497532350863 |
|   6    | 0.281666666667 | 0.0578575934232 |
|   7    | 0.282142857143 | 0.0671244350417 |
|   8    |    0.27125     | 0.0748468085545 |
|   9    | 0.262222222222 | 0.0813855604447 |
|   10   |      0.26      | 0.0896391608922 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 1.0433027431346846)

Per User RMSE (best)
+---------+-------+---------------+
| user_id | count |      rmse     |
+---------+-------+---------------+
|   4259  |   4   | 0.24496854802 |
+---------+-------+---------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count |      rmse     |
+---------+-------+---------------+
|   2912  |   7   | 2.13582589594 |
+---------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count |       rmse       |
+----------+-------+------------------+
|   379    |   1   | 0.00331903448671 |
+----------+-------+------------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count |      rmse     |
+----------+-------+---------------+
|   3117   |   1   | 3.61265547345 |
+----------+-------+---------------+
[1 rows x 3 columns]

PROGRESS: Evaluate model M2

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    |     0.355      |  0.01355116837  |
|   2    |     0.345      | 0.0247731072461 |
|   3    | 0.328333333333 | 0.0346428767624 |
|   4    |      0.3       | 0.0417405134962 |
|   5    |     0.286      | 0.0496330466776 |
|   6    | 0.283333333333 | 0.0607078786391 |
|   7    | 0.273571428571 |  0.068217750301 |
|   8    |    0.265625    | 0.0745364058279 |
|   9    | 0.262777777778 | 0.0821126915135 |
|   10   |     0.255      | 0.0868626759608 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 1.0121070979184972)

Per User RMSE (best)
+---------+-------+----------------+
| user_id | count |      rmse      |
+---------+-------+----------------+
|   4259  |   4   | 0.280887940151 |
+---------+-------+----------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count |      rmse     |
+---------+-------+---------------+
|   2912  |   7   | 2.21734756181 |
+---------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+----------+-------+-------------------+
| movie_id | count |        rmse       |
+----------+-------+-------------------+
|   1283   |   1   | 0.000539099143753 |
+----------+-------+-------------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count |      rmse     |
+----------+-------+---------------+
|   1806   |   1   | 3.39939804512 |
+----------+-------+---------------+
[1 rows x 3 columns]

PROGRESS: Evaluate model M3

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    |      0.4       | 0.0173755798355 |
|   2    |     0.3625     | 0.0265019144168 |
|   3    | 0.338333333333 | 0.0380391678111 |
|   4    |     0.3275     | 0.0479764485634 |
|   5    |     0.312      | 0.0576493349033 |
|   6    | 0.299166666667 | 0.0654828586353 |
|   7    | 0.295714285714 | 0.0760004175285 |
|   8    |      0.29      | 0.0841766126368 |
|   9    | 0.286666666667 |  0.093854767667 |
|   10   |     0.2755     | 0.0995420682121 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 0.9936036664127302)

Per User RMSE (best)
+---------+-------+----------------+
| user_id | count |      rmse      |
+---------+-------+----------------+
|   4259  |   4   | 0.397063536039 |
+---------+-------+----------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count |      rmse     |
+---------+-------+---------------+
|   2912  |   7   | 2.01561012851 |
+---------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count |       rmse       |
+----------+-------+------------------+
|   849    |   1   | 0.00179710693082 |
+----------+-------+------------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count |      rmse     |
+----------+-------+---------------+
|   3806   |   1   | 3.81246875723 |
+----------+-------+---------------+
[1 rows x 3 columns]


In [33]:
[results[i]['rmse_overall'] for i in range(len(results))]


Out[33]:
[0.9976486246840836,
 1.0433027431346846,
 1.0121070979184972,
 0.9936036664127302]
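
The bare list is easier to read once each value is paired with its model. The sketch below copies the four RMSE values from Out[33] and assumes they come back in the order the models were passed to compare_models (m, m_user, m_item, m_both).

```python
# RMSE values copied from Out[33]; the order is assumed to match the
# model list passed to compare_models.
model_names = ['m', 'm_user', 'm_item', 'm_both']
rmse_overall = [0.9976486246840836, 1.0433027431346846,
                1.0121070979184972, 0.9936036664127302]

rmse_by_model = dict(zip(model_names, rmse_overall))

# Print the models from lowest to highest RMSE.
for name, rmse in sorted(rmse_by_model.items(), key=lambda kv: kv[1]):
    print("%-7s %.4f" % (name, rmse))

best = min(rmse_by_model, key=rmse_by_model.get)
print("lowest RMSE: %s" % best)
```

On this run m_both has the lowest RMSE, while m_user has the highest even though its precision is above that of m. The evaluation uses a 200-user sample, so small gaps may be noise.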

In [34]:
results[0]['rmse_by_item'].show()



In [36]:
graphlab.recommender.util.compare_models(test[test['rating'] > 4], 
                                    [m_rank, m_both], 
                                    user_sample=0.2, 
                                    metric='precision_recall')


compare_models: using 183 users to estimate model performance
PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.158469945355 | 0.0235039678346 |
|   2    | 0.155737704918 | 0.0367981705878 |
|   3    | 0.136612021858 | 0.0436527349676 |
|   4    | 0.137978142077 | 0.0577237401993 |
|   5    | 0.136612021858 | 0.0723710331936 |
|   6    | 0.125683060109 | 0.0842016591703 |
|   7    | 0.124902419984 |  0.101563633319 |
|   8    | 0.118852459016 |  0.109983711543 |
|   9    | 0.114754098361 |  0.11908895883  |
|   10   | 0.111475409836 |  0.128536481539 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.245901639344 | 0.0538736282089 |
|   2    | 0.234972677596 | 0.0758295229523 |
|   3    | 0.23679417122  |  0.101520870674 |
|   4    | 0.225409836066 |  0.12569838792  |
|   5    | 0.209836065574 |  0.142998857028 |
|   6    | 0.192167577413 |  0.155246605969 |
|   7    | 0.183450429352 |  0.174214946795 |
|   8    | 0.176229508197 |  0.190227369104 |
|   9    | 0.165148755313 |  0.200596309693 |
|   10   | 0.158469945355 |  0.210908575339 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

Out[36]:
[{'precision_recall_by_user': Columns:
  	user_id	int
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 3294
  
  Data:
  +---------+--------+-----------+--------+-------+
  | user_id | cutoff | precision | recall | count |
  +---------+--------+-----------+--------+-------+
  |    3    |   1    |    0.0    |  0.0   |   2   |
  |    3    |   2    |    0.0    |  0.0   |   2   |
  |    3    |   3    |    0.0    |  0.0   |   2   |
  |    3    |   4    |    0.0    |  0.0   |   2   |
  |    3    |   5    |    0.0    |  0.0   |   2   |
  |    3    |   6    |    0.0    |  0.0   |   2   |
  |    3    |   7    |    0.0    |  0.0   |   2   |
  |    3    |   8    |    0.0    |  0.0   |   2   |
  |    3    |   9    |    0.0    |  0.0   |   2   |
  |    3    |   10   |    0.0    |  0.0   |   2   |
  +---------+--------+-----------+--------+-------+
  [3294 rows x 5 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
  	cutoff	int
  	precision	float
  	recall	float
  
  Rows: 18
  
  Data:
  +--------+----------------+-----------------+
  | cutoff |   precision    |      recall     |
  +--------+----------------+-----------------+
  |   1    | 0.158469945355 | 0.0235039678346 |
  |   2    | 0.155737704918 | 0.0367981705878 |
  |   3    | 0.136612021858 | 0.0436527349676 |
  |   4    | 0.137978142077 | 0.0577237401993 |
  |   5    | 0.136612021858 | 0.0723710331936 |
  |   6    | 0.125683060109 | 0.0842016591703 |
  |   7    | 0.124902419984 |  0.101563633319 |
  |   8    | 0.118852459016 |  0.109983711543 |
  |   9    | 0.114754098361 |  0.11908895883  |
  |   10   | 0.111475409836 |  0.128536481539 |
  +--------+----------------+-----------------+
  [18 rows x 3 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
 {'precision_recall_by_user': Columns:
  	user_id	int
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 3294
  
  Data:
  +---------+--------+-----------+--------+-------+
  | user_id | cutoff | precision | recall | count |
  +---------+--------+-----------+--------+-------+
  |    3    |   1    |    0.0    |  0.0   |   2   |
  |    3    |   2    |    0.0    |  0.0   |   2   |
  |    3    |   3    |    0.0    |  0.0   |   2   |
  |    3    |   4    |    0.0    |  0.0   |   2   |
  |    3    |   5    |    0.0    |  0.0   |   2   |
  |    3    |   6    |    0.0    |  0.0   |   2   |
  |    3    |   7    |    0.0    |  0.0   |   2   |
  |    3    |   8    |    0.0    |  0.0   |   2   |
  |    3    |   9    |    0.0    |  0.0   |   2   |
  |    3    |   10   |    0.0    |  0.0   |   2   |
  +---------+--------+-----------+--------+-------+
  [3294 rows x 5 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
  'precision_recall_overall': Columns:
  	cutoff	int
  	precision	float
  	recall	float
  
  Rows: 18
  
  Data:
  +--------+----------------+-----------------+
  | cutoff |   precision    |      recall     |
  +--------+----------------+-----------------+
  |   1    | 0.245901639344 | 0.0538736282089 |
  |   2    | 0.234972677596 | 0.0758295229523 |
  |   3    | 0.23679417122  |  0.101520870674 |
  |   4    | 0.225409836066 |  0.12569838792  |
  |   5    | 0.209836065574 |  0.142998857028 |
  |   6    | 0.192167577413 |  0.155246605969 |
  |   7    | 0.183450429352 |  0.174214946795 |
  |   8    | 0.176229508197 |  0.190227369104 |
  |   9    | 0.165148755313 |  0.200596309693 |
  |   10   | 0.158469945355 |  0.210908575339 |
  +--------+----------------+-----------------+
  [18 rows x 3 columns]
  Note: Only the head of the SFrame is printed.
  You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}]
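
To put the two precision columns side by side, the sketch below copies the precision_recall_overall values from the output above and computes the ratio of m_both to m_rank at each cutoff. The variable names are illustrative, and M0 and M1 are assumed to correspond to m_rank and m_both in the order they were passed.

```python
# Precision values copied from the two 'precision_recall_overall' tables.
cutoffs = list(range(1, 11))
precision_m_rank = [0.158469945355, 0.155737704918, 0.136612021858,
                    0.137978142077, 0.136612021858, 0.125683060109,
                    0.124902419984, 0.118852459016, 0.114754098361,
                    0.111475409836]
precision_m_both = [0.245901639344, 0.234972677596, 0.23679417122,
                    0.225409836066, 0.209836065574, 0.192167577413,
                    0.183450429352, 0.176229508197, 0.165148755313,
                    0.158469945355]

# Ratio above 1 means m_both is more precise at that cutoff.
lift = [both / rank for rank, both in zip(precision_m_rank, precision_m_both)]

for k, ratio in zip(cutoffs, lift):
    print("cutoff %2d: m_both / m_rank = %.2f" % (k, ratio))
```

On this 183-user sample the ratio stays above 1 at every cutoff.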

Factorization machines


In [ ]:
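# recommender.create() in GraphLab Create 1.8 appears to take no `method`
# argument, so the factorization recommender is created directly here.
fm = graphlab.factorization_recommender.create(train.head(10000), 'user_id', 'movie_id',
                                               target='rating',
                                               item_data=item_sf,
                                               sgd_step_size=0.09,
                                               max_iterations=10)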
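
As background, a second-order factorization machine scores a feature vector x as a bias, plus a linear term, plus pairwise interactions weighted by inner products of latent factors. The sketch below is a plain-Python illustration of that textbook formula, not GraphLab's implementation; fm_predict and all of the numbers are hypothetical.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector x.

    x  : list of n feature values
    w0 : global bias
    w  : list of n linear weights
    V  : n lists of k latent factors, one list per feature
    """
    n = len(x)
    k = len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    # Pairwise term using the O(n*k) identity:
    # sum_{i<j} <V_i, V_j> x_i x_j
    #   = 0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i (V_if x_i)^2]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

# Hypothetical example: three features, two latent factors each.
x = [1.0, 1.0, 0.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

score = fm_predict(x, w0, w, V)
print("score = %.2f" % score)
```

With one-hot user and item indicators in x this reduces to ordinary matrix factorization; side features such as those in item_sf add further entries to x.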
