In [1]:
import graphlab
graphlab.canvas.set_target("ipynb")
rating_sf = graphlab.SFrame('ratings')
users = graphlab.SFrame('users')
items = graphlab.SFrame('items')
2016-04-14 01:35:40,341 [INFO] graphlab.cython.cy_server, 176: GraphLab Create v1.8.5 started. Logging: /tmp/graphlab_server_1460568932.log
In [2]:
rating_sf.show()
In [4]:
dir(graphlab.recommender)
Out[4]:
['__all__',
'__builtins__',
'__doc__',
'__file__',
'__name__',
'__package__',
'__path__',
'create',
'factorization_recommender',
'item_similarity_recommender',
'popularity_recommender',
'ranking_factorization_recommender',
'util']
In [6]:
(train, test) = graphlab.recommender.util.random_split_by_user(rating_sf, 'user_id', 'movie_id')
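The split above holds out part of each selected user's ratings for testing, so the same users appear in both sets. A minimal pure-Python sketch of that idea follows. It is not GraphLab's implementation: the function name `split_by_user` and the defaults `test_proportion=0.2` and `max_test_users=1000` are assumptions made for illustration.

```python
import random
from collections import defaultdict

def split_by_user(rows, test_proportion=0.2, max_test_users=1000, seed=0):
    """Split (user, item) pairs into train and test lists, per user.

    Hypothetical helper: the name and default values are assumptions,
    not the library's API.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item in rows:
        by_user[user].append(item)

    users = sorted(by_user)
    # Only a sample of users contributes rows to the test set.
    test_users = set(rng.sample(users, min(max_test_users, len(users))))

    train, test = [], []
    for user in users:
        for item in by_user[user]:
            # For sampled users, move roughly test_proportion of items to test.
            if user in test_users and rng.random() < test_proportion:
                test.append((user, item))
            else:
                train.append((user, item))
    return train, test

# Hypothetical toy data: 5 users with 10 items each.
rows = [(u, i) for u in range(5) for i in range(10)]
train_rows, test_rows = split_by_user(rows, max_test_users=2)
print("train=%d test=%d" % (len(train_rows), len(test_rows)))
```

Because only sampled users contribute test rows, the training set keeps most of the data, and every test user still has training history to recommend from.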
In [9]:
from graphlab import item_similarity_recommender
# Train on ratings above 4 only, so the model sees just the strongly liked items.
itemcf = item_similarity_recommender.create(train[train['rating'] > 4], 'user_id', 'movie_id')
Recsys training: model = item_similarity
Warning: Ignoring columns rating, timestamp;
To use one of these as a target column, set target =
and use a method that allows the use of a target.
Preparing data set.
Data has 218621 observations with 6012 users and 3224 items.
Data prepared in: 0.195331s
Computing item similarity statistics:
Computing most similar items for 3224 items:
+-----------------+-----------------+
| Number of items | Elapsed Time |
+-----------------+-----------------+
| 1000 | 1.08126 |
| 2000 | 1.10032 |
| 3000 | 1.12497 |
+-----------------+-----------------+
Finished training in 1.20171s
In [11]:
pop = graphlab.popularity_recommender.create(train[train['rating'] > 4], 'user_id', 'movie_id')
Recsys training: model = popularity
Warning: Ignoring columns rating, timestamp;
To use one of these as a target column, set target =
and use a method that allows the use of a target.
Preparing data set.
Data has 218621 observations with 6012 users and 3224 items.
Data prepared in: 0.237904s
218621 observations to process; with 3224 unique items.
In [12]:
m = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating')
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 965508 observations with 6040 users and 3706 items.
Data prepared in: 0.907655s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 16.6667 | Not Viable |
| 1 | 4.16667 | Not Viable |
| 2 | 1.04167 | Not Viable |
| 3 | 0.260417 | 1.80479 |
| 4 | 0.130208 | 1.8322 |
| 5 | 0.0651042 | 1.8873 |
| 6 | 0.0325521 | 1.88706 |
+---------+-------------------+------------------------------------------+
| Final | 0.260417 | 1.80479 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 101us | 2.4462 | 1.11698 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 1.43s | DIVERGED | DIVERGED | 0.260417 |
| RESET | 1.91s | 2.44619 | 1.11697 | |
| 1 | 3.24s | DIVERGED | DIVERGED | 0.130208 |
| RESET | 3.75s | 2.44619 | 1.11697 | |
| 1 | 4.84s | 2.10443 | 1.14093 | 0.0651042 |
| 2 | 5.81s | 1.82027 | 1.04353 | 0.0651042 |
| 3 | 6.89s | 1.75645 | 1.02196 | 0.0651042 |
| 4 | 7.90s | 1.7206 | 1.01294 | 0.0651042 |
| 5 | 8.99s | 1.69207 | 1.00488 | 0.0651042 |
| 6 | 10.03s | 1.66916 | 0.998471 | 0.0651042 |
| 7 | 11.11s | 1.64975 | 0.992687 | 0.0651042 |
| 8 | 12.27s | 1.63331 | 0.987803 | 0.0651042 |
| 9 | 13.53s | 1.6203 | 0.984347 | 0.0651042 |
| 10 | 14.68s | 1.60869 | 0.981751 | 0.0651042 |
| 11 | 15.79s | 1.59758 | 0.977906 | 0.0651042 |
| 12 | 16.82s | 1.58984 | 0.976171 | 0.0651042 |
| 13 | 17.96s | 1.58036 | 0.973489 | 0.0651042 |
| 14 | 19.24s | 1.57243 | 0.971477 | 0.0651042 |
| 15 | 20.37s | 1.56503 | 0.969302 | 0.0651042 |
| 16 | 21.50s | 1.55807 | 0.967444 | 0.0651042 |
| 17 | 22.63s | 1.55118 | 0.965764 | 0.0651042 |
| 18 | 23.79s | 1.54509 | 0.963793 | 0.0651042 |
| 19 | 24.93s | 1.53942 | 0.961991 | 0.0651042 |
| 20 | 26.23s | 1.53433 | 0.960398 | 0.0651042 |
| 21 | 27.37s | 1.52844 | 0.959103 | 0.0651042 |
| 22 | 28.46s | 1.52382 | 0.958025 | 0.0651042 |
| 23 | 29.55s | 1.51829 | 0.956181 | 0.0651042 |
| 24 | 30.81s | 1.51352 | 0.955045 | 0.0651042 |
| 25 | 32.13s | 1.50902 | 0.953533 | 0.0651042 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.53876
Final training RMSE: 0.948071
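The training log reports RMSE, the root-mean-square error between predicted and observed ratings. A minimal sketch of that metric, using hypothetical values rather than the notebook's data:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / float(len(squared_errors)))

# Hypothetical predictions and observed ratings.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4.0, 4.0, 3.0, 4.0]
print("RMSE = %.4f" % rmse(predicted, actual))
```

The reported objective value is a different quantity: with `ranking_regularization` set to 0.25 it also includes a ranking term, so it is not expected to equal the RMSE.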
In [13]:
m
Out[13]:
Class : RankingFactorizationRecommender
Schema
------
User ID : user_id
Item ID : movie_id
Target : rating
Additional observation features : 1
Number of user side features : 0
Number of item side features : 0
Statistics
----------
Number of observations : 965508
Number of users : 6040
Number of items : 3706
Training summary
----------------
Training time : 36.9965
Model Parameters
----------------
Model class : RankingFactorizationRecommender
num_factors : 32
binary_target : 0
side_data_factorization : 1
solver : auto
nmf : 0
max_iterations : 25
Regularization Settings
-----------------------
regularization : 0.0
regularization_type : normal
linear_regularization : 0.0
ranking_regularization : 0.25
unobserved_rating_value : -1.79769313486e+308
num_sampled_negative_examples : 4
ials_confidence_scaling_type : auto
ials_confidence_scaling_factor : 1
Optimization Settings
---------------------
init_random_sigma : 0.01
sgd_convergence_interval : 4
sgd_convergence_threshold : 0.0
sgd_max_trial_iterations : 5
sgd_sampling_block_size : 131072
sgd_step_adjustment_interval : 4
sgd_step_size : 0.0
sgd_trial_sample_minimum_size : 10000
sgd_trial_sample_proportion : 0.125
step_size_decrease_rate : 0.75
additional_iterations_if_unhealthy: 5
adagrad_momentum_weighting : 0.9
num_tempering_iterations : 4
tempering_regularization_start_value: 0.0
track_exact_loss : 0
In [14]:
m['coefficients']
Out[14]:
{'intercept': 3.5821495005738013, 'movie_id': Columns:
movie_id int
linear_terms float
factors array
Rows: 3706
Data:
+----------+------------------+-------------------------------+
| movie_id | linear_terms | factors |
+----------+------------------+-------------------------------+
| 1193 | 1.06781125069 | [-0.119829073548, -0.02245... |
| 661 | -0.0261590108275 | [-0.727257788181, 0.016146... |
| 914 | 0.324085891247 | [-0.859803378582, 0.056376... |
| 3408 | 0.565778970718 | [0.334619760513, -0.014206... |
| 2355 | 0.648248255253 | [-0.248598009348, 0.103843... |
| 1197 | 1.12024652958 | [-0.100379563868, 0.085359... |
| 1287 | 0.345532894135 | [-0.247123196721, 0.024613... |
| 2804 | 0.894821941853 | [-0.272583067417, 0.046351... |
| 594 | 0.311594575644 | [-0.974369823933, 0.054282... |
| 919 | 0.97704321146 | [-0.598346889019, 0.085630... |
+----------+------------------+-------------------------------+
[3706 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'side_data': Columns:
feature str
index str
linear_terms float
factors array
Rows: 1
Data:
+-----------+-------+-----------------+-------------------------------+
| feature | index | linear_terms | factors |
+-----------+-------+-----------------+-------------------------------+
| timestamp | 0 | -0.116745471954 | [-0.564183712006, 1.267165... |
+-----------+-------+-----------------+-------------------------------+
[1 rows x 4 columns], 'user_id': Columns:
user_id int
linear_terms float
factors array
Rows: 6040
Data:
+---------+------------------+-------------------------------+
| user_id | linear_terms | factors |
+---------+------------------+-------------------------------+
| 1 | -0.027785371989 | [-0.0942558199167, 0.00739... |
| 2 | -0.0234720371664 | [0.015922004357, -0.033992... |
| 3 | -0.0345229320228 | [0.176564618945, -0.050576... |
| 4 | -0.0198582224548 | [-0.0773911848664, -0.0500... |
| 5 | -0.0562275871634 | [-0.0598151274025, -0.0059... |
| 6 | -0.0401206016541 | [0.0565584115684, 0.030123... |
| 7 | -0.0433877147734 | [0.205288589001, -0.060017... |
| 8 | -0.0184100158513 | [0.169030055404, -0.043373... |
| 9 | -0.0512112490833 | [0.163330376148, -0.060946... |
| 10 | -0.0407416447997 | [-0.420519113541, 0.110337... |
+---------+------------------+-------------------------------+
[6040 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
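The coefficients above contain an intercept, one linear term per user and per movie, and a factor vector for each. Assuming the usual factorization-model form (intercept plus linear terms plus the dot product of the factor vectors), a predicted rating can be sketched as below. The sketch leaves out the `timestamp` side-feature term, and the 2-dimensional factor vectors are hypothetical because the printed ones are truncated; the trained model uses 32 factors.

```python
def predict_score(intercept, user_linear, item_linear, user_factors, item_factors):
    """Intercept + linear terms + dot product of the latent factors.

    Simplified sketch: side-feature terms (for example timestamp) are omitted.
    """
    interaction = sum(u * v for u, v in zip(user_factors, item_factors))
    return intercept + user_linear + item_linear + interaction

intercept = 3.5821495005738013   # from m['coefficients']['intercept']
user_linear = -0.027785371989    # linear term printed for user_id 1
item_linear = 1.06781125069      # linear term printed for movie_id 1193

# Hypothetical short factor vectors, for illustration only.
user_factors = [0.1, -0.2]
item_factors = [0.3, 0.4]

score = predict_score(intercept, user_linear, item_linear, user_factors, item_factors)
print("predicted score: %.4f" % score)
```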
In [16]:
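# Evaluate on held-out ratings above 4, using a 20% sample of the test users.
# The output labels the models M0, M1, M2 in list order: pop, itemcf, m.
graphlab.recommender.util.compare_models(test[test['rating'] > 4],
                                         [pop, itemcf, m],
                                         user_sample=0.2,
                                         metric='precision_recall')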
compare_models: using 183 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+-----------------+-----------------+
| 1 | 0.109289617486 | 0.0154169412131 |
| 2 | 0.114754098361 | 0.0315571827129 |
| 3 | 0.103825136612 | 0.0393550677194 |
| 4 | 0.0983606557377 | 0.0488860172488 |
| 5 | 0.0983606557377 | 0.057530354299 |
| 6 | 0.0983606557377 | 0.06952814808 |
| 7 | 0.0967993754879 | 0.0776744105871 |
| 8 | 0.0949453551913 | 0.0871933441083 |
| 9 | 0.0910746812386 | 0.0970583805009 |
| 10 | 0.0890710382514 | 0.105731522781 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.218579234973 | 0.0336356996991 |
| 2 | 0.185792349727 | 0.0491081612808 |
| 3 | 0.182149362477 | 0.0721856847862 |
| 4 | 0.172131147541 | 0.086814432767 |
| 5 | 0.165027322404 | 0.0983099175722 |
| 6 | 0.152094717668 | 0.110996252299 |
| 7 | 0.148321623731 | 0.13067735829 |
| 8 | 0.148907103825 | 0.150453968213 |
| 9 | 0.142076502732 | 0.158699171088 |
| 10 | 0.134426229508 | 0.166857542043 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M2
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.27868852459 | 0.0355923139267 |
| 2 | 0.226775956284 | 0.0540712203094 |
| 3 | 0.213114754098 | 0.0716753913564 |
| 4 | 0.198087431694 | 0.0898091945474 |
| 5 | 0.183606557377 | 0.100699809919 |
| 6 | 0.182149362477 | 0.11362028645 |
| 7 | 0.185011709602 | 0.137198290932 |
| 8 | 0.177595628415 | 0.147966304582 |
| 9 | 0.169398907104 | 0.156916738229 |
| 10 | 0.160655737705 | 0.171367047623 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
Out[16]:
[{'precision_recall_by_user': Columns:
user_id int
cutoff int
precision float
recall float
count int
Rows: 3294
Data:
+---------+--------+----------------+----------------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+----------------+----------------+-------+
| 42 | 1 | 0.0 | 0.0 | 7 |
| 42 | 2 | 0.5 | 0.142857142857 | 7 |
| 42 | 3 | 0.333333333333 | 0.142857142857 | 7 |
| 42 | 4 | 0.25 | 0.142857142857 | 7 |
| 42 | 5 | 0.2 | 0.142857142857 | 7 |
| 42 | 6 | 0.166666666667 | 0.142857142857 | 7 |
| 42 | 7 | 0.142857142857 | 0.142857142857 | 7 |
| 42 | 8 | 0.125 | 0.142857142857 | 7 |
| 42 | 9 | 0.111111111111 | 0.142857142857 | 7 |
| 42 | 10 | 0.1 | 0.142857142857 | 7 |
+---------+--------+----------------+----------------+-------+
[3294 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
'precision_recall_overall': Columns:
cutoff int
precision float
recall float
Rows: 18
Data:
+--------+-----------------+-----------------+
| cutoff | precision | recall |
+--------+-----------------+-----------------+
| 1 | 0.109289617486 | 0.0154169412131 |
| 2 | 0.114754098361 | 0.0315571827129 |
| 3 | 0.103825136612 | 0.0393550677194 |
| 4 | 0.0983606557377 | 0.0488860172488 |
| 5 | 0.0983606557377 | 0.057530354299 |
| 6 | 0.0983606557377 | 0.06952814808 |
| 7 | 0.0967993754879 | 0.0776744105871 |
| 8 | 0.0949453551913 | 0.0871933441083 |
| 9 | 0.0910746812386 | 0.0970583805009 |
| 10 | 0.0890710382514 | 0.105731522781 |
+--------+-----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
{'precision_recall_by_user': Columns:
user_id int
cutoff int
precision float
recall float
count int
Rows: 3294
Data:
+---------+--------+----------------+----------------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+----------------+----------------+-------+
| 42 | 1 | 1.0 | 0.142857142857 | 7 |
| 42 | 2 | 0.5 | 0.142857142857 | 7 |
| 42 | 3 | 0.333333333333 | 0.142857142857 | 7 |
| 42 | 4 | 0.25 | 0.142857142857 | 7 |
| 42 | 5 | 0.4 | 0.285714285714 | 7 |
| 42 | 6 | 0.333333333333 | 0.285714285714 | 7 |
| 42 | 7 | 0.285714285714 | 0.285714285714 | 7 |
| 42 | 8 | 0.25 | 0.285714285714 | 7 |
| 42 | 9 | 0.222222222222 | 0.285714285714 | 7 |
| 42 | 10 | 0.2 | 0.285714285714 | 7 |
+---------+--------+----------------+----------------+-------+
[3294 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
'precision_recall_overall': Columns:
cutoff int
precision float
recall float
Rows: 18
Data:
+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.218579234973 | 0.0336356996991 |
| 2 | 0.185792349727 | 0.0491081612808 |
| 3 | 0.182149362477 | 0.0721856847862 |
| 4 | 0.172131147541 | 0.086814432767 |
| 5 | 0.165027322404 | 0.0983099175722 |
| 6 | 0.152094717668 | 0.110996252299 |
| 7 | 0.148321623731 | 0.13067735829 |
| 8 | 0.148907103825 | 0.150453968213 |
| 9 | 0.142076502732 | 0.158699171088 |
| 10 | 0.134426229508 | 0.166857542043 |
+--------+----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
{'precision_recall_by_user': Columns:
user_id int
cutoff int
precision float
recall float
count int
Rows: 3294
Data:
+---------+--------+----------------+----------------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+----------------+----------------+-------+
| 42 | 1 | 1.0 | 0.142857142857 | 7 |
| 42 | 2 | 0.5 | 0.142857142857 | 7 |
| 42 | 3 | 0.333333333333 | 0.142857142857 | 7 |
| 42 | 4 | 0.25 | 0.142857142857 | 7 |
| 42 | 5 | 0.2 | 0.142857142857 | 7 |
| 42 | 6 | 0.166666666667 | 0.142857142857 | 7 |
| 42 | 7 | 0.142857142857 | 0.142857142857 | 7 |
| 42 | 8 | 0.125 | 0.142857142857 | 7 |
| 42 | 9 | 0.111111111111 | 0.142857142857 | 7 |
| 42 | 10 | 0.2 | 0.285714285714 | 7 |
+---------+--------+----------------+----------------+-------+
[3294 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
'precision_recall_overall': Columns:
cutoff int
precision float
recall float
Rows: 18
Data:
+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.27868852459 | 0.0355923139267 |
| 2 | 0.226775956284 | 0.0540712203094 |
| 3 | 0.213114754098 | 0.0716753913564 |
| 4 | 0.198087431694 | 0.0898091945474 |
| 5 | 0.183606557377 | 0.100699809919 |
| 6 | 0.182149362477 | 0.11362028645 |
| 7 | 0.185011709602 | 0.137198290932 |
| 8 | 0.177595628415 | 0.147966304582 |
| 9 | 0.169398907104 | 0.156916738229 |
| 10 | 0.160655737705 | 0.171367047623 |
+--------+----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}]
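The per-user rows above follow from the standard definitions: precision at cutoff k is the number of hits in the top k divided by k, and recall is the number of hits divided by the number of held-out items for that user (the `count` column). A minimal sketch, with hypothetical item IDs chosen so that the only hit in the top 10 is at rank 2, which reproduces the M0 pattern shown for user 42:

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision, recall) for the top-k items of a ranked list."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

# Hypothetical data: 7 held-out items the user liked (count = 7),
# and a ranked recommendation list whose only hit is at rank 2.
relevant = {101, 102, 103, 104, 105, 106, 107}
recommended = [900, 101, 901, 902, 903, 904, 905, 906, 907, 908]

for k in (1, 2, 5, 10):
    p, r = precision_recall_at_k(recommended, relevant, k)
    print("cutoff=%d precision=%.4f recall=%.4f" % (k, p, r))
```

The `mean_precision` and `mean_recall` columns in the summary tables are these per-user values averaged over the sampled users at each cutoff.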
In [20]:
m_rank = graphlab.recommender.ranking_factorization_recommender.create(train, 'user_id', 'movie_id', 'rating',
                                                                       unobserved_rating_value=3)
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 965508 observations with 6040 users and 3706 items.
Data prepared in: 0.910656s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| unobserved_rating_value | Ranking Target Rating for Unobserved Interacti...| 3 |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 16.6667 | Not Viable |
| 1 | 4.16667 | Not Viable |
| 2 | 1.04167 | Not Viable |
| 3 | 0.260417 | Not Viable |
| 4 | 0.0651042 | 0.998804 |
| 5 | 0.0325521 | 0.953073 |
| 6 | 0.016276 | 1.00856 |
| 7 | 0.00813802 | 1.0661 |
| 8 | 0.00406901 | 1.23488 |
+---------+-------------------+------------------------------------------+
| Final | 0.0325521 | 0.953073 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 107us | 1.33247 | 1.11699 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 885.253ms | 1.10069 | 1.00012 | 0.0325521 |
| 2 | 1.90s | 1.01103 | 0.956979 | 0.0325521 |
| 3 | 3.00s | 0.974312 | 0.937847 | 0.0325521 |
| 4 | 4.07s | 0.960712 | 0.931004 | 0.0325521 |
| 5 | 5.12s | 0.949761 | 0.9254 | 0.0325521 |
| 6 | 6.07s | 0.942225 | 0.921526 | 0.0325521 |
| 7 | 7.06s | 0.935704 | 0.918205 | 0.0325521 |
| 8 | 8.02s | 0.930567 | 0.915684 | 0.0325521 |
| 9 | 9.05s | 0.925405 | 0.912967 | 0.0325521 |
| 10 | 10.33s | 0.920952 | 0.910727 | 0.0325521 |
| 11 | 11.58s | 0.916647 | 0.908743 | 0.0325521 |
| 12 | 12.62s | 0.913399 | 0.907017 | 0.0325521 |
| 13 | 13.58s | 0.909575 | 0.904969 | 0.0325521 |
| 14 | 14.67s | 0.906824 | 0.90367 | 0.0325521 |
| 15 | 15.77s | 0.904054 | 0.902198 | 0.0325521 |
| 16 | 16.86s | 0.901294 | 0.90096 | 0.0325521 |
| 17 | 17.85s | 0.898579 | 0.899525 | 0.0325521 |
| 18 | 18.86s | 0.896474 | 0.898482 | 0.0325521 |
| 19 | 20.07s | 0.894312 | 0.897331 | 0.0325521 |
| 20 | 21.10s | 0.892068 | 0.896046 | 0.0325521 |
| 21 | 22.12s | 0.88988 | 0.894963 | 0.0325521 |
| 22 | 23.33s | 0.887669 | 0.893956 | 0.0325521 |
| 23 | 24.51s | 0.885674 | 0.892851 | 0.0325521 |
| 24 | 25.58s | 0.884228 | 0.892176 | 0.0325521 |
| 25 | 26.59s | 0.882557 | 0.891299 | 0.0325521 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 0.882406
Final training RMSE: 0.886832
In [21]:
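# Same evaluation as before, now including m_rank.
# The output labels the models M0..M3 in list order: pop, itemcf, m, m_rank.
results = graphlab.recommender.util.compare_models(test[test['rating'] > 4],
                                                   [pop, itemcf, m, m_rank],
                                                   user_sample=0.2,
                                                   metric='precision_recall')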
compare_models: using 183 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+-----------------+-----------------+
| 1 | 0.103825136612 | 0.0116730652166 |
| 2 | 0.101092896175 | 0.0282821265133 |
| 3 | 0.0819672131148 | 0.0387227556041 |
| 4 | 0.0833333333333 | 0.054260741295 |
| 5 | 0.0786885245902 | 0.0652009814042 |
| 6 | 0.0765027322404 | 0.0758533021658 |
| 7 | 0.0772833723653 | 0.087870157321 |
| 8 | 0.0785519125683 | 0.10363872715 |
| 9 | 0.0740740740741 | 0.108125993351 |
| 10 | 0.0743169398907 | 0.125169969412 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.16393442623 | 0.033918559494 |
| 2 | 0.166666666667 | 0.0559035280067 |
| 3 | 0.158469945355 | 0.0802610557096 |
| 4 | 0.147540983607 | 0.0993935937697 |
| 5 | 0.138797814208 | 0.116006405262 |
| 6 | 0.128415300546 | 0.125824450712 |
| 7 | 0.128805620609 | 0.148368313836 |
| 8 | 0.125 | 0.162294248876 |
| 9 | 0.12204007286 | 0.173015991344 |
| 10 | 0.117486338798 | 0.18606052953 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M2
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.24043715847 | 0.032990686752 |
| 2 | 0.196721311475 | 0.0593716586723 |
| 3 | 0.187613843352 | 0.0783908312908 |
| 4 | 0.180327868852 | 0.102234190817 |
| 5 | 0.16393442623 | 0.120610724355 |
| 6 | 0.159380692168 | 0.140119556509 |
| 7 | 0.152224824356 | 0.157806327365 |
| 8 | 0.142759562842 | 0.166809497193 |
| 9 | 0.137826350941 | 0.175280850522 |
| 10 | 0.134972677596 | 0.186140806231 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M3
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.114754098361 | 0.0220448212961 |
| 2 | 0.120218579235 | 0.0351999242985 |
| 3 | 0.111111111111 | 0.0445415535372 |
| 4 | 0.106557377049 | 0.052975921608 |
| 5 | 0.110382513661 | 0.076614241225 |
| 6 | 0.111111111111 | 0.0956858236539 |
| 7 | 0.110850897736 | 0.113159656127 |
| 8 | 0.106557377049 | 0.121421484491 |
| 9 | 0.102610807529 | 0.136444329342 |
| 10 | 0.101092896175 | 0.14669712098 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
In [28]:
results[3]['precision_recall_overall']
Out[28]:
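+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.114754098361 | 0.0220448212961 |
| 2 | 0.120218579235 | 0.0351999242985 |
| 3 | 0.111111111111 | 0.0445415535372 |
| 4 | 0.106557377049 | 0.052975921608 |
| 5 | 0.110382513661 | 0.076614241225 |
| 6 | 0.111111111111 | 0.0956858236539 |
| 7 | 0.110850897736 | 0.113159656127 |
| 8 | 0.106557377049 | 0.121421484491 |
| 9 | 0.102610807529 | 0.136444329342 |
| 10 | 0.101092896175 | 0.14669712098 |
+--------+----------------+-----------------+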
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
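To compare the four models at a single cutoff, the values at cutoff 10 can be copied from the summary tables printed by the four-model `compare_models` run (M0 to M3, in the order `pop`, `itemcf`, `m`, `m_rank`). A small sketch that ranks them by precision; the dictionary layout is an illustrative choice, not something `compare_models` returns:

```python
# Values at cutoff 10, copied from the printed summary tables (M0..M3).
precision_at_10 = {
    "pop": 0.0743169398907,
    "itemcf": 0.117486338798,
    "m": 0.134972677596,
    "m_rank": 0.101092896175,
}
recall_at_10 = {
    "pop": 0.125169969412,
    "itemcf": 0.18606052953,
    "m": 0.186140806231,
    "m_rank": 0.14669712098,
}

# Sort model names from highest to lowest precision at cutoff 10.
ranked = sorted(precision_at_10, key=precision_at_10.get, reverse=True)
for name in ranked:
    print("%-7s precision@10=%.4f recall@10=%.4f"
          % (name, precision_at_10[name], recall_at_10[name]))
```

On this sample of 183 users, `m` has the highest precision at cutoff 10 and `m_rank` (trained with `unobserved_rating_value=3`) falls below `itemcf`. The sample is small, so the gaps should be read with caution.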
In [29]:
user_sf = graphlab.SFrame('users')
item_sf = graphlab.SFrame('items')
In [30]:
m_user = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     user_data=user_sf)
m_item = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     item_data=item_sf)
m_both = graphlab.recommender.create(train, 'user_id', 'movie_id', 'rating',
                                     user_data=user_sf, item_data=item_sf)
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 965508 observations with 6040 users and 3706 items.
Data prepared in: 0.872319s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| side_data_factorization | Assign Factors for Side Data | True |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 7.14286 | Not Viable |
| 1 | 1.78571 | Not Viable |
| 2 | 0.446429 | 1.52387 |
| 3 | 0.223214 | Not Viable |
| 4 | 0.0558036 | 1.79945 |
| 5 | 0.0279018 | 1.78969 |
+---------+-------------------+------------------------------------------+
| Final | 0.446429 | 1.52387 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 63us | 2.44592 | 1.11697 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 2.13s | DIVERGED | DIVERGED | 0.446429 |
| RESET | 2.99s | 2.44647 | 1.11698 | |
| 1 | 5.10s | 1.83529 | 1.10141 | 0.223214 |
| 2 | 6.90s | 1.47109 | 0.95455 | 0.223214 |
| 3 | 8.80s | 1.36577 | 0.917865 | 0.223214 |
| 4 | 10.62s | 1.31007 | 0.897384 | 0.223214 |
| 5 | 12.44s | 1.27125 | 0.883491 | 0.223214 |
| 6 | 14.67s | 1.24635 | 0.873351 | 0.223214 |
| 7 | 16.70s | 1.22622 | 0.865748 | 0.223214 |
| 8 | 18.63s | 1.21028 | 0.859754 | 0.223214 |
| 9 | 20.50s | 1.19796 | 0.854523 | 0.223214 |
| 10 | 22.46s | 1.18689 | 0.850478 | 0.223214 |
| 11 | 24.31s | 1.1783 | 0.846785 | 0.223214 |
| 12 | 26.18s | 1.17084 | 0.84352 | 0.223214 |
| 13 | 28.07s | 1.16355 | 0.840745 | 0.223214 |
| 14 | 29.97s | 1.15711 | 0.838398 | 0.223214 |
| 15 | 32.09s | 1.15247 | 0.836416 | 0.223214 |
| 16 | 34.04s | 1.14785 | 0.83443 | 0.223214 |
| 17 | 36.25s | 1.14331 | 0.832546 | 0.223214 |
| 18 | 38.41s | 1.13848 | 0.830724 | 0.223214 |
| 19 | 40.34s | 1.13683 | 0.82959 | 0.223214 |
| 20 | 42.39s | 1.13266 | 0.828052 | 0.223214 |
| 21 | 44.75s | 1.13008 | 0.827049 | 0.223214 |
| 22 | 47.05s | 1.12695 | 0.82589 | 0.223214 |
| 23 | 49.58s | 1.12374 | 0.824564 | 0.223214 |
| 24 | 51.93s | 1.12202 | 0.823962 | 0.223214 |
| 25 | 54.30s | 1.12015 | 0.822855 | 0.223214 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.12099
Final training RMSE: 0.797943
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 965508 observations with 6040 users and 3883 items.
Data prepared in: 1.16364s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| side_data_factorization | Assign Factors for Side Data | True |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 10 | Not Viable |
| 1 | 2.5 | Not Viable |
| 2 | 0.625 | Not Viable |
| 3 | 0.15625 | 1.08443 |
| 4 | 0.078125 | 1.72974 |
| 5 | 0.0390625 | 1.84654 |
| 6 | 0.0195312 | 1.74856 |
+---------+-------------------+------------------------------------------+
| Final | 0.15625 | 1.08443 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 70us | 2.44637 | 1.11698 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 2.33s | DIVERGED | DIVERGED | 0.15625 |
| RESET | 3.11s | 2.44644 | 1.11697 | |
| 1 | 4.92s | 1.75622 | 1.06161 | 0.078125 |
| 2 | 6.57s | 1.50614 | 0.966049 | 0.078125 |
| 3 | 8.08s | 1.4167 | 0.933697 | 0.078125 |
| 4 | 9.53s | 1.37077 | 0.917417 | 0.078125 |
| 5 | 10.99s | 1.34379 | 0.908108 | 0.078125 |
| 6 | 12.65s | 1.32151 | 0.899986 | 0.078125 |
| 7 | 14.14s | 1.30427 | 0.89374 | 0.078125 |
| 8 | 15.59s | 1.2895 | 0.888268 | 0.078125 |
| 9 | 17.03s | 1.27727 | 0.884087 | 0.078125 |
| 10 | 18.47s | 1.2662 | 0.879697 | 0.078125 |
| 11 | 19.91s | 1.25785 | 0.876548 | 0.078125 |
| 12 | 21.33s | 1.24893 | 0.873322 | 0.078125 |
| 13 | 22.73s | 1.24222 | 0.870908 | 0.078125 |
| 14 | 24.17s | 1.23693 | 0.868724 | 0.078125 |
| 15 | 25.64s | 1.23104 | 0.866697 | 0.078125 |
| 16 | 27.07s | 1.22657 | 0.865066 | 0.078125 |
| 17 | 28.47s | 1.22185 | 0.86311 | 0.078125 |
| 18 | 29.90s | 1.21603 | 0.860832 | 0.078125 |
| 19 | 31.34s | 1.21214 | 0.859841 | 0.078125 |
| 20 | 32.75s | 1.20866 | 0.858349 | 0.078125 |
| 21 | 34.17s | 1.20588 | 0.857265 | 0.078125 |
| 22 | 35.59s | 1.2013 | 0.855384 | 0.078125 |
| 23 | 36.98s | 1.19868 | 0.854415 | 0.078125 |
| 24 | 38.40s | 1.19618 | 0.8536 | 0.078125 |
| 25 | 39.82s | 1.19373 | 0.852524 | 0.078125 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.22299
Final training RMSE: 0.842974
Recsys training: model = ranking_factorization_recommender
Preparing data set.
Data has 965508 observations with 6040 users and 3883 items.
Data prepared in: 0.897359s
Training ranking_factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter | Description | Value |
+--------------------------------+--------------------------------------------------+----------+
| num_factors | Factor Dimension | 32 |
| regularization | L2 Regularization on Factors | 1e-09 |
| solver | Solver used for training | adagrad |
| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
| ranking_regularization | Rank-based Regularization Weight | 0.25 |
| side_data_factorization | Assign Factors for Side Data | True |
| max_iterations | Maximum Number of Iterations | 25 |
+--------------------------------+--------------------------------------------------+----------+
Optimizing model using SGD; tuning step size.
Using 120688 / 965508 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value |
+---------+-------------------+------------------------------------------+
| 0 | 5.55556 | Not Viable |
| 1 | 1.38889 | Not Viable |
| 2 | 0.347222 | 1.41163 |
| 3 | 0.173611 | 1.06475 |
| 4 | 0.0868056 | 1.03146 |
| 5 | 0.0434028 | 1.1885 |
| 6 | 0.0217014 | 1.54004 |
| 7 | 0.0108507 | 1.77557 |
+---------+-------------------+------------------------------------------+
| Final | 0.0868056 | 1.03146 |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 70us | 2.4467 | 1.117 | |
+---------+--------------+-------------------+-----------------------+-------------+
| 1 | 3.02s | DIVERGED | DIVERGED | 0.0868056 |
| RESET | 3.97s | 2.4468 | 1.11698 | |
| 1 | 6.22s | 1.56747 | 1.00211 | 0.0434028 |
| 2 | 8.40s | 1.4123 | 0.936364 | 0.0434028 |
| 3 | 10.61s | 1.34465 | 0.91251 | 0.0434028 |
| 4 | 13.00s | 1.30528 | 0.898029 | 0.0434028 |
| 5 | 15.21s | 1.27777 | 0.887646 | 0.0434028 |
| 6 | 18.09s | 1.25523 | 0.879139 | 0.0434028 |
| 7 | 20.54s | 1.23908 | 0.872961 | 0.0434028 |
| 8 | 22.79s | 1.22569 | 0.867368 | 0.0434028 |
| 9 | 25.25s | 1.21337 | 0.862705 | 0.0434028 |
| 10 | 28.35s | 1.20338 | 0.858906 | 0.0434028 |
| 11 | 31.26s | 1.19501 | 0.855631 | 0.0434028 |
| 12 | 33.70s | 1.18678 | 0.852674 | 0.0434028 |
| 13 | 35.86s | 1.18128 | 0.850198 | 0.0434028 |
| 14 | 38.05s | 1.17569 | 0.847772 | 0.0434028 |
| 15 | 40.25s | 1.16953 | 0.845739 | 0.0434028 |
| 16 | 42.45s | 1.16479 | 0.843867 | 0.0434028 |
| 17 | 44.62s | 1.16132 | 0.842166 | 0.0434028 |
| 18 | 46.80s | 1.15646 | 0.840656 | 0.0434028 |
| 19 | 48.95s | 1.15327 | 0.838994 | 0.0434028 |
| 20 | 51.21s | 1.15068 | 0.837869 | 0.0434028 |
| 21 | 53.40s | 1.14713 | 0.836618 | 0.0434028 |
| 22 | 55.60s | 1.14393 | 0.835341 | 0.0434028 |
| 23 | 57.78s | 1.14105 | 0.834355 | 0.0434028 |
| 24 | 59.92s | 1.13905 | 0.833327 | 0.0434028 |
| 25 | 1m 2s | 1.13617 | 0.832308 | 0.0434028 |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
Final objective value: 1.16266
Final training RMSE: 0.822791
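The "Training RMSE" column in the logs above is a root-mean-square error between observed and predicted ratings. As a minimal sketch of that metric (not GraphLab's internal implementation), the helper below computes it for plain Python lists. The function name `rmse` and the toy ratings are assumptions made for illustration; they do not come from the notebook's data.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / float(len(squared_errors)))


# Made-up ratings and predictions, for illustration only.
toy_ratings = [5, 3, 4, 1]
toy_predictions = [4.5, 3.5, 4.0, 2.0]
toy_rmse = rmse(toy_ratings, toy_predictions)
print(toy_rmse)
```

Because the residuals are squared before averaging, a few large misses (such as the worst per-item cases) weigh more heavily than many small ones.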
In [31]:
m_both
Out[31]:
Class : RankingFactorizationRecommender
Schema
------
User ID : user_id
Item ID : movie_id
Target : rating
Additional observation features : 1
Number of user side features : 5
Number of item side features : 3
Statistics
----------
Number of observations : 965508
Number of users : 6040
Number of items : 3883
Training summary
----------------
Training time : 74.1749
Model Parameters
----------------
Model class : RankingFactorizationRecommender
num_factors : 32
binary_target : 0
side_data_factorization : 1
solver : auto
nmf : 0
max_iterations : 25
Regularization Settings
-----------------------
regularization : 0.0
regularization_type : normal
linear_regularization : 0.0
ranking_regularization : 0.25
unobserved_rating_value : -1.79769313486e+308
num_sampled_negative_examples : 4
ials_confidence_scaling_type : auto
ials_confidence_scaling_factor : 1
Optimization Settings
---------------------
init_random_sigma : 0.01
sgd_convergence_interval : 4
sgd_convergence_threshold : 0.0
sgd_max_trial_iterations : 5
sgd_sampling_block_size : 131072
sgd_step_adjustment_interval : 4
sgd_step_size : 0.0
sgd_trial_sample_minimum_size : 10000
sgd_trial_sample_proportion : 0.125
step_size_decrease_rate : 0.75
additional_iterations_if_unhealthy: 5
adagrad_momentum_weighting : 0.9
num_tempering_iterations : 4
tempering_regularization_start_value: 0.0
track_exact_loss : 0
In [32]:
results = graphlab.recommender.util.compare_models(test, [m, m_user, m_item, m_both], user_sample=0.2)
compare_models: using 200 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.305 | 0.0104911692762 |
| 2 | 0.3075 | 0.0218598413174 |
| 3 | 0.298333333333 | 0.0312188865716 |
| 4 | 0.28375 | 0.0416981002021 |
| 5 | 0.267 | 0.0490821420076 |
| 6 | 0.255 | 0.0566601620839 |
| 7 | 0.244285714286 | 0.0634873210823 |
| 8 | 0.2325 | 0.0671075588864 |
| 9 | 0.227222222222 | 0.0732309783629 |
| 10 | 0.22 | 0.0778109177835 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 0.9976486246840836)
Per User RMSE (best)
+---------+-------+----------------+
| user_id | count | rmse |
+---------+-------+----------------+
| 4259 | 4 | 0.482071967797 |
+---------+-------+----------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 3275 | 4 | 1.92700170805 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count | rmse |
+----------+-------+------------------+
| 163 | 1 | 0.00169078452152 |
+----------+-------+------------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 3196 | 1 | 4.01836458102 |
+----------+-------+---------------+
[1 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.375 | 0.0136968745991 |
| 2 | 0.3575 | 0.023755002765 |
| 3 | 0.335 | 0.0362605125879 |
| 4 | 0.3075 | 0.0423008826942 |
| 5 | 0.293 | 0.0497532350863 |
| 6 | 0.281666666667 | 0.0578575934232 |
| 7 | 0.282142857143 | 0.0671244350417 |
| 8 | 0.27125 | 0.0748468085545 |
| 9 | 0.262222222222 | 0.0813855604447 |
| 10 | 0.26 | 0.0896391608922 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 1.0433027431346846)
Per User RMSE (best)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 4259 | 4 | 0.24496854802 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 2912 | 7 | 2.13582589594 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count | rmse |
+----------+-------+------------------+
| 379 | 1 | 0.00331903448671 |
+----------+-------+------------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 3117 | 1 | 3.61265547345 |
+----------+-------+---------------+
[1 rows x 3 columns]
PROGRESS: Evaluate model M2
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.355 | 0.01355116837 |
| 2 | 0.345 | 0.0247731072461 |
| 3 | 0.328333333333 | 0.0346428767624 |
| 4 | 0.3 | 0.0417405134962 |
| 5 | 0.286 | 0.0496330466776 |
| 6 | 0.283333333333 | 0.0607078786391 |
| 7 | 0.273571428571 | 0.068217750301 |
| 8 | 0.265625 | 0.0745364058279 |
| 9 | 0.262777777778 | 0.0821126915135 |
| 10 | 0.255 | 0.0868626759608 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 1.0121070979184972)
Per User RMSE (best)
+---------+-------+----------------+
| user_id | count | rmse |
+---------+-------+----------------+
| 4259 | 4 | 0.280887940151 |
+---------+-------+----------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 2912 | 7 | 2.21734756181 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+-------------------+
| movie_id | count | rmse |
+----------+-------+-------------------+
| 1283 | 1 | 0.000539099143753 |
+----------+-------+-------------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 1806 | 1 | 3.39939804512 |
+----------+-------+---------------+
[1 rows x 3 columns]
PROGRESS: Evaluate model M3
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.4 | 0.0173755798355 |
| 2 | 0.3625 | 0.0265019144168 |
| 3 | 0.338333333333 | 0.0380391678111 |
| 4 | 0.3275 | 0.0479764485634 |
| 5 | 0.312 | 0.0576493349033 |
| 6 | 0.299166666667 | 0.0654828586353 |
| 7 | 0.295714285714 | 0.0760004175285 |
| 8 | 0.29 | 0.0841766126368 |
| 9 | 0.286666666667 | 0.093854767667 |
| 10 | 0.2755 | 0.0995420682121 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
('\nOverall RMSE: ', 0.9936036664127302)
Per User RMSE (best)
+---------+-------+----------------+
| user_id | count | rmse |
+---------+-------+----------------+
| 4259 | 4 | 0.397063536039 |
+---------+-------+----------------+
[1 rows x 3 columns]
Per User RMSE (worst)
+---------+-------+---------------+
| user_id | count | rmse |
+---------+-------+---------------+
| 2912 | 7 | 2.01561012851 |
+---------+-------+---------------+
[1 rows x 3 columns]
Per Item RMSE (best)
+----------+-------+------------------+
| movie_id | count | rmse |
+----------+-------+------------------+
| 849 | 1 | 0.00179710693082 |
+----------+-------+------------------+
[1 rows x 3 columns]
Per Item RMSE (worst)
+----------+-------+---------------+
| movie_id | count | rmse |
+----------+-------+---------------+
| 3806 | 1 | 3.81246875723 |
+----------+-------+---------------+
[1 rows x 3 columns]
In [33]:
[r['rmse_overall'] for r in results]
Out[33]:
[0.9976486246840836,
1.0433027431346846,
1.0121070979184972,
0.9936036664127302]
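The four values in Out[33] follow the order in which the models were passed to `compare_models` in In [32]: `m`, `m_user`, `m_item`, `m_both`. A small sketch, with the numbers copied from the output above and the variable names `rmse_by_model`, `ranking`, and `best_model` assumed for illustration, pairs each value with its model and sorts them so the lowest error comes first.

```python
# Overall RMSE values copied from Out[33], in the order used in In [32].
model_names = ["m", "m_user", "m_item", "m_both"]
rmse_overall = [
    0.9976486246840836,
    1.0433027431346846,
    1.0121070979184972,
    0.9936036664127302,
]

# Pair each model name with its RMSE, then rank from lowest to highest error.
rmse_by_model = dict(zip(model_names, rmse_overall))
ranking = sorted(rmse_by_model, key=rmse_by_model.get)
best_model = ranking[0]

for name in ranking:
    print("%-7s %.4f" % (name, rmse_by_model[name]))
```

On this 200-user sample the gaps are small (about 0.05 between best and worst), so the ordering may not hold on a different sample of users.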
In [34]:
results[0]['rmse_by_item'].show()
In [36]:
graphlab.recommender.util.compare_models(test[test['rating'] > 4],
                                         [m_rank, m_both],
                                         user_sample=0.2,
                                         metric='precision_recall')
compare_models: using 183 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.158469945355 | 0.0235039678346 |
| 2 | 0.155737704918 | 0.0367981705878 |
| 3 | 0.136612021858 | 0.0436527349676 |
| 4 | 0.137978142077 | 0.0577237401993 |
| 5 | 0.136612021858 | 0.0723710331936 |
| 6 | 0.125683060109 | 0.0842016591703 |
| 7 | 0.124902419984 | 0.101563633319 |
| 8 | 0.118852459016 | 0.109983711543 |
| 9 | 0.114754098361 | 0.11908895883 |
| 10 | 0.111475409836 | 0.128536481539 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.245901639344 | 0.0538736282089 |
| 2 | 0.234972677596 | 0.0758295229523 |
| 3 | 0.23679417122 | 0.101520870674 |
| 4 | 0.225409836066 | 0.12569838792 |
| 5 | 0.209836065574 | 0.142998857028 |
| 6 | 0.192167577413 | 0.155246605969 |
| 7 | 0.183450429352 | 0.174214946795 |
| 8 | 0.176229508197 | 0.190227369104 |
| 9 | 0.165148755313 | 0.200596309693 |
| 10 | 0.158469945355 | 0.210908575339 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
Out[36]:
[{'precision_recall_by_user': Columns:
user_id int
cutoff int
precision float
recall float
count int
Rows: 3294
Data:
+---------+--------+-----------+--------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+-----------+--------+-------+
| 3 | 1 | 0.0 | 0.0 | 2 |
| 3 | 2 | 0.0 | 0.0 | 2 |
| 3 | 3 | 0.0 | 0.0 | 2 |
| 3 | 4 | 0.0 | 0.0 | 2 |
| 3 | 5 | 0.0 | 0.0 | 2 |
| 3 | 6 | 0.0 | 0.0 | 2 |
| 3 | 7 | 0.0 | 0.0 | 2 |
| 3 | 8 | 0.0 | 0.0 | 2 |
| 3 | 9 | 0.0 | 0.0 | 2 |
| 3 | 10 | 0.0 | 0.0 | 2 |
+---------+--------+-----------+--------+-------+
[3294 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
'precision_recall_overall': Columns:
cutoff int
precision float
recall float
Rows: 18
Data:
+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.158469945355 | 0.0235039678346 |
| 2 | 0.155737704918 | 0.0367981705878 |
| 3 | 0.136612021858 | 0.0436527349676 |
| 4 | 0.137978142077 | 0.0577237401993 |
| 5 | 0.136612021858 | 0.0723710331936 |
| 6 | 0.125683060109 | 0.0842016591703 |
| 7 | 0.124902419984 | 0.101563633319 |
| 8 | 0.118852459016 | 0.109983711543 |
| 9 | 0.114754098361 | 0.11908895883 |
| 10 | 0.111475409836 | 0.128536481539 |
+--------+----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.},
{'precision_recall_by_user': Columns:
user_id int
cutoff int
precision float
recall float
count int
Rows: 3294
Data:
+---------+--------+-----------+--------+-------+
| user_id | cutoff | precision | recall | count |
+---------+--------+-----------+--------+-------+
| 3 | 1 | 0.0 | 0.0 | 2 |
| 3 | 2 | 0.0 | 0.0 | 2 |
| 3 | 3 | 0.0 | 0.0 | 2 |
| 3 | 4 | 0.0 | 0.0 | 2 |
| 3 | 5 | 0.0 | 0.0 | 2 |
| 3 | 6 | 0.0 | 0.0 | 2 |
| 3 | 7 | 0.0 | 0.0 | 2 |
| 3 | 8 | 0.0 | 0.0 | 2 |
| 3 | 9 | 0.0 | 0.0 | 2 |
| 3 | 10 | 0.0 | 0.0 | 2 |
+---------+--------+-----------+--------+-------+
[3294 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
'precision_recall_overall': Columns:
cutoff int
precision float
recall float
Rows: 18
Data:
+--------+----------------+-----------------+
| cutoff | precision | recall |
+--------+----------------+-----------------+
| 1 | 0.245901639344 | 0.0538736282089 |
| 2 | 0.234972677596 | 0.0758295229523 |
| 3 | 0.23679417122 | 0.101520870674 |
| 4 | 0.225409836066 | 0.12569838792 |
| 5 | 0.209836065574 | 0.142998857028 |
| 6 | 0.192167577413 | 0.155246605969 |
| 7 | 0.183450429352 | 0.174214946795 |
| 8 | 0.176229508197 | 0.190227369104 |
| 9 | 0.165148755313 | 0.200596309693 |
| 10 | 0.158469945355 | 0.210908575339 |
+--------+----------------+-----------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}]
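To make the `cutoff`, `precision`, and `recall` columns above concrete, here is a minimal sketch of precision and recall at a cutoff k for a single user. It assumes the common definitions (hits divided by k, and hits divided by the number of relevant items); GraphLab's exact handling of edge cases may differ. The function name and the toy item IDs are made up for illustration.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user.

    recommended: ranked list of item IDs, best first.
    relevant: collection of item IDs the user actually liked in the test set.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant = set(relevant)
    hits = len(set(recommended[:k]) & relevant)
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall


# Toy example: 2 of the top 5 recommendations are among 3 relevant items.
toy_recommended = [10, 20, 30, 40, 50]
toy_relevant = [20, 50, 60]
p5, r5 = precision_recall_at_k(toy_recommended, toy_relevant, 5)
p1, r1 = precision_recall_at_k(toy_recommended, toy_relevant, 1)
print(p5, r5, p1, r1)
```

A user with no hits in the top 10, like `user_id` 3 in the tables above, scores 0.0 on both metrics at every cutoff. The overall figures are averages of such per-user values.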
In [ ]:
# Train a factorization model on the first 10,000 training rows, with item side data.
fm = graphlab.factorization_recommender.create(train.head(10000), 'user_id', 'movie_id', 'rating',
                                               item_data=item_sf,
                                               sgd_step_size=0.09,
                                               max_iterations=10)
In [ ]:
Content source: Aggieyixin/cjc2016