In [1]:
import graphlab as gl
gl.canvas.set_target("ipynb")
In [2]:
implicit = gl.SFrame('implicit')
explicit = gl.SFrame('explicit')
items = gl.SFrame('items')
ratings = gl.SFrame('ratings')
[INFO] This commercial license of GraphLab Create is assigned to engr@turi.com.
[INFO] Start server at: ipc:///tmp/graphlab_server-41454 - Server binary: /Users/chris/miniconda/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1443481858.log
[INFO] GraphLab Server Version: 1.6.1
In [3]:
ratings.show()
Split the implicit data into training and validation sets by user. This allows us to evaluate generalization ability.
In [4]:
train, valid = gl.recommender.util.random_split_by_user(implicit)
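To make the idea of a per-user split concrete, here is a minimal pure-Python sketch. It is not GraphLab's implementation; the helper name `split_by_user`, the `holdout_fraction` parameter, and the toy rows are assumptions for illustration only, and the real utility may additionally sample only a subset of users.

```python
import random

def split_by_user(rows, holdout_fraction=0.2, seed=0):
    """Hold out a fraction of each user's interactions for validation.

    `rows` is a list of (user_id, item_id) pairs. Hypothetical helper,
    not part of GraphLab Create.
    """
    rng = random.Random(seed)
    by_user = {}
    for user_id, item_id in rows:
        by_user.setdefault(user_id, []).append(item_id)

    train, valid = [], []
    for user_id, item_ids in by_user.items():
        shuffled = item_ids[:]
        rng.shuffle(shuffled)
        n_valid = int(len(shuffled) * holdout_fraction)
        valid.extend((user_id, i) for i in shuffled[:n_valid])
        train.extend((user_id, i) for i in shuffled[n_valid:])
    return train, valid

# Toy data: two users with ten interactions each.
toy = [(u, i) for u in ("a", "b") for i in range(10)]
train_rows, valid_rows = split_by_user(toy)
print(len(train_rows), len(valid_rows))
```

Because every user keeps most of their history in the training set, a model can still be asked for recommendations for exactly the users whose held-out items are used for scoring.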
Compute the number of times each item has been rated.
In [5]:
num_ratings_per_item = train.groupby('item_id', {'num_users': gl.aggregate.COUNT})
items = items.join(num_ratings_per_item, on='item_id')
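The count-then-join step can be sketched with plain Python containers. The toy ratings and the variable names below are assumptions for illustration; the sketch mimics an inner join, so an item with no training ratings drops out of the result.

```python
from collections import Counter

# Toy (user_id, item_id) training pairs; illustrative only.
toy_ratings = [("u1", 1), ("u2", 1), ("u1", 2)]
toy_items = [
    {"item_id": 1, "title": "Toy Story"},
    {"item_id": 2, "title": "Jumanji"},
    {"item_id": 6, "title": "Heat"},
]

# Equivalent of groupby('item_id') with a COUNT aggregate.
counts = Counter(item_id for _, item_id in toy_ratings)

# Equivalent of an inner join on 'item_id': unrated items are dropped.
joined = [
    dict(item, num_users=counts[item["item_id"]])
    for item in toy_items
    if item["item_id"] in counts
]
print(joined)
```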
Transform the count into a categorical variable using the feature_engineering module.
In [6]:
binner = gl.feature_engineering.FeatureBinner(features=['num_users'], strategy='logarithmic', num_bins=5)
items = binner.fit_transform(items)
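As a rough illustration of logarithmic binning, the sketch below assigns a count to a power-of-ten interval. The bin edges (1, 10, 100, 1000 for five bins) are an assumption chosen to be consistent with labels such as (10.000, 100.000] in the output below; `log_bin` is a hypothetical helper, not the FeatureBinner implementation.

```python
def log_bin(value, num_bins=5):
    """Return a label for the power-of-ten bin containing `value`.

    Assumes finite edges at 10**0, 10**1, ..., 10**(num_bins - 2),
    with open-ended first and last bins.
    """
    edges = [10.0 ** k for k in range(num_bins - 1)]  # [1, 10, 100, 1000]
    lower = float("-inf")
    for upper in edges:
        if value <= upper:
            return "(%s, %s]" % (lower, upper)
        lower = upper
    return "(%s, inf]" % lower

for count in (1, 50, 100, 1500):
    print(count, log_bin(count))
```

Binning a heavy-tailed count this way gives the model a handful of popularity levels rather than a raw number that spans several orders of magnitude.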
Convert each genre element into a dictionary and each year to an integer.
In [7]:
items['genres'] = items['genres'].apply(lambda x: {k:1 for k in x})
items['year'] = items['year'].astype(int)
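The same two conversions on a single record look like this. The sketch assumes each genres entry starts out as a list of strings and each year as a string; the sample values are illustrative.

```python
# One item's raw fields (assumed shapes, for illustration).
raw_genres = ["Drama", "Comedy"]
raw_year = "1995"

# Each genre becomes a key with value 1, i.e. a sparse indicator feature.
genres_dict = {k: 1 for k in raw_genres}
year_int = int(raw_year)

print(genres_dict, year_int)
```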
In [8]:
items
Out[8]:
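+---------+----------------------------------------------+---------------------------------+------+---------------------+
| item_id | genres                                       | title                           | year | num_users           |
+---------+----------------------------------------------+---------------------------------+------+---------------------+
| 1       | {"Children's": 1, 'Comedy': 1, 'Animati ...  | Toy Story                       | 1995 | (1000.000, Inf]     |
| 2       | {"Children's": 1, 'Adventure': 1, ...        | Jumanji                         | 1995 | (100.000, 1000.000] |
| 3       | {'Romance': 1, 'Comedy': 1} ...              | Grumpier Old Men                | 1995 | (100.000, 1000.000] |
| 4       | {'Drama': 1, 'Comedy': 1}                    | Waiting to Exhale               | 1995 | (10.000, 100.000]   |
| 5       | {'Comedy': 1}                                | Father of the Bride Part II ... | 1995 | (10.000, 100.000]   |
| 6       | {'Action': 1, 'Thriller': 1, 'Crime': 1} ... | Heat                            | 1995 | (100.000, 1000.000] |
| 7       | {'Romance': 1, 'Comedy': 1} ...              | Sabrina                         | 1995 | (100.000, 1000.000] |
| 8       | {"Children's": 1, 'Adventure': 1} ...        | Tom and Huck                    | 1995 | (10.000, 100.000]   |
| 9       | {'Action': 1}                                | Sudden Death                    | 1995 | (10.000, 100.000]   |
| 10      | {'Action': 1, 'Adventure': 1, ...            | GoldenEye                       | 1995 | (100.000, 1000.000] |
+---------+----------------------------------------------+---------------------------------+------+---------------------+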
[3526 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
In [9]:
m0 = gl.item_similarity_recommender.create(train)
PROGRESS: Recsys training: model = item_similarity
PROGRESS: Warning: Column 'score' ignored.
PROGRESS: To use this column as the target, set target = "score" and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS: Data has 555786 observations with 6038 users and 3526 items.
PROGRESS: Data prepared in: 0.498567s
PROGRESS: Computing item similarity statistics:
PROGRESS: Computing most similar items for 3526 items:
PROGRESS: +-----------------+-----------------+
PROGRESS: | Number of items | Elapsed Time |
PROGRESS: +-----------------+-----------------+
PROGRESS: | 1000 | 1.64295 |
PROGRESS: | 2000 | 1.71588 |
PROGRESS: | 3000 | 1.82574 |
PROGRESS: +-----------------+-----------------+
PROGRESS: Finished training in 2.05003s
In [10]:
m1 = gl.ranking_factorization_recommender.create(train, max_iterations=10)
PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS: Data has 555786 observations with 6038 users and 3526 items.
PROGRESS: Data prepared in: 0.755589s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter | Description | Value |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors | Factor Dimension | 32 |
PROGRESS: | regularization | L2 Regularization on Factors | 1e-09 |
PROGRESS: | solver | Solver used for training | adagrad |
PROGRESS: | linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
PROGRESS: | binary_target | Assume Binary Targets | True |
PROGRESS: | max_iterations | Maximum Number of Iterations | 10 |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: Optimizing model using SGD; tuning step size.
PROGRESS: Using 69473 / 555786 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0 | 16.6667 | Not Viable |
PROGRESS: | 1 | 4.16667 | Not Viable |
PROGRESS: | 2 | 1.04167 | Not Viable |
PROGRESS: | 3 | 0.260417 | Not Viable |
PROGRESS: | 4 | 0.0651042 | No Decrease (1.51234 >= 1.38646) |
PROGRESS: | 5 | 0.016276 | 1.34978 |
PROGRESS: | 6 | 0.00813802 | 1.35679 |
PROGRESS: | 7 | 0.00406901 | 1.36648 |
PROGRESS: | 8 | 0.00203451 | 1.37248 |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final | 0.016276 | 1.34978 |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Iter. | Elapsed Time | Approx. Objective | Approx. Training Predictive Error | Step Size |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Initial | 423us | 1.38646 | 0.69317 | |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | 1 | 1.70s | 1.33478 | 0.649814 | 0.016276 |
PROGRESS: | 2 | 3.19s | 1.30837 | 0.643913 | 0.016276 |
PROGRESS: | 3 | 4.46s | 1.29902 | 0.643703 | 0.016276 |
PROGRESS: | 4 | 5.70s | 1.29397 | 0.643084 | 0.016276 |
PROGRESS: | 5 | 6.98s | 1.28905 | 0.642419 | 0.016276 |
PROGRESS: | 6 | 8.32s | 1.2865 | 0.641463 | 0.016276 |
PROGRESS: | 7 | 9.90s | 1.28333 | 0.64064 | 0.016276 |
PROGRESS: | 8 | 11.75s | 1.28209 | 0.640611 | 0.016276 |
PROGRESS: | 9 | 13.07s | 1.28062 | 0.64002 | 0.016276 |
PROGRESS: | 10 | 14.36s | 1.27902 | 0.639188 | 0.016276 |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: Optimization Complete: Maximum number of passes through the data reached.
PROGRESS: Computing final objective value and training Predictive Error.
PROGRESS: Final objective value: 1.28501
PROGRESS: Final training Predictive Error: 0.636185
In [11]:
m2 = gl.ranking_factorization_recommender.create(train,
item_data=items[['item_id', 'year']],
max_iterations=10)
PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS: Data has 555786 observations with 6038 users and 3526 items.
PROGRESS: Data prepared in: 0.764826s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter | Description | Value |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors | Factor Dimension | 32 |
PROGRESS: | regularization | L2 Regularization on Factors | 1e-09 |
PROGRESS: | solver | Solver used for training | adagrad |
PROGRESS: | linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
PROGRESS: | binary_target | Assume Binary Targets | True |
PROGRESS: | side_data_factorization | Assign Factors for Side Data | True |
PROGRESS: | max_iterations | Maximum Number of Iterations | 10 |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: Optimizing model using SGD; tuning step size.
PROGRESS: Using 69473 / 555786 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0 | 12.5 | Not Viable |
PROGRESS: | 1 | 3.125 | Not Viable |
PROGRESS: | 2 | 0.78125 | Not Viable |
PROGRESS: | 3 | 0.195312 | Not Viable |
PROGRESS: | 4 | 0.0488281 | No Decrease (2.55 >= 1.38646) |
PROGRESS: | 5 | 0.012207 | No Decrease (1.48073 >= 1.38646) |
PROGRESS: | 6 | 0.00305176 | No Decrease (1.40179 >= 1.38646) |
PROGRESS: | 7 | 0.000762939 | No Decrease (1.40125 >= 1.38646) |
PROGRESS: | 8 | 0.000190735 | 1.38602 |
PROGRESS: | 9 | 9.53674e-05 | 1.38607 |
PROGRESS: | 10 | 4.76837e-05 | 1.38616 |
PROGRESS: | 11 | 2.38419e-05 | 1.38622 |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final | 0.000190735 | 1.38602 |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Iter. | Elapsed Time | Approx. Objective | Approx. Training Predictive Error | Step Size |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Initial | 451us | 1.38646 | 0.693157 | |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | 1 | 1.83s | 1.38566 | 0.691639 | 0.000190735 |
PROGRESS: | 2 | 3.45s | 1.38585 | 0.690663 | 0.000190735 |
PROGRESS: | 3 | 5.01s | 1.38662 | 0.690082 | 0.000190735 |
PROGRESS: | 4 | 6.52s | 1.38766 | 0.689626 | 0.000190735 |
PROGRESS: | 5 | 8.13s | 1.38905 | 0.689328 | 0.000190735 |
PROGRESS: | 6 | 9.63s | 1.39072 | 0.689181 | 0.000190735 |
PROGRESS: | 7 | 11.13s | 1.39268 | 0.689199 | 0.000190735 |
PROGRESS: | 8 | 13.41s | 1.39486 | 0.689349 | 0.000190735 |
PROGRESS: | 9 | 15.68s | 1.39705 | 0.689573 | 0.000190735 |
PROGRESS: | 10 | 17.93s | 1.39941 | 0.689905 | 0.000190735 |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: Optimization Complete: Maximum number of passes through the data reached.
PROGRESS: Computing final objective value and training Predictive Error.
PROGRESS: Final objective value: 1.401
PROGRESS: Final training Predictive Error: 0.690105
In [12]:
m3 = gl.ranking_factorization_recommender.create(train,
item_data=items[['item_id', 'year', 'genres']],
max_iterations=10)
PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS: Data has 555786 observations with 6038 users and 3526 items.
PROGRESS: Data prepared in: 0.648528s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter | Description | Value |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors | Factor Dimension | 32 |
PROGRESS: | regularization | L2 Regularization on Factors | 1e-09 |
PROGRESS: | solver | Solver used for training | adagrad |
PROGRESS: | linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
PROGRESS: | binary_target | Assume Binary Targets | True |
PROGRESS: | side_data_factorization | Assign Factors for Side Data | True |
PROGRESS: | max_iterations | Maximum Number of Iterations | 10 |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: Optimizing model using SGD; tuning step size.
PROGRESS: Using 69473 / 555786 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0 | 10 | Not Viable |
PROGRESS: | 1 | 2.5 | Not Viable |
PROGRESS: | 2 | 0.625 | Not Viable |
PROGRESS: | 3 | 0.15625 | Not Viable |
PROGRESS: | 4 | 0.0390625 | No Decrease (5.02175 >= 1.38651) |
PROGRESS: | 5 | 0.00976562 | 1.33702 |
PROGRESS: | 6 | 0.00488281 | No Decrease (1.49971 >= 1.38651) |
PROGRESS: | 7 | 0.0012207 | No Decrease (1.41472 >= 1.38651) |
PROGRESS: | 8 | 0.000305176 | No Decrease (1.38711 >= 1.38651) |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final | 0.00976562 | 1.33702 |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Iter. | Elapsed Time | Approx. Objective | Approx. Training Predictive Error | Step Size |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Initial | 420us | 1.3865 | 0.693066 | |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | 1 | 2.39s | 1.67735 | 0.863704 | 0.00976562 |
PROGRESS: | 2 | 4.58s | 2.26991 | 1.39687 | 0.00976562 |
PROGRESS: | 3 | 6.85s | 2.65462 | 1.71702 | 0.00976562 |
PROGRESS: | 4 | 11.47s | DIVERGED | DIVERGED | 0.00976562 |
PROGRESS: | RESET | 13.13s | 1.38655 | 0.693141 | |
PROGRESS: | 1 | 15.65s | 1.45915 | 0.716086 | 0.00488281 |
PROGRESS: | 2 | 18.06s | 1.5529 | 0.843311 | 0.00488281 |
PROGRESS: | 3 | 20.77s | 1.66115 | 0.963798 | 0.00488281 |
PROGRESS: | 4 | 22.90s | 1.72651 | 1.03845 | 0.00488281 |
PROGRESS: | 5 | 26.54s | DIVERGED | DIVERGED | 0.00488281 |
PROGRESS: | RESET | 27.45s | 1.3865 | 0.693157 | |
PROGRESS: | 1 | 29.61s | 1.48426 | 0.694865 | 0.00244141 |
PROGRESS: | 2 | 32.15s | 1.54694 | 0.769582 | 0.00244141 |
PROGRESS: | 3 | 34.43s | 1.55958 | 0.79954 | 0.00244141 |
PROGRESS: | 4 | 37.21s | 1.58258 | 0.830171 | 0.00244141 |
PROGRESS: | 5 | 41.71s | DIVERGED | DIVERGED | 0.00244141 |
PROGRESS: | RESET | 42.69s | 1.38653 | 0.693114 | |
PROGRESS: | 1 | 45.29s | 1.40832 | 0.674002 | 0.0012207 |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: Optimization Complete: Maximum number of passes through the data reached (hard limit).
PROGRESS: Computing final objective value and training Predictive Error.
PROGRESS: Final objective value: 1.43323
PROGRESS: Final training Predictive Error: 0.685329
Create a nearest neighbor model that uses the genres in common and the year of the movie.
In [14]:
dist = [[['genres'], 'jaccard', 1.0],
[['year'], 'euclidean', 1.0]]
nn_model = gl.nearest_neighbors.create(items, 'item_id', composite_params=dist)
Defaulting to brute force instead of ball tree because there are multiple distance components.
PROGRESS: Starting brute force nearest neighbors model training.
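A composite distance of this shape can be sketched as a weighted sum of a Jaccard distance over genre keys and an absolute difference in year. This is only an illustration of the general idea under those assumptions; the function names are hypothetical, and the values it produces are not claimed to match the distances GraphLab reports below.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over the keys of two genre dictionaries."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return 1.0 - len(keys_a & keys_b) / float(len(union))

def composite_distance(x, y, genre_weight=1.0, year_weight=1.0):
    """Weighted sum of genre Jaccard distance and year difference."""
    genre_part = jaccard_distance(x["genres"], y["genres"])
    year_part = abs(x["year"] - y["year"])  # 1-D Euclidean distance
    return genre_weight * genre_part + year_weight * year_part

waiting = {"genres": {"Drama": 1, "Comedy": 1}, "year": 1995}
father = {"genres": {"Comedy": 1}, "year": 1995}
sudden = {"genres": {"Action": 1}, "year": 1995}
# Hypothetical item, invented to show the year component.
later_comedy = {"genres": {"Comedy": 1}, "year": 1999}

print(composite_distance(waiting, father))       # shares one of two genres
print(composite_distance(father, sudden))        # no genres in common
print(composite_distance(father, later_comedy))  # same genres, 4 years apart
```

With equal weights the year term can dominate, since the Jaccard part never exceeds 1 while year differences can be much larger.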
Compute a nearest neighbor graph.
In [15]:
similar = nn_model.query(items, 'item_id', k=100)\
.rename({'query_label': 'item_id', 'reference_label': 'similar', 'distance': 'score'})\
.join(items[['item_id', 'title']], on='item_id')\
.join(items[['item_id', 'title']], on={'similar': 'item_id'})
similar['score'] = 1 - similar['score']
similar.print_rows(100, max_row_width=200)
PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 1 | 3526 | 0.0283607 | 33.558ms |
PROGRESS: | 192 | 676992 | 5.44526 | 1.03s |
PROGRESS: | 380 | 1339880 | 10.7771 | 2.04s |
PROGRESS: | 569 | 2006294 | 16.1373 | 3.04s |
PROGRESS: | 776 | 2736176 | 22.0079 | 4.03s |
PROGRESS: | 990 | 3490740 | 28.0771 | 5.04s |
PROGRESS: | 1190 | 4195940 | 33.7493 | 6.04s |
PROGRESS: | 1387 | 4890562 | 39.3364 | 7.04s |
PROGRESS: | 1572 | 5542872 | 44.5831 | 8.05s |
PROGRESS: | 1761 | 6209286 | 49.9433 | 9.04s |
PROGRESS: | 1967 | 6935642 | 55.7856 | 10.05s |
PROGRESS: | 2163 | 7626738 | 61.3443 | 11.05s |
PROGRESS: | 2361 | 8324886 | 66.9597 | 12.05s |
PROGRESS: | 2563 | 9037138 | 72.6886 | 13.05s |
PROGRESS: | 2744 | 9675344 | 77.8219 | 14.05s |
PROGRESS: | 2883 | 1e+07 | 81.764 | 15.05s |
PROGRESS: | 3044 | 1.1e+07 | 86.3301 | 16.07s |
PROGRESS: | 3246 | 1.1e+07 | 92.059 | 17.05s |
PROGRESS: | 3436 | 1.2e+07 | 97.4475 | 18.06s |
PROGRESS: | Done | | 100 | 18.58s |
PROGRESS: +--------------+---------+-------------+--------------+
+---------+---------+----------------+------+-----------+---------------------+
| item_id | similar | score | rank | title | title.1 |
+---------+---------+----------------+------+-----------+---------------------+
| 1 | 1 | 1.0 | 1 | Toy Story | Toy Story |
| 1 | 3114 | -5.0 | 2 | Toy Story | Toy Story 2 |
| 1 | 34 | -8.5 | 3 | Toy Story | Babe |
| 1 | 110 | -9.0 | 4 | Toy Story | Braveheart |
| 1 | 608 | -9.0 | 5 | Toy Story | Fargo |
| 1 | 356 | -10.8 | 6 | Toy Story | Forrest Gump |
| 1 | 32 | -11.0 | 7 | Toy Story | Twelve Monkeys |
| 1 | 296 | -11.0 | 8 | Toy Story | Pulp Fiction |
| 1 | 2599 | -11.6666666667 | 9 | Toy Story | Election |
| 1 | 1265 | -11.75 | 10 | Toy Story | Groundhog Day |
| 1 | 3578 | -12.0 | 11 | Toy Story | Gladiator |
| 1 | 1580 | -12.8333333333 | 12 | Toy Story | Men in Black |
| 1 | 333 | -13.6666666667 | 13 | Toy Story | Tommy Boy |
| 1 | 21 | -13.8 | 14 | Toy Story | Get Shorty |
| 1 | 3175 | -13.8 | 15 | Toy Story | Galaxy Quest |
| 1 | 151 | -14.0 | 16 | Toy Story | Rob Roy |
| 1 | 480 | -14.0 | 17 | Toy Story | Jurassic Park |
| 1 | 1036 | -14.0 | 18 | Toy Story | Die Hard |
| 1 | 1213 | -14.0 | 19 | Toy Story | GoodFellas |
| 1 | 2355 | -14.0 | 20 | Toy Story | Bug's Life, A |
| 1 | 2916 | -14.0 | 21 | Toy Story | Total Recall |
| 1 | 2959 | -14.0 | 22 | Toy Story | Fight Club |
| 1 | 457 | -15.0 | 23 | Toy Story | Fugitive, The |
| 1 | 800 | -15.0 | 24 | Toy Story | Lone Star |
| 1 | 1872 | -15.0 | 25 | Toy Story | Go Now |
| 1 | 2571 | -15.0 | 26 | Toy Story | Matrix, The |
| 1 | 13 | -15.3333333333 | 27 | Toy Story | Balto |
| 1 | 174 | -15.6666666667 | 28 | Toy Story | Jury Duty |
| 1 | 743 | -15.6666666667 | 29 | Toy Story | Spy Hard |
| 1 | 45 | -15.75 | 30 | Toy Story | To Die For |
| 1 | 2797 | -15.75 | 31 | Toy Story | Big |
| 1 | 327 | -15.8333333333 | 32 | Toy Story | Tank Girl |
| 1 | 18 | -16.0 | 33 | Toy Story | Four Rooms |
| 1 | 22 | -16.0 | 34 | Toy Story | Copycat |
| 1 | 24 | -16.0 | 35 | Toy Story | Powder |
| 1 | 50 | -16.0 | 36 | Toy Story | Usual Suspects, The |
| 1 | 67 | -16.0 | 37 | Toy Story | Two Bits |
| 1 | 145 | -16.0 | 38 | Toy Story | Bad Boys |
| 1 | 160 | -16.0 | 39 | Toy Story | Congo |
| 1 | 179 | -16.0 | 40 | Toy Story | Mad Love |
| 1 | 194 | -16.0 | 41 | Toy Story | Smoke |
| 1 | 279 | -16.0 | 42 | Toy Story | My Family |
| 1 | 331 | -16.0 | 43 | Toy Story | Tom & Viv |
| 1 | 388 | -16.0 | 44 | Toy Story | Boys Life |
| 1 | 553 | -16.0 | 45 | Toy Story | Tombstone |
| 1 | 736 | -16.0 | 46 | Toy Story | Twister |
| 1 | 850 | -16.0 | 47 | Toy Story | Cyclo |
| 1 | 1138 | -16.0 | 48 | Toy Story | Dadetown |
| 1 | 1704 | -16.0 | 49 | Toy Story | Good Will Hunting |
| 1 | 69 | -16.6666666667 | 50 | Toy Story | Friday |
| 1 | 171 | -16.6666666667 | 51 | Toy Story | Jeffrey |
| 1 | 187 | -16.6666666667 | 52 | Toy Story | Party Girl |
| 1 | 102 | -16.6666666667 | 53 | Toy Story | Mr. Wrong |
| 1 | 411 | -16.6666666667 | 54 | Toy Story | You So Crazy |
| 1 | 505 | -16.6666666667 | 55 | Toy Story | North |
| 1 | 1414 | -16.6666666667 | 56 | Toy Story | Mother |
| 1 | 158 | -16.75 | 57 | Toy Story | Casper |
| 1 | 235 | -16.75 | 58 | Toy Story | Ed Wood |
| 1 | 256 | -16.75 | 59 | Toy Story | Junior |
| 1 | 289 | -16.75 | 60 | Toy Story | Only You |
| 1 | 304 | -16.75 | 61 | Toy Story | Roommates |
| 1 | 550 | -16.75 | 62 | Toy Story | Threesome |
| 1 | 852 | -16.75 | 63 | Toy Story | Tin Cup |
| 1 | 1784 | -16.75 | 64 | Toy Story | As Good As It Gets |
| 1 | 2108 | -16.75 | 65 | Toy Story | L.A. Story |
| 1 | 2858 | -16.75 | 66 | Toy Story | American Beauty |
| 1 | 6 | -17.0 | 67 | Toy Story | Heat |
| 1 | 10 | -17.0 | 68 | Toy Story | GoldenEye |
| 1 | 14 | -17.0 | 69 | Toy Story | Nixon |
| 1 | 16 | -17.0 | 70 | Toy Story | Casino |
| 1 | 20 | -17.0 | 71 | Toy Story | Money Train |
| 1 | 26 | -17.0 | 72 | Toy Story | Othello |
| 1 | 76 | -17.0 | 73 | Toy Story | Screamers |
| 1 | 77 | -17.0 | 74 | Toy Story | Nico Icon |
| 1 | 159 | -17.0 | 75 | Toy Story | Clockers |
| 1 | 170 | -17.0 | 76 | Toy Story | Hackers |
| 1 | 190 | -17.0 | 77 | Toy Story | Safe |
| 1 | 208 | -17.0 | 78 | Toy Story | Waterworld |
| 1 | 227 | -17.0 | 79 | Toy Story | Drop Zone |
| 1 | 240 | -17.0 | 80 | Toy Story | Hideaway |
| 1 | 297 | -17.0 | 81 | Toy Story | Panther |
| 1 | 300 | -17.0 | 82 | Toy Story | Quiz Show |
| 1 | 379 | -17.0 | 83 | Toy Story | Timecop |
| 1 | 425 | -17.0 | 84 | Toy Story | Blue Sky |
| 1 | 461 | -17.0 | 85 | Toy Story | Go Fish |
| 1 | 527 | -17.0 | 86 | Toy Story | Schindler's List |
| 1 | 692 | -17.0 | 87 | Toy Story | Solo |
| 1 | 695 | -17.0 | 88 | Toy Story | True Crime |
| 1 | 742 | -17.0 | 89 | Toy Story | Thinner |
| 1 | 764 | -17.0 | 90 | Toy Story | Heavy |
| 1 | 835 | -17.0 | 91 | Toy Story | Foxfire |
| 1 | 846 | -17.0 | 92 | Toy Story | Flirt |
| 1 | 1168 | -17.0 | 93 | Toy Story | Bad Moon |
| 1 | 1545 | -17.0 | 94 | Toy Story | Ponette |
| 1 | 1552 | -17.0 | 95 | Toy Story | Con Air |
| 1 | 1617 | -17.0 | 96 | Toy Story | L.A. Confidential |
| 1 | 1842 | -17.0 | 97 | Toy Story | Illtown |
| 1 | 1151 | -17.5 | 98 | Toy Story | Faust |
| 1 | 48 | -17.6 | 99 | Toy Story | Pocahontas |
| 1 | 65 | -17.6666666667 | 100 | Toy Story | Bio-Dome |
+---------+---------+----------------+------+-----------+---------------------+
[352600 rows x 6 columns]
Use this similarity data as the basis for a recommender.
In [16]:
m5 = gl.item_similarity_recommender.create(train, nearest_items=similar)
PROGRESS: Recsys training: model = item_similarity
PROGRESS: Warning: Column 'score' ignored.
PROGRESS: To use this column as the target, set target = "score" and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS: Loading user-provided nearest items.
PROGRESS: Data has 555786 observations with 6038 users and 3526 items.
PROGRESS: Data prepared in: 1.2663s
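One simple way a table of nearest items could drive recommendations is to score each unseen item by summing its similarity to the items a user already has. The sketch below shows that idea only; `recommend`, the toy `nearest` table, and the scoring rule are assumptions and may differ from how the item similarity recommender aggregates scores.

```python
from collections import defaultdict

def recommend(user_items, nearest, k=2):
    """Rank unseen items by summed similarity to the user's items.

    `nearest` maps item_id -> list of (similar_item_id, similarity).
    """
    scores = defaultdict(float)
    for item in user_items:
        for neighbor, sim in nearest.get(item, []):
            if neighbor not in user_items:
                scores[neighbor] += sim
    # Highest score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]

# Toy similarity table (illustrative values).
nearest = {1: [(2, 0.9), (3, 0.2)], 2: [(1, 0.9), (3, 0.5)]}

top_single = recommend({1}, nearest)    # user has only item 1
top_both = recommend({1, 2}, nearest)   # only item 3 remains unseen
print(top_single, top_both)
```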
Create a precision/recall plot to compare the recommendation quality of the above models on our held-out data.
In [19]:
model_comparison = gl.compare(valid, [m0, m1, m2, m3, m5], user_sample=.3)
compare_models: using 298 users to estimate model performance
PROGRESS: Evaluate model M0
Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+-----------------+
| 1 | 0.281879194631 | 0.0263711769818 |
| 2 | 0.263422818792 | 0.0461998105304 |
| 3 | 0.244966442953 | 0.062829557057 |
| 4 | 0.235738255034 | 0.0777983759123 |
| 5 | 0.225503355705 | 0.0882340267173 |
| 6 | 0.209731543624 | 0.0951448587842 |
| 7 | 0.200383509108 | 0.104329520094 |
| 8 | 0.192953020134 | 0.112424305266 |
| 9 | 0.186428038777 | 0.120099746396 |
| 10 | 0.180536912752 | 0.128577958505 |
+--------+----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M1
Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision | mean_recall |
+--------+----------------+------------------+
| 1 | 0.134228187919 | 0.00957333909485 |
| 2 | 0.151006711409 | 0.025808351569 |
| 3 | 0.156599552573 | 0.0351607762475 |
| 4 | 0.146812080537 | 0.0403875964733 |
| 5 | 0.142281879195 | 0.0468043145225 |
| 6 | 0.137024608501 | 0.0541186463206 |
| 7 | 0.129434324065 | 0.0599776056468 |
| 8 | 0.124580536913 | 0.0657980852227 |
| 9 | 0.119686800895 | 0.0690609490538 |
| 10 | 0.118456375839 | 0.0815028809531 |
+--------+----------------+------------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M2
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff | mean_precision | mean_recall |
+--------+-----------------+-----------------+
| 1 | 0.167785234899 | 0.0161053142356 |
| 2 | 0.144295302013 | 0.0254801793572 |
| 3 | 0.131991051454 | 0.0318025677732 |
| 4 | 0.125 | 0.0371829043559 |
| 5 | 0.11744966443 | 0.0410638816769 |
| 6 | 0.110738255034 | 0.0468292410286 |
| 7 | 0.105465004794 | 0.0524601958501 |
| 8 | 0.101510067114 | 0.0573956656301 |
| 9 | 0.0988068605518 | 0.0602808953692 |
| 10 | 0.098322147651 | 0.0649756343573 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M3
Precision and recall summary statistics by cutoff
+--------+-----------------+------------------+
| cutoff | mean_precision | mean_recall |
+--------+-----------------+------------------+
| 1 | 0.11744966443 | 0.00704077148453 |
| 2 | 0.114093959732 | 0.0177310726317 |
| 3 | 0.106263982103 | 0.0223298124415 |
| 4 | 0.105704697987 | 0.0331716430118 |
| 5 | 0.109395973154 | 0.0435932795409 |
| 6 | 0.104586129754 | 0.047143384739 |
| 7 | 0.103547459252 | 0.0546491993938 |
| 8 | 0.100251677852 | 0.0607113639254 |
| 9 | 0.0995525727069 | 0.06877810981 |
| 10 | 0.0946308724832 | 0.0719324763374 |
+--------+-----------------+------------------+
[10 rows x 3 columns]
PROGRESS: Evaluate model M4
Precision and recall summary statistics by cutoff
+--------+-----------------+------------------+
| cutoff | mean_precision | mean_recall |
+--------+-----------------+------------------+
| 1 | 0.0369127516779 | 0.00380229838808 |
| 2 | 0.0436241610738 | 0.00896093206526 |
| 3 | 0.0425055928412 | 0.0113163832365 |
| 4 | 0.0394295302013 | 0.0123027684318 |
| 5 | 0.0395973154362 | 0.0136800043452 |
| 6 | 0.0402684563758 | 0.0167122518012 |
| 7 | 0.0407478427613 | 0.0196711571782 |
| 8 | 0.0402684563758 | 0.020772309913 |
| 9 | 0.037658463833 | 0.0211772817451 |
| 10 | 0.0355704697987 | 0.021486414479 |
+--------+-----------------+------------------+
[10 rows x 3 columns]
Model compare metric: precision_recall
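For reference, precision and recall at a cutoff can be computed per user as in the sketch below, then averaged over users to get figures like mean_precision and mean_recall. This uses the standard definitions; `precision_recall_at_k` is a hypothetical helper, and GraphLab's exact handling of edge cases is not shown here.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user.

    `recommended` is a ranked list of item ids; `relevant` is the set of
    held-out items the user actually interacted with.
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

# Toy example: 2 of the top 5 recommendations are among 3 held-out items.
p, r = precision_recall_at_k([1, 2, 3, 4, 5], {2, 5, 9}, k=5)
print(p, r)
```

Precision tends to fall and recall to rise as the cutoff grows, which is the pattern visible in each table above.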
In [20]:
gl.show_comparison(model_comparison, [m0, m1, m2, m3, m5])
Content source: turi-code/tutorials