Basic Recommender Functionalities

The GraphLab Create recommender package implements a number of popular recommender models. The models differ in how they make predictions and recommendations given observed data, but they all conform to the same API: call create() to build a model, then call predict(), recommend(), and evaluate() on the returned model object.

Let's walk through each of these functions in turn.

Creating a model

You can create a simple recommender model using recommender.create().


In [1]:
import graphlab
# Show graphs and sframes inside ipython notebook
graphlab.canvas.set_target('ipynb')

In [2]:
# Load a small training set. The data is 2.8 MB.
training_data = graphlab.SFrame.read_csv("https://static.turi.com/datasets/movie_ratings/training_data.csv",
                                         column_type_hints={"rating":int})

model = graphlab.recommender.create(training_data, user_id="user", item_id="movie")


[INFO] GraphLab Server Version: 1.6.908
PROGRESS: Downloading https://static.turi.com/datasets/movie_ratings/training_data.csv to /var/tmp/graphlab-roman/60307/245388d8-96d7-43b2-a801-1b2de79af40f.csv
PROGRESS: Finished parsing file https://static.turi.com/datasets/movie_ratings/training_data.csv
PROGRESS: Parsing completed. Parsed 100 lines in 0.099973 secs.
PROGRESS: Finished parsing file https://static.turi.com/datasets/movie_ratings/training_data.csv
PROGRESS: Parsing completed. Parsed 82068 lines in 0.087074 secs.
PROGRESS: Recsys training: model = item_similarity
PROGRESS: Warning: Column 'rating' ignored.
PROGRESS:     To use this column as the target, set target = "rating" and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS:     Data has 82068 observations with 334 users and 7714 items.
PROGRESS:     Data prepared in: 0.143173s
PROGRESS: Computing item similarity statistics:
PROGRESS: Computing most similar items for 7714 items:
PROGRESS: +-----------------+-----------------+
PROGRESS: | Number of items | Elapsed Time    |
PROGRESS: +-----------------+-----------------+
PROGRESS: | 1000            | 1.63549         |
PROGRESS: | 2000            | 1.71756         |
PROGRESS: | 3000            | 1.79594         |
PROGRESS: | 4000            | 1.89782         |
PROGRESS: | 5000            | 1.99043         |
PROGRESS: | 6000            | 2.06005         |
PROGRESS: | 7000            | 2.13543         |
PROGRESS: +-----------------+-----------------+
PROGRESS: Finished training in 2.35439s

The above code automatically chose to create an ItemSimilarityRecommender based on the data provided. You can also create this type of model directly, with your own configuration for the training process, such as a particular choice of similarity. The similarity type defaults to "jaccard"; if the data contains rating information, you can set it to "cosine" instead.


In [3]:
model_cosine = graphlab.item_similarity_recommender.create(training_data, user_id="user", item_id="movie", target="rating",
                                                           similarity_type="cosine")


PROGRESS: Recsys training: model = item_similarity
PROGRESS: Preparing data set.
PROGRESS:     Data has 82068 observations with 334 users and 7714 items.
PROGRESS:     Data prepared in: 0.177537s
PROGRESS: Computing item similarity statistics:
PROGRESS: Computing most similar items for 7714 items:
PROGRESS: +-----------------+-----------------+
PROGRESS: | Number of items | Elapsed Time    |
PROGRESS: +-----------------+-----------------+
PROGRESS: | 1000            | 0.535493        |
PROGRESS: | 2000            | 0.619526        |
PROGRESS: | 3000            | 0.692952        |
PROGRESS: | 4000            | 0.762314        |
PROGRESS: | 5000            | 0.829579        |
PROGRESS: | 6000            | 0.902835        |
PROGRESS: | 7000            | 0.984604        |
PROGRESS: +-----------------+-----------------+
PROGRESS: Finished training in 1.22591s
PROGRESS: Finished prediction in 0.129676s

Different models may have different configuration parameters and input-argument semantics. For example, LinearRegressionModel and LogisticRegressionModel treat additional columns in the training_data as side features to use for prediction. See the API reference for details.
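
To make that concrete, here is a minimal, hypothetical sketch of passing item side features: the movie_info SFrame and its "genre" column are made up, and factorization_recommender is used here because it accepts side data through its user_data and item_data arguments.

import graphlab

# Hypothetical side-feature table: one row per item, keyed on "movie".
movie_info = graphlab.SFrame({'movie': ['Top Gun', 'Braveheart'],
                              'genre': ['action', 'drama']})

# Sketch: supply the item side data at training time.
side_model = graphlab.factorization_recommender.create(
    training_data, user_id="user", item_id="movie", target="rating",
    item_data=movie_info)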

Making predictions

predict() makes rating predictions for any user-item query pair in the input SFrame (or pandas DataFrame). For example, if the input contains the row "Alice, The Great Gatsby," predict() will output a number that represents the model's prediction of how much "Alice" will like "The Great Gatsby."

The query data must have the same schema as the training data. In other words, it must contain user and item ID columns with the same names, in the same column positions: if the training data has the user IDs in the second column, then they must be in the second column of the query data as well. All other columns in the query data are ignored. (Exceptions are LinearRegressionModel and LogisticRegressionModel, which treat the additional columns as side features; see the API documentation for details.) In this example, even though the query data contains the ground-truth ratings, the model ignores them.
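
To make the schema requirement concrete, here is a minimal sketch that scores a single hand-built pair; "Alice" and the movie title are hypothetical values, and users or items unseen during training simply receive a default score:

# Build a one-row query with the same column names as the training data.
one_pair = graphlab.SFrame({'user': ['Alice'],
                            'movie': ['The Great Gatsby']})
one_pair = one_pair[['user', 'movie']]  # match the training column order
model.predict(one_pair)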


In [4]:
# Load some data for querying. The data is < 1MB.
# The query data must have the same columns ("user" and "movie") as the training data.
# The column indices should also be the same, i.e., if the "user_id" column is the second
# column in the training data, then it also needs to be the second column in the query data.
query_data = graphlab.SFrame.read_csv("https://static.turi.com/datasets/movie_ratings/query_data.csv", column_type_hints={"rating":int})
query_data.show()


PROGRESS: Downloading https://static.turi.com/datasets/movie_ratings/query_data.csv to /var/tmp/graphlab-roman/60307/cf23702d-7230-4f20-9102-df3da34de76a.csv
PROGRESS: Finished parsing file https://static.turi.com/datasets/movie_ratings/query_data.csv
PROGRESS: Parsing completed. Parsed 100 lines in 0.042761 secs.
PROGRESS: Finished parsing file https://static.turi.com/datasets/movie_ratings/query_data.csv
PROGRESS: Parsing completed. Parsed 20536 lines in 0.032935 secs.

The output of predict() is an SArray with as many rows as the query data. The i-th entry in this SArray is the rating prediction for the i-th user-item pair in the query data.

The prediction scores are unnormalized, meaning that they may not conform to the scale of the original ratings given in the training data. (Remember that this model was not trained to predict target ratings, but simply to rank items and make recommendations. It can still make rating predictions, but the scores you see may not map to the original scale.)


In [5]:
# Now make some predictions 
query_result = model.predict(query_data)
query_result.head()


PROGRESS: Finished prediction in 0.039695s
Out[5]:
dtype: float
Rows: 10
[0.07340476159384636, 0.04886301894980847, 0.02600915830746077, 0.020213259677976495, 0.04823297180826005, 0.06089853129040893, 0.06895750834180256, 0.0, 0.057803679975613115, 0.0642409268063985]

With a sprinkle of SArray statistical magic dust, you can min-max scale the model's prediction scores onto the scale of the original ratings.


In [6]:
# Scale the results to be on the same scale as the original ratings
scaled_result = (query_result - query_result.min())/(query_result.max() - query_result.min()) * query_data['rating'].max()
scaled_result.head()


Out[6]:
dtype: float
Rows: 10
[1.3289084175811396, 0.8846085155907757, 0.4708657675401071, 0.36593771779334067, 0.8732022398710979, 1.1024975640944963, 1.248396034549961, 0.0, 1.0464688559549553, 1.1630077740541775]

Making recommendations

Unlike predict(), which returns raw prediction scores for user-item pairs, recommend() returns a ranked list of items. The users parameter should be an SArray of user IDs for which to make recommendations. If users is set to None, then recommend() makes recommendations for all users seen during training, automatically excluding the items that each user has already rated in the training data. In other words, if the training data contains a row "Alice, The Great Gatsby", then recommend() will not recommend "The Great Gatsby" for user "Alice". It returns at most k new items for each user, sorted by rank, and fewer than k if there are not enough items the user has not already rated or seen.

The output of recommend() is an SFrame with four columns, named after the user and item columns in the training data plus "score" and "rank": here, "user", "movie", "score", and "rank".


In [7]:
recommend_result = model.recommend(users=None, k=5)
recommend_result.head()


Out[7]:
+-------------+--------------------------------------------+-----------------+------+
|     user    |                   movie                    |      score      | rank |
+-------------+--------------------------------------------+-----------------+------+
| Jacob Smith | Indiana Jones and the Last Crusade         | 0.14814774001   |  1   |
| Jacob Smith | Good Will Hunting                          | 0.1456993707    |  2   |
| Jacob Smith | Top Gun                                    | 0.145652426782  |  3   |
| Jacob Smith | Jerry Maguire                              | 0.133805845791  |  4   |
| Jacob Smith | Braveheart                                 | 0.131875253736  |  5   |
| Mason Smith | Jerry Maguire                              | 0.0560217360383 |  1   |
| Mason Smith | Stand by Me                                | 0.0536424431616 |  2   |
| Mason Smith | Good Will Hunting                          | 0.0528783395787 |  3   |
| Mason Smith | The Shawshank Redemption: Special Edition  | 0.0478933808669 |  4   |
| Mason Smith | The Silence of the Lambs                   | 0.0458903165431 |  5   |
+-------------+--------------------------------------------+-----------------+------+
[10 rows x 4 columns]

The raw scores from an ItemSimilarityRecommender with Jaccard similarity are rather meaningless: a higher score could mean that the user rated more items, that the item in question is more popular, or that it is very similar to many other items. So ignore the score column in this instance; the rank is what counts.
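
If you only need recommendations for particular users, or more than the default number of items, pass them explicitly. A minimal sketch, reusing "Jacob Smith" from the output above:

# Top 10 recommendations for a single user seen during training.
one_user_result = model.recommend(users=['Jacob Smith'], k=10)
one_user_result.print_rows()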

Training and validation

If you've spent any time near machine learning nerds, you may have heard strange phrases like "hold-out validation" and "generalization error." What on earth do they mean? Basically, they have to do with measuring the accuracy of the model. Measuring accuracy on training data is cheating: the model has already seen that data, so it can easily produce good predictions and fool you into believing it's doing well. It is fairer to evaluate the model on data it has not seen during training. This is what the validation dataset is for. It is common practice to divide the whole dataset into two parts, with one part "held out" from the training process and used only for validation.

GraphLab Create allows you to do this with ease. recommender.util.random_split_by_user() holds out a random subset of items for a number of users to use as validation data. Note that this is not the same as tossing a coin for each observation to decide whether it goes into the training or validation set; that method does not guarantee that every user retains some items in the training set.
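
For contrast, the naive coin-toss split looks like the sketch below; SFrame.random_split() assigns each row to one side at random, so a user may lose all of their items from the training set:

# Naive per-row split: no guarantee that every user keeps some
# training items, unlike random_split_by_user().
naive_train, naive_valid = training_data.random_split(0.8, seed=42)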


In [8]:
training_subset, validation_subset = graphlab.recommender.util.random_split_by_user(training_data,
                                                                                    user_id="user", item_id="movie",
                                                                                    max_num_users=100, item_test_proportion=0.3)

In [9]:
model = graphlab.recommender.create(training_subset, user_id="user", item_id="movie", target="rating")


PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS:     Data has 74646 observations with 334 users and 7518 items.
PROGRESS:     Data prepared in: 0.157435s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter                      | Description                                      | Value    |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors                    | Factor Dimension                                 | 32       |
PROGRESS: | regularization                 | L2 Regularization on Factors                     | 1e-09    |
PROGRESS: | solver                         | Solver used for training                         | sgd      |
PROGRESS: | linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
PROGRESS: | ranking_regularization         | Rank-based Regularization Weight                 | 0.25     |
PROGRESS: | max_iterations                 | Maximum Number of Iterations                     | 25       |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS:   Optimizing model using SGD; tuning step size.
PROGRESS:   Using 10000 / 74646 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value                |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0       | 25                | Not Viable                               |
PROGRESS: | 1       | 6.25              | Not Viable                               |
PROGRESS: | 2       | 1.5625            | Not Viable                               |
PROGRESS: | 3       | 0.390625          | Not Viable                               |
PROGRESS: | 4       | 0.0976562         | 1.59857                                  |
PROGRESS: | 5       | 0.0488281         | 1.64645                                  |
PROGRESS: | 6       | 0.0244141         | 1.71585                                  |
PROGRESS: | 7       | 0.012207          | 1.79496                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final   | 0.0976562         | 1.59857                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+
PROGRESS: | Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+
PROGRESS: | Initial | 155us        | 2.40943           | 1.10855               |             |
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+
PROGRESS: | 1       | 134.829ms    | 2.01943           | 1.1283                | 0.0976562   |
PROGRESS: | 2       | 253.868ms    | 1.79951           | 1.0656                | 0.0580668   |
PROGRESS: | 3       | 350.181ms    | 1.59783           | 0.990766              | 0.042841    |
PROGRESS: | 4       | 515.615ms    | 1.42803           | 0.922456              | 0.0345267   |
PROGRESS: | 5       | 617.001ms    | 1.25634           | 0.849602              | 0.029206    |
PROGRESS: | 6       | 710.735ms    | 1.1073            | 0.783118              | 0.0254734   |
PROGRESS: | 10      | 1.06s        | 0.736345          | 0.601833              | 0.017366    |
PROGRESS: | 11      | 1.15s        | 0.691719          | 0.579102              | 0.016168    |
PROGRESS: | 20      | 1.93s        | 0.516482          | 0.481056              | 0.0103259   |
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+
PROGRESS: Optimization Complete: Maximum number of passes through the data reached.
PROGRESS: Computing final objective value and training RMSE.
PROGRESS:        Final objective value: 0.442736
PROGRESS:        Final training RMSE: 0.399063

Evaluation

To evaluate the accuracy or ranking performance of a recommender model, call evaluate() with a test or validation dataset. The dataset should have the same schema (column names and positions) as the one used during training. If the model was trained to predict a target column of ratings, evaluate() calls predict() to predict ratings for all user-item pairs in the test set and reports the resulting RMSE (root-mean-squared error). If the model was trained without a target column, evaluate() ranks the items for each user in the test set and computes precision and recall over the given items. In either case, the model outputs are evaluated against the ground-truth data provided in the input dataset.
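
To see what the RMSE branch boils down to, here is a sketch that reproduces it by hand with graphlab.evaluation.rmse, assuming the validation set carries a "rating" column as ours does:

# Predict ratings for the held-out pairs, then compare to ground truth.
preds = model.predict(validation_subset)
graphlab.evaluation.rmse(validation_subset['rating'], preds)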

The precision and recall scores are computed for different cutoff values (i.e., how many recommendations to take when computing the scores). See the API documentation for more details.


In [10]:
# Manually evaluate on the validation data. Evaluation results contain per-user, per-item and overall RMSEs.
rmse_results = model.evaluate(validation_subset)


Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision |   mean_recall    |
+--------+----------------+------------------+
|   1    |      0.15      | 0.00364661621451 |
|   2    |     0.195      | 0.00762640672686 |
|   3    | 0.173333333333 | 0.00942030232094 |
|   4    |      0.16      | 0.0114221929688  |
|   5    |     0.166      |  0.014760364662  |
|   6    | 0.166666666667 | 0.0181327891166  |
|   7    |      0.17      | 0.0200845271886  |
|   8    |      0.17      | 0.0233430134064  |
|   9    | 0.178888888889 | 0.0279373977536  |
|   10   |     0.176      | 0.0300501718962  |
+--------+----------------+------------------+
[10 rows x 3 columns]


Overall RMSE:  1.78728882846

Per User RMSE (best)
+-------------+-------+----------------+
|     user    | count |      rmse      |
+-------------+-------+----------------+
| Jaxon Smith |   90  | 0.976649400476 |
+-------------+-------+----------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+------------+-------+---------------+
|    user    | count |      rmse     |
+------------+-------+---------------+
| Ivan Smith |  319  | 3.50770954415 |
+------------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+----------------------+-------+------------------+
|        movie         | count |       rmse       |
+----------------------+-------+------------------+
| Deconstructing Harry |   1   | 0.00171446092523 |
+----------------------+-------+------------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+--------------------+-------+---------------+
|       movie        | count |      rmse     |
+--------------------+-------+---------------+
| Courage Under Fire |   1   | 7.12411820181 |
+--------------------+-------+---------------+
[1 rows x 3 columns]


In [11]:
rmse_results['rmse_by_item'].show()



In [12]:
rmse_results['rmse_by_user'].show()


RMSE evaluation results show the user and the item with the best and worst average RMSE, as well as the overall average RMSE across all user-item pairs.
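
Besides the per-user and per-item breakdowns shown above, the dictionary returned by evaluate() also carries the overall figure; assuming the key follows the same naming as 'rmse_by_user' and 'rmse_by_item', a quick check looks like this:

# The same overall RMSE printed in the evaluation summary above.
rmse_results['rmse_overall']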


In [13]:
# Now let's try a model that performs ranking by default, and look at the precision-recall validation results.
model2 = graphlab.ranking_factorization_recommender.create(training_subset, user_id="user", item_id="movie")
precision_recall_results = model2.evaluate(validation_subset)


PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS:     Data has 74646 observations with 334 users and 7518 items.
PROGRESS:     Data prepared in: 0.160148s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter                      | Description                                      | Value    |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors                    | Factor Dimension                                 | 32       |
PROGRESS: | regularization                 | L2 Regularization on Factors                     | 1e-09    |
PROGRESS: | solver                         | Solver used for training                         | adagrad  |
PROGRESS: | linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
PROGRESS: | binary_target                  | Assume Binary Targets                            | True     |
PROGRESS: | max_iterations                 | Maximum Number of Iterations                     | 25       |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS:   Optimizing model using SGD; tuning step size.
PROGRESS:   Using 10000 / 74646 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value                |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0       | 16.6667           | Not Viable                               |
PROGRESS: | 1       | 4.16667           | Not Viable                               |
PROGRESS: | 2       | 1.04167           | Not Viable                               |
PROGRESS: | 3       | 0.260417          | 0.789701                                 |
PROGRESS: | 4       | 0.130208          | 1.03161                                  |
PROGRESS: | 5       | 0.0651042         | No Decrease (1.40114 >= 1.38644)         |
PROGRESS: | 6       | 0.016276          | 1.34503                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final   | 0.260417          | 0.789701                                 |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Iter.   | Elapsed Time | Approx. Objective | Approx. Training Predictive Error | Step Size   |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Initial | 184us        | 1.38644           | 0.693147                          |             |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | 1       | 319.643ms    | DIVERGED          | DIVERGED                          | 0.260417    |
PROGRESS: | RESET   | 415.824ms    | 1.38644           | 0.69314                           |             |
PROGRESS: | 1       | 651.168ms    | 1.68726           | 0.792579                          | 0.130208    |
PROGRESS: | 2       | 862.295ms    | 1.41041           | 0.71768                           | 0.130208    |
PROGRESS: | 3       | 1.05s        | 1.2851            | 0.644217                          | 0.130208    |
PROGRESS: | 4       | 1.25s        | 1.22678           | 0.615219                          | 0.130208    |
PROGRESS: | 5       | 1.42s        | 1.19133           | 0.594955                          | 0.130208    |
PROGRESS: | 6       | 1.59s        | 1.14787           | 0.569126                          | 0.130208    |
PROGRESS: | 9       | 2.09s        | 1.07141           | 0.528061                          | 0.130208    |
PROGRESS: | 11      | 2.44s        | 1.03632           | 0.508879                          | 0.130208    |
PROGRESS: | 19      | 3.86s        | 0.930908          | 0.449121                          | 0.130208    |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: Optimization Complete: Maximum number of passes through the data reached.
PROGRESS: Computing final objective value and training Predictive Error.
PROGRESS:        Final objective value: 0.867562
PROGRESS:        Final training Predictive Error: 0.392381
[WARNING] Model trained without a target. Skipping RMSE computation.
Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision |   mean_recall    |
+--------+----------------+------------------+
|   1    |      0.41      | 0.00786199199559 |
|   2    |      0.36      | 0.0119830770402  |
|   3    | 0.323333333333 | 0.0164133617271  |
|   4    |     0.315      | 0.0208737791232  |
|   5    |     0.292      | 0.0240863152257  |
|   6    |     0.285      | 0.0283732012891  |
|   7    | 0.284285714286 | 0.0338857364306  |
|   8    |     0.2775     | 0.0376710258721  |
|   9    |      0.28      | 0.0439081884431  |
|   10   |     0.272      | 0.0503191459875  |
+--------+----------------+------------------+
[10 rows x 3 columns]
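
To put the two models side by side on the same held-out data, the toolkit also provides a comparison utility; a minimal sketch using recommender.util.compare_models, where the model_names labels are our own choice:

# Evaluate both models on the same validation data and print a summary.
comparison = graphlab.recommender.util.compare_models(
    validation_subset, [model, model2],
    model_names=['rmse_model', 'ranking_model'])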

(Looking for more details about the modules and functions? Check out the API docs.)