Week 8 - Advanced Machine Learning

Recommendation systems

Recommendation systems attempt to predict a user's preference for an item. The goal is typically to present the user with the items they are most likely to prefer.

Uses

Recommendation systems have been increasingly used in online retailing in recent years. Perhaps the best-known example is movie recommendation, thanks to the Netflix Prize. Films, books, music, food and other purchasable items are all commonly the subject of recommendation systems. The same concepts have also been applied to research articles, collaborators, romantic partners, news items, travel routes, and many other areas.

Types

There are two main types of recommendation systems:

Collaborative filtering uses the preferences of many users, basing recommendations on what other users with similar preferences have previously liked. Collaborative filtering systems do not need to know anything about the users or items themselves, which eliminates the need to develop features that accurately capture differences between users or items.
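
As a rough illustration of the idea, the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings. The ratings dictionary, the user and film names, and the choice of cosine similarity over co-rated items are all assumptions made for this example; real systems usually mean-centre ratings and restrict the average to the nearest neighbours.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 4, "Heat": 5, "Up": 2, "Jaws": 5},
    "carol": {"Alien": 1, "Heat": 2, "Up": 5, "Jaws": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        numerator += sim * their_ratings[item]
        denominator += sim
    # None signals that no other user has rated this item.
    return numerator / denominator if denominator else None


# Alice has not rated "Jaws"; her tastes are closer to Bob's than Carol's,
# so the prediction is pulled towards Bob's rating.
predicted = predict("alice", "Jaws")
print(round(predicted, 2))
```

Note that no item or user features appear anywhere in the code: only the ratings themselves are used.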

Content-based filtering utilizes item descriptions and user profiles or past/current preferences to identify similar items to recommend.
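
A minimal sketch of content-based filtering follows. The genre vectors, film names, and the rule of averaging liked items into a user profile are assumptions chosen for illustration, not a prescribed method.

```python
from math import sqrt

# Hypothetical item descriptions as genre vectors: [action, comedy, sci-fi].
items = {
    "Alien": [1, 0, 1],
    "Heat": [1, 0, 0],
    "Up": [0, 1, 0],
    "Moon": [0, 0, 1],
    "Airplane!": [0, 1, 0],
}
liked = ["Alien", "Heat"]  # items the user has liked in the past


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# User profile: the average feature vector of the liked items.
n_features = len(next(iter(items.values())))
profile = [sum(items[name][k] for name in liked) / len(liked)
           for k in range(n_features)]

# Rank the items the user has not seen by similarity to the profile.
candidates = [name for name in items if name not in liked]
ranked = sorted(candidates, key=lambda name: cosine(profile, items[name]),
                reverse=True)
print(ranked)
```

Unlike collaborative filtering, this needs no other users, but it does depend entirely on how well the features describe the items.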

Both approaches have limitations, and hybrid recommendation systems are an increasingly popular alternative. These systems attempt to combine collaborative and content-based filtering. Content-based filtering can aid collaborative filtering by providing initial recommendations when there is insufficient data on a user's preferences, giving a coarse starting point while the set of labels on user preferences is still sparse.
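
One simple way to combine the two, sketched below, is a weighted blend that leans on the content-based score when a user has few ratings and shifts towards the collaborative score as ratings accumulate. The function name and the damping constant `k` are hypothetical; many other hybridisation schemes exist.

```python
def hybrid_score(cf_score, content_score, n_user_ratings, k=10):
    """Blend a collaborative score with a content-based score.

    k is a hypothetical damping constant: with k ratings the two
    scores are weighted equally.
    """
    if cf_score is None:
        # No collaborative estimate is available, so fall back entirely
        # on the content-based score.
        return content_score
    weight = n_user_ratings / (n_user_ratings + k)
    return weight * cf_score + (1 - weight) * content_score


print(hybrid_score(None, 3.0, 0))   # new user: content-based only
print(hybrid_score(5.0, 3.0, 10))   # equal weighting
print(hybrid_score(5.0, 3.0, 990))  # mostly collaborative
```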

Another way to supplement user preferences is to use several types of user feedback. In addition to explicit ratings for items, implicit feedback such as viewing duration can be used.
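
As a sketch of how implicit feedback might be folded in, the code below maps the fraction of an item that was viewed onto the rating scale and uses it only when no explicit rating exists. The linear mapping, the 1-5 scale, and both function names are assumptions for illustration.

```python
def implicit_rating(watched_seconds, duration_seconds, low=1.0, high=5.0):
    """Map the fraction of an item viewed onto the rating scale (linear)."""
    fraction = watched_seconds / duration_seconds
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to [0, 1]
    return low + fraction * (high - low)


def combined_rating(explicit, watched_seconds, duration_seconds):
    """Prefer an explicit rating; otherwise fall back to the implicit one."""
    if explicit is not None:
        return float(explicit)
    return implicit_rating(watched_seconds, duration_seconds)


print(combined_rating(None, 3000, 6000))  # half watched, no explicit rating
print(combined_rating(4, 600, 6000))      # explicit rating takes priority
```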

Evaluation

Performance can be evaluated with metrics we are already familiar with, such as root mean square error (RMSE). However, accuracy is not the only factor in determining user satisfaction or the utility of the system. For example, a recommendation system can return several almost identical items and achieve a very low RMSE, but a user will likely prefer a more diverse set of recommendations.
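
The sketch below computes RMSE and MAE for a set of predictions, together with one possible diversity measure: the mean pairwise cosine distance between the feature vectors of the recommended items. The diversity measure and the feature vectors are assumptions for illustration; several definitions of diversity are in use.

```python
from itertools import combinations
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def intra_list_diversity(vectors):
    """Mean pairwise (1 - cosine similarity) over a recommendation list."""
    pairs = list(combinations(vectors, 2))
    return sum(1 - cosine(u, v) for u, v in pairs) / len(pairs)


print(rmse([1, 2], [2, 4]), mae([1, 2], [2, 4]))

# Two lists of hypothetical item feature vectors: near-duplicates vs varied.
similar_list = [[1, 0, 1], [1, 0, 1], [1, 0, 1]]
varied_list = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(intra_list_diversity(similar_list), intra_list_diversity(varied_list))
```

Both lists could score equally well on RMSE, yet the second is far more diverse.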

The Surprise package (a SciKit, installed as scikit-surprise) implements several algorithms for building recommendation systems.


In [11]:
from surprise import SVD
from surprise import Dataset
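# Note: evaluate, print_perf and Dataset.split belong to older Surprise releases;
# later releases replace them with surprise.model_selection.cross_validate.
from surprise import evaluate, print_perf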


# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

In [12]:
# Split the data into 3 folds for cross-validation.
data.split(n_folds=3)

# We'll use the famous SVD algorithm.
algo = SVD()

# Evaluate performances of our algorithm on the dataset.
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])

print_perf(perf)


Evaluating RMSE, MAE of algorithm SVD.

------------
Fold 1
RMSE: 0.9415
MAE:  0.7425
------------
Fold 2
RMSE: 0.9379
MAE:  0.7396
------------
Fold 3
RMSE: 0.9515
MAE:  0.7501
------------
------------
Mean RMSE: 0.9437
Mean MAE : 0.7440
------------
------------
        Fold 1  Fold 2  Fold 3  Mean    
RMSE    0.9415  0.9379  0.9515  0.9437  
MAE     0.7425  0.7396  0.7501  0.7440  
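
Surprise's SVD is a matrix factorization model trained by stochastic gradient descent. To make the roles of the number of epochs, the learning rate and the regularization strength concrete, here is a much simplified pure-Python sketch of that kind of model. The function name, the toy ratings and the small number of factors are assumptions for this example; it is not Surprise's implementation.

```python
import random
from math import sqrt


def train_mf(ratings, n_factors=2, n_epochs=20, lr=0.005, reg=0.02, seed=0):
    """Fit r_hat(u, i) = mu + b_u + b_i + p_u . q_i by SGD.

    ratings is a list of (user, item, rating) triples. The returned
    predict(u, i) only handles users and items seen during training.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean
    b_user = {u: 0.0 for u in users}
    b_item = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        return mu + b_user[u] + b_item[i] + sum(
            pf * qf for pf, qf in zip(p[u], q[i]))

    for _ in range(n_epochs):  # one epoch = one pass over all ratings
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Step along the error gradient (scaled by lr) while the
            # reg term shrinks every parameter towards zero.
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return predict


# Hypothetical toy ratings.
toy = [("a", "x", 5), ("a", "y", 3), ("b", "x", 4),
       ("b", "z", 1), ("c", "y", 2), ("c", "z", 1)]
predict = train_mf(toy, n_epochs=200, lr=0.01, reg=0.02)
train_rmse = sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in toy) / len(toy))
print(round(train_rmse, 3))
```

More epochs or a larger learning rate fit the training ratings more closely, while stronger regularization holds the parameters back; cross-validation is what shows whether a given combination generalizes.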

In [16]:
import pandas as pd
# Note: later Surprise releases replace GridSearch with
# surprise.model_selection.GridSearchCV.
from surprise import GridSearch

param_grid = {'n_epochs': [20, 50], 'lr_all': [0.002, 0.005, 0.01],
              'reg_all': [0.01, 0.02, 0.04]}

grid_search = GridSearch(SVD, param_grid, measures=['RMSE', 'FCP'])

grid_search.evaluate(data)

results_df = pd.DataFrame.from_dict(grid_search.cv_results)


------------
Parameters combination 1 of 18
params:  {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9571
Mean FCP : 0.6903
------------
------------
Parameters combination 2 of 18
params:  {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9439
Mean FCP : 0.7020
------------
------------
Parameters combination 3 of 18
params:  {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9388
Mean FCP : 0.7077
------------
------------
Parameters combination 4 of 18
params:  {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9434
Mean FCP : 0.7019
------------
------------
Parameters combination 5 of 18
params:  {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9343
Mean FCP : 0.7111
------------
------------
Parameters combination 6 of 18
params:  {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9298
Mean FCP : 0.7169
------------
------------
Parameters combination 7 of 18
params:  {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9572
Mean FCP : 0.6910
------------
------------
Parameters combination 8 of 18
params:  {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9437
Mean FCP : 0.7035
------------
------------
Parameters combination 9 of 18
params:  {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9386
Mean FCP : 0.7097
------------
------------
Parameters combination 10 of 18
params:  {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9432
Mean FCP : 0.7036
------------
------------
Parameters combination 11 of 18
params:  {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9344
Mean FCP : 0.7127
------------
------------
Parameters combination 12 of 18
params:  {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9297
Mean FCP : 0.7174
------------
------------
Parameters combination 13 of 18
params:  {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9571
Mean FCP : 0.6925
------------
------------
Parameters combination 14 of 18
params:  {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9434
Mean FCP : 0.7053
------------
------------
Parameters combination 15 of 18
params:  {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9387
Mean FCP : 0.7110
------------
------------
Parameters combination 16 of 18
params:  {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9430
Mean FCP : 0.7055
------------
------------
Parameters combination 17 of 18
params:  {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9353
Mean FCP : 0.7135
------------
------------
Parameters combination 18 of 18
params:  {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9304
Mean FCP : 0.7176
------------
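
The grid search output above reports FCP alongside RMSE. FCP (fraction of concordant pairs) measures ranking quality: for each user, it looks at pairs of items and asks whether the predicted ratings put them in the same order as the true ratings. Higher is better, unlike RMSE. The sketch below is a simplified version written for illustration; the input format is an assumption, and Surprise's own implementation differs in how it handles ties and aggregates across users.

```python
from collections import defaultdict
from itertools import combinations


def fcp(predictions):
    """Fraction of concordant pairs.

    predictions is an iterable of (user, true_rating, estimated_rating).
    """
    by_user = defaultdict(list)
    for user, true_r, est in predictions:
        by_user[user].append((true_r, est))

    concordant = discordant = 0
    for rated in by_user.values():
        for (t1, e1), (t2, e2) in combinations(rated, 2):
            if t1 == t2:
                continue  # equal true ratings define no order to recover
            if (t1 - t2) * (e1 - e2) > 0:
                concordant += 1  # estimates agree with the true order
            else:
                discordant += 1  # estimates reversed or tied
    total = concordant + discordant
    return concordant / total if total else float("nan")


# One user, three items: two of the three pairs are ordered correctly.
example = [("u", 5, 4.5), ("u", 3, 3.0), ("u", 1, 3.5)]
print(fcp(example))
```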

In [17]:
results_df


Out[17]:
    FCP       RMSE      n_epochs  params                                             reg_all  scores
0   0.690270  0.957140  20        {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0....  0.01     {'FCP': 0.690270073666, 'RMSE': 0.957140337404}
1   0.702041  0.943891  20        {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0....  0.01     {'FCP': 0.702040728196, 'RMSE': 0.943890978109}
2   0.707743  0.938782  20        {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0.01}  0.01     {'FCP': 0.707742910659, 'RMSE': 0.938782007822}
3   0.701908  0.943362  50        {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0....  0.01     {'FCP': 0.701908218014, 'RMSE': 0.943362448617}
4   0.711061  0.934342  50        {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0....  0.01     {'FCP': 0.71106091186, 'RMSE': 0.934341616096}
5   0.716940  0.929780  50        {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0.01}  0.01     {'FCP': 0.716939583192, 'RMSE': 0.92978029751}
6   0.690995  0.957163  20        {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0....  0.02     {'FCP': 0.690995406996, 'RMSE': 0.957162793551}
7   0.703524  0.943666  20        {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0....  0.02     {'FCP': 0.703523847093, 'RMSE': 0.943666067785}
8   0.709707  0.938564  20        {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0.01}  0.02     {'FCP': 0.709707062665, 'RMSE': 0.938563678624}
9   0.703552  0.943198  50        {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0....  0.02     {'FCP': 0.703551596503, 'RMSE': 0.943198060146}
10  0.712722  0.934433  50        {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0....  0.02     {'FCP': 0.712721584365, 'RMSE': 0.934433433864}
11  0.717399  0.929682  50        {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0.01}  0.02     {'FCP': 0.717399348655, 'RMSE': 0.929681857717}
12  0.692501  0.957118  20        {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0....  0.04     {'FCP': 0.69250094831, 'RMSE': 0.957118259604}
13  0.705315  0.943393  20        {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0....  0.04     {'FCP': 0.70531536017, 'RMSE': 0.943393369507}
14  0.710967  0.938701  20        {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0.01}  0.04     {'FCP': 0.710967427128, 'RMSE': 0.938701221102}
15  0.705454  0.943009  50        {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0....  0.04     {'FCP': 0.705453791553, 'RMSE': 0.943009295146}
16  0.713532  0.935277  50        {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0....  0.04     {'FCP': 0.713531711101, 'RMSE': 0.93527730346}
17  0.717564  0.930404  50        {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0.01}  0.04     {'FCP': 0.71756360075, 'RMSE': 0.930403508993}

In [18]:
results_df[['FCP', 'RMSE', 'n_epochs', 'reg_all', 'params']].sort_values('RMSE')


Out[18]:
    FCP       RMSE      n_epochs  reg_all  params
11  0.717399  0.929682  50        0.02     {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0.01}
5   0.716940  0.929780  50        0.01     {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0.01}
17  0.717564  0.930404  50        0.04     {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0.01}
4   0.711061  0.934342  50        0.01     {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0....
10  0.712722  0.934433  50        0.02     {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0....
16  0.713532  0.935277  50        0.04     {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0....
8   0.709707  0.938564  20        0.02     {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0.01}
14  0.710967  0.938701  20        0.04     {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0.01}
2   0.707743  0.938782  20        0.01     {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0.01}
15  0.705454  0.943009  50        0.04     {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0....
9   0.703552  0.943198  50        0.02     {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0....
3   0.701908  0.943362  50        0.01     {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0....
13  0.705315  0.943393  20        0.04     {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0....
7   0.703524  0.943666  20        0.02     {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0....
1   0.702041  0.943891  20        0.01     {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0....
12  0.692501  0.957118  20        0.04     {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0....
0   0.690270  0.957140  20        0.01     {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0....
6   0.690995  0.957163  20        0.02     {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0....
