Week 8 - Advanced Machine Learning

Recommendation systems

Recommendation systems attempt to predict a user's preference for an item. The goal is typically to present the user with the items they are most likely to prefer.

Uses

Recommendation systems have been increasingly used in recent years in online retailing. Perhaps the best-known example is movie recommendation, thanks to the Netflix Prize. Films, books, music, food and other purchasable items are all common subjects of recommendation systems. The same concepts have also been applied to research articles, collaborators, romantic partners, news items, travel routes, and many other areas.

Types

There are two main types of recommendation systems:

Collaborative filtering utilizes the preferences of many users, basing recommendations on what other users with similar preferences to you have liked previously. Collaborative filtering systems do not need to know anything about the users or items, eliminating the need to develop features that accurately capture differences between users or items.

Content-based filtering utilizes item descriptions and user profiles or past/current preferences to identify similar items to recommend.
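To make the distinction concrete, here is a minimal sketch of both ideas on toy data. The ratings, genre vectors, and function names are all invented for illustration, and cosine similarity is only one of several possible similarity measures; real systems work with far larger and sparser data.

```python
from math import sqrt

# Hypothetical toy data: explicit ratings (1-5) keyed by user, then by item.
ratings = {
    "ann": {"alien": 5, "heat": 4, "up": 1},
    "bob": {"alien": 4, "heat": 5, "brave": 2},
    "cat": {"up": 5, "brave": 4, "alien": 1},
}

# Hypothetical item descriptions as genre vectors: (sci-fi, crime, animation).
item_features = {
    "alien": (1, 0, 0),
    "heat": (0, 1, 0),
    "up": (0, 0, 1),
    "brave": (0, 0, 1),
}


def cosine(a, b):
    """Cosine similarity between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def user_similarity(u, v):
    """Compare two users using only the items that both have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if not common:
        return 0.0
    return cosine([ratings[u][i] for i in common],
                  [ratings[v][i] for i in common])


def collaborative_score(user, item):
    """Collaborative filtering: similarity-weighted average of other users'
    ratings for the item. No item or user features are needed."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = user_similarity(user, other)
        num += sim * their_ratings[item]
        den += sim
    return num / den if den else None


def content_score(user, item):
    """Content-based filtering: average of the user's own ratings, weighted
    by how similar each rated item's description is to the target item."""
    num = den = 0.0
    for rated_item, rating in ratings[user].items():
        sim = cosine(item_features[item], item_features[rated_item])
        num += sim * rating
        den += sim
    return num / den if den else None


print(collaborative_score("ann", "brave"))
print(content_score("ann", "brave"))
```

Note that `collaborative_score` never touches `item_features`, while `content_score` never looks at other users' ratings.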

Both approaches have limitations, and an increasingly popular alternative is the hybrid recommendation system, which attempts to combine collaborative and content-based filtering. Content-based filtering can aid collaborative filtering by providing initial recommendations when there is insufficient data on a user's preferences, giving a coarse starting point while the set of labels on user preferences is still sparse.
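One simple way to combine the two signals, sketched below, is a weighted blend that leans on the content-based score while a user has few ratings and shifts toward the collaborative score as ratings accumulate. The function name, the `shrinkage` constant, and the weighting formula are assumptions chosen for illustration; they are not taken from any particular library.

```python
def hybrid_score(cf_score, content_score, n_user_ratings, shrinkage=5.0):
    """Blend a collaborative score and a content-based score.

    The collaborative weight is n / (n + shrinkage), so it is 0 for a brand
    new user and approaches 1 as the user rates more items.
    """
    if cf_score is None:       # no collaborative signal at all (cold start)
        return content_score
    if content_score is None:  # no usable item description
        return cf_score
    weight = n_user_ratings / (n_user_ratings + shrinkage)
    return weight * cf_score + (1.0 - weight) * content_score


# A new user relies on content alone; an active user mostly on collaboration.
print(hybrid_score(cf_score=None, content_score=3.5, n_user_ratings=0))
print(hybrid_score(cf_score=4.0, content_score=2.0, n_user_ratings=5))
print(hybrid_score(cf_score=4.0, content_score=2.0, n_user_ratings=95))
```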

Another way to supplement user preferences is to use several different types of user feedback. In addition to explicit ratings for items, implicit feedback such as viewing duration can be used.
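As a rough illustration, implicit feedback can be mapped onto the same scale as explicit ratings so that both feed one model. The linear mapping from fraction watched to a 1-5 rating below is an assumption made for this sketch, as are the function and parameter names; in practice the mapping would need to be tuned or learned.

```python
def implied_rating(watched_minutes, runtime_minutes, explicit_rating=None,
                   min_rating=1.0, max_rating=5.0):
    """Return the explicit rating if there is one, otherwise infer a rating
    from the fraction of the item that was watched."""
    if explicit_rating is not None:
        return float(explicit_rating)
    if runtime_minutes <= 0:
        raise ValueError("runtime_minutes must be positive")
    # Clamp to [0, 1] so rewatching cannot push the rating past the maximum.
    fraction = max(0.0, min(watched_minutes / runtime_minutes, 1.0))
    return min_rating + (max_rating - min_rating) * fraction


print(implied_rating(60, 120))                      # half watched, no rating
print(implied_rating(10, 120, explicit_rating=2))   # explicit rating wins
```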

Evaluation

Performance can be evaluated with metrics we are already familiar with, such as root mean square error (RMSE). However, accuracy is not the only factor in determining user satisfaction or the utility of the system. For example, a recommendation system can return several almost identical items and achieve a very low RMSE, but a user will likely prefer a more diverse set of recommendations.
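The sketch below computes RMSE and mean absolute error (MAE) for a set of predictions, alongside one possible diversity measure: the mean pairwise cosine distance between the items in a recommendation list. The diversity measure and the toy feature vectors are assumptions for illustration; other definitions of diversity are in use.

```python
from itertools import combinations
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between two equal-length rating sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equal-length rating sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def cosine(a, b):
    """Cosine similarity between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def intra_list_diversity(feature_vectors):
    """Mean pairwise cosine distance (1 - similarity) within one list.
    0 means every item looks the same; higher means more varied."""
    pairs = list(combinations(feature_vectors, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical genre vectors for two recommendation lists.
near_duplicates = [(1, 0, 0), (1, 0, 0), (1, 0, 0)]
varied = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

print(rmse([1, 2], [2, 4]), mae([1, 2], [2, 4]))
print(intra_list_diversity(near_duplicates), intra_list_diversity(varied))
```

Two lists can have the same RMSE on their predicted ratings while differing sharply on a measure like this, which is why accuracy alone does not capture usefulness.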

The Surprise scikit package implements several strategies for creating recommendation systems.



In [11]:

    
# Note: evaluate, print_perf, GridSearch and data.split() belong to the older
# Surprise API (releases before 1.1); later releases replace them with the
# surprise.model_selection module.
from surprise import SVD
from surprise import Dataset
from surprise import evaluate, print_perf


# Load the MovieLens 100k dataset (downloading it if needed).
data = Dataset.load_builtin('ml-100k')



In [12]:

    
# Split the data into 3 folds for cross-validation.
data.split(n_folds=3)

# We'll use the famous SVD algorithm.
algo = SVD()

# Evaluate the performance of our algorithm on the dataset.
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])

print_perf(perf)









    



Evaluating RMSE, MAE of algorithm SVD.

------------
Fold 1
RMSE: 0.9415
MAE:  0.7425
------------
Fold 2
RMSE: 0.9379
MAE:  0.7396
------------
Fold 3
RMSE: 0.9515
MAE:  0.7501
------------
------------
Mean RMSE: 0.9437
Mean MAE : 0.7440
------------
------------
        Fold 1  Fold 2  Fold 3  Mean    
RMSE    0.9415  0.9379  0.9515  0.9437  
MAE     0.7425  0.7396  0.7501  0.7440
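The SVD algorithm evaluated above is a matrix-factorization model: each rating is estimated as a global mean plus a user bias, an item bias, and the dot product of a user factor vector and an item factor vector, with all parameters fit by stochastic gradient descent. The sketch below is a simplified pure-Python illustration of that idea, not Surprise's implementation. The toy ratings and the names `train_svd` and `rmse_on` are hypothetical; the parameter names `n_factors`, `n_epochs`, `lr_all` and `reg_all` mirror those of Surprise's `SVD`.

```python
import random
from math import sqrt


def train_svd(ratings, n_factors=2, n_epochs=20, lr_all=0.005, reg_all=0.02, seed=0):
    """Fit biases and latent factors by SGD on (user, item, rating) triples.

    Returns a predict(user, item) function.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu, bi, p, q = {}, {}, {}, {}
    for u, i, _ in ratings:
        if u not in p:
            bu[u] = 0.0
            p[u] = [rng.gauss(0, 0.1) for _ in range(n_factors)]
        if i not in q:
            bi[i] = 0.0
            q[i] = [rng.gauss(0, 0.1) for _ in range(n_factors)]

    def predict(u, i):
        """Estimate a rating; unknown users or items use only the known terms."""
        est = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
        if u in p and i in q:
            est += sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return est

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Gradient step on the regularized squared error.
            bu[u] += lr_all * (err - reg_all * bu[u])
            bi[i] += lr_all * (err - reg_all * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr_all * (err * qif - reg_all * puf)
                q[i][f] += lr_all * (err * puf - reg_all * qif)
    return predict


def rmse_on(ratings, predict):
    """RMSE of a predict(user, item) function over rating triples."""
    return sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))


# Hypothetical toy ratings: u1/u2 prefer items a and b, u3/u4 prefer c and d.
toy = [("u1", "a", 5), ("u1", "b", 4), ("u1", "c", 1),
       ("u2", "a", 4), ("u2", "b", 5), ("u2", "d", 2),
       ("u3", "c", 5), ("u3", "d", 4), ("u3", "a", 1),
       ("u4", "b", 2), ("u4", "c", 4), ("u4", "d", 5)]

predict = train_svd(toy, n_epochs=500, lr_all=0.02)
print("global-mean RMSE:", rmse_on(toy, lambda u, i: 3.5))
print("training RMSE:   ", rmse_on(toy, predict))
```

The learning rate controls the size of each update and the regularization term pulls parameters back toward zero, which is why both, together with the number of epochs, are natural candidates for tuning.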



In [16]:

    
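import pandas as pd
from surprise import GridSearch

# Grid of SVD hyperparameters: number of SGD epochs, learning rate and
# regularization term (2 x 3 x 3 = 18 combinations).
param_grid = {'n_epochs': [20, 50], 'lr_all': [0.002, 0.005, 0.01],
              'reg_all': [0.01, 0.02, 0.04]}

# Score every combination on RMSE and FCP (fraction of concordant pairs).
grid_search = GridSearch(SVD, param_grid, measures=['RMSE', 'FCP'])

grid_search.evaluate(data)

# Collect the per-combination results in a DataFrame for inspection.
results_df = pd.DataFrame.from_dict(grid_search.cv_results)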









    



------------
Parameters combination 1 of 18
params:  {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9571
Mean FCP : 0.6903
------------
------------
Parameters combination 2 of 18
params:  {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9439
Mean FCP : 0.7020
------------
------------
Parameters combination 3 of 18
params:  {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9388
Mean FCP : 0.7077
------------
------------
Parameters combination 4 of 18
params:  {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9434
Mean FCP : 0.7019
------------
------------
Parameters combination 5 of 18
params:  {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9343
Mean FCP : 0.7111
------------
------------
Parameters combination 6 of 18
params:  {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9298
Mean FCP : 0.7169
------------
------------
Parameters combination 7 of 18
params:  {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9572
Mean FCP : 0.6910
------------
------------
Parameters combination 8 of 18
params:  {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9437
Mean FCP : 0.7035
------------
------------
Parameters combination 9 of 18
params:  {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9386
Mean FCP : 0.7097
------------
------------
Parameters combination 10 of 18
params:  {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9432
Mean FCP : 0.7036
------------
------------
Parameters combination 11 of 18
params:  {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9344
Mean FCP : 0.7127
------------
------------
Parameters combination 12 of 18
params:  {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9297
Mean FCP : 0.7174
------------
------------
Parameters combination 13 of 18
params:  {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9571
Mean FCP : 0.6925
------------
------------
Parameters combination 14 of 18
params:  {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9434
Mean FCP : 0.7053
------------
------------
Parameters combination 15 of 18
params:  {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9387
Mean FCP : 0.7110
------------
------------
Parameters combination 16 of 18
params:  {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0.002}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9430
Mean FCP : 0.7055
------------
------------
Parameters combination 17 of 18
params:  {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0.005}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9353
Mean FCP : 0.7135
------------
------------
Parameters combination 18 of 18
params:  {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0.01}
Evaluating RMSE, FCP of algorithm SVD.

------------
Mean RMSE: 0.9304
Mean FCP : 0.7176
------------
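FCP, reported alongside RMSE in the output above, stands for fraction of concordant pairs. It measures ranking quality rather than rating error, so higher is better. The sketch below follows one common definition: for each user, take every pair of items with different true ratings and count the pair as concordant when the estimates put the two items in the same order. The function name and input format are assumptions for illustration, and Surprise's own implementation may differ in details.

```python
from collections import defaultdict
from itertools import combinations


def fcp(predictions):
    """Fraction of concordant pairs.

    predictions: iterable of (user, true_rating, estimated_rating) tuples.
    """
    by_user = defaultdict(list)
    for user, true_r, est in predictions:
        by_user[user].append((true_r, est))

    concordant = discordant = 0
    for rated in by_user.values():
        for (r1, e1), (r2, e2) in combinations(rated, 2):
            if r1 == r2:
                continue  # tied true ratings carry no ordering information
            if (r1 - r2) * (e1 - e2) > 0:
                concordant += 1
            else:
                discordant += 1  # wrong order, or tied estimates
    total = concordant + discordant
    return concordant / total if total else float("nan")


# One user, three items: the estimates order two of the three pairs correctly.
preds = [("u", 5, 4.5), ("u", 3, 3.0), ("u", 1, 3.5)]
print(fcp(preds))
```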



In [17]:

    
results_df









    Out[17]:

    FCP       RMSE      n_epochs  params                                             reg_all  scores
0   0.690270  0.957140  20        {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0....  0.01     {'FCP': 0.690270073666, 'RMSE': 0.957140337404}
1   0.702041  0.943891  20        {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0....  0.01     {'FCP': 0.702040728196, 'RMSE': 0.943890978109}
2   0.707743  0.938782  20        {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0.01}  0.01     {'FCP': 0.707742910659, 'RMSE': 0.938782007822}
3   0.701908  0.943362  50        {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0....  0.01     {'FCP': 0.701908218014, 'RMSE': 0.943362448617}
4   0.711061  0.934342  50        {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0....  0.01     {'FCP': 0.71106091186, 'RMSE': 0.934341616096}
5   0.716940  0.929780  50        {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0.01}  0.01     {'FCP': 0.716939583192, 'RMSE': 0.92978029751}
6   0.690995  0.957163  20        {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0....  0.02     {'FCP': 0.690995406996, 'RMSE': 0.957162793551}
7   0.703524  0.943666  20        {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0....  0.02     {'FCP': 0.703523847093, 'RMSE': 0.943666067785}
8   0.709707  0.938564  20        {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0.01}  0.02     {'FCP': 0.709707062665, 'RMSE': 0.938563678624}
9   0.703552  0.943198  50        {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0....  0.02     {'FCP': 0.703551596503, 'RMSE': 0.943198060146}
10  0.712722  0.934433  50        {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0....  0.02     {'FCP': 0.712721584365, 'RMSE': 0.934433433864}
11  0.717399  0.929682  50        {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0.01}  0.02     {'FCP': 0.717399348655, 'RMSE': 0.929681857717}
12  0.692501  0.957118  20        {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0....  0.04     {'FCP': 0.69250094831, 'RMSE': 0.957118259604}
13  0.705315  0.943393  20        {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0....  0.04     {'FCP': 0.70531536017, 'RMSE': 0.943393369507}
14  0.710967  0.938701  20        {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0.01}  0.04     {'FCP': 0.710967427128, 'RMSE': 0.938701221102}
15  0.705454  0.943009  50        {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0....  0.04     {'FCP': 0.705453791553, 'RMSE': 0.943009295146}
16  0.713532  0.935277  50        {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0....  0.04     {'FCP': 0.713531711101, 'RMSE': 0.93527730346}
17  0.717564  0.930404  50        {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0.01}  0.04     {'FCP': 0.71756360075, 'RMSE': 0.930403508993}



In [18]:

    
results_df[['FCP', 'RMSE', 'n_epochs', 'reg_all', 'params']].sort_values('RMSE')









    Out[18]:

    FCP       RMSE      n_epochs  reg_all  params
11  0.717399  0.929682  50        0.02     {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0.01}
5   0.716940  0.929780  50        0.01     {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0.01}
17  0.717564  0.930404  50        0.04     {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0.01}
4   0.711061  0.934342  50        0.01     {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0....
10  0.712722  0.934433  50        0.02     {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0....
16  0.713532  0.935277  50        0.04     {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0....
8   0.709707  0.938564  20        0.02     {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0.01}
14  0.710967  0.938701  20        0.04     {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0.01}
2   0.707743  0.938782  20        0.01     {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0.01}
15  0.705454  0.943009  50        0.04     {'reg_all': 0.04, 'n_epochs': 50, 'lr_all': 0....
9   0.703552  0.943198  50        0.02     {'reg_all': 0.02, 'n_epochs': 50, 'lr_all': 0....
3   0.701908  0.943362  50        0.01     {'reg_all': 0.01, 'n_epochs': 50, 'lr_all': 0....
13  0.705315  0.943393  20        0.04     {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0....
7   0.703524  0.943666  20        0.02     {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0....
1   0.702041  0.943891  20        0.01     {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0....
12  0.692501  0.957118  20        0.04     {'reg_all': 0.04, 'n_epochs': 20, 'lr_all': 0....
0   0.690270  0.957140  20        0.01     {'reg_all': 0.01, 'n_epochs': 20, 'lr_all': 0....
6   0.690995  0.957163  20        0.02     {'reg_all': 0.02, 'n_epochs': 20, 'lr_all': 0....


