In [1]:

    
%pylab inline
import pandas as pd

import numpy as np
from __future__ import division
import itertools

import matplotlib.pyplot as plt
import seaborn as sns

import logging
logger = logging.getLogger()









    



Populating the interactive namespace from numpy and matplotlib

9 Recommendation Systems

two broad groups:

Content-based systems
focus on the properities of items.
Collaborative filtering systems
focus on the relationship between users and items.

9.1 A Model for Recommendation Systems

The Utility Matrix

record the preference given by users for certain items.



In [66]:

    
# Example 9.1
M = pd.DataFrame(index=['A', 'B', 'C', 'D'], columns=['HP1', 'HP2', 'HP3', 'TW', 'SW1', 'SW2', 'SW3'])

M.loc['A', ['HP1', 'TW', 'SW1']] = [4, 5, 1]
M.iloc[1, 0:3] = [5, 5, 4]
M.iloc[2, 3:-1] = [2, 4, 5]
M.iloc[3, [1, -1]] = [3, 3]

M_9_1 = M
M_9_1

In practice, the matrix would be even sparser, with the typical user rating only a tiny fraction of all avalibale items.

the goal of a recommendation system is: to predict the blanks in the utility matrix.

slightly difference in many application:
- predict every blank entry $<$ discover some potential entries in each row.
- find all items with the highest expected ratings $<$ find a large subset of those.

The Long Tail

physical institutions	online institutions
provide only the most popular items	provide the entire range of items

the long tail force online institutions to recommend items to individual users:

It's no possible to present all avaliable items to the user.
Neither can we expect users to have heared of each of the items they might like.

Applications of Recommendation Systems

Product Recommendations
Movie Recommendations
News Articles

Populating the Utility Matrix

how to discovery the value users place on items:

We can ask users to rate items.
cons: users are unwilling to do, and so samples are biased by very little fraction of peoples.
We can make inferences from users' behavior.
eg: items purchased/viewed/rated.

9.2 Content-Based Recommendations

9.2.1 Item Profiles

a record representing important characteristics of items.

Discovering Features

for Documents
idea: find the identification of words that characterize the topic of a document.
namely, we expect a sets of words to express the subjects or main ideas of the document.
1. eliminate stop words.
2. compute the TF.DIF score for each reamining word in the document.
3. take as the features of a document the $n$ words with the highest TF.DIF scores.
to measure the similarity of two documents, the distance measures we could use are:
1. Jaccard distance
2. cosine distance
  cosine distance of vectors is not affected by components in which both vectors have 0.
for Images
invite users to tag the items.
cons: users are unwilling to do $\implies$ there are not enough tags (bias).

generalize feature vector

feature is discrete. $\to$ boolean value.
feature is numerical. $\to$ normalization.

9.2.5 User Profiles

create vectors with the same components of item profiles to describe the user's preferences.

It could be derived from utility matrix and item profiles.

normalizate untility matrix. ($[-1,1]$ for cosine distance).
value in user profiles = utility value * corresponding item vectors.



In [3]:

    
# example 9.4

users_name = ['U', 'V']
items_name = ['F{}'.format(x) for x in range(4)]
features_name = ['Julia Roberts', 'others']

# utility matrix
M_uti = pd.DataFrame([
                        [3, 4, 5, 0],
                        [6, 2, 3, 5]  
                     ],
                     index=users_name,
                     columns=items_name
                    )

M_uti



In [4]:

    
# item profile
M_item = pd.DataFrame(index=items_name, columns=features_name)

M_item.loc[:, features_name[0]] = 1
M_item = M_item.fillna(value=0)

M_item









    Out[4]:






  
    
      
      Julia Roberts
      others
    
  
  
    
      F0
      1
      0
    
    
      F1
      1
      0
    
    
      F2
      1
      0
    
    
      F3
      1
      0



In [5]:

    
M_uti.apply(lambda x: x - np.mean(x), axis=1)



In [6]:

    
M_user = M_uti.fillna(value=0).dot(M_item) / 4 #average = sum/len
M_user









    Out[6]:






  
    
      
      Julia Roberts
      others
    
  
  
    
      U
      3
      0
    
    
      V
      4
      0

9.2.6 Recommending Items to Users Based on Content

to estimate:
$$M_{utility}[user, item] = cosineDistant(M_{user}, M_{item})$$

the more similar, the higher probility to recommend.
classification algorithms:
Recommend or Not (machine learning):
one decision per user $\to$ take too long time to construct.
be used only for relatively small problem size.



In [7]:

    
# exercise 9.2.1

raw_data = [
        [3.06, 2.68, 2.92],
        [500, 320, 640],
        [6, 4, 6]
    ]

M_item = pd.DataFrame(raw_data, index=['Processor Speed', 'Disk Size', 'Main-Memory Size'], columns=['A', 'B', 'C'])

# items: A, B, C; features: Processor Speed, Disk Size, ...
M_item









    Out[7]:






  
    
      
      A
      B
      C
    
  
  
    
      Processor Speed
      3.06
      2.68
      2.92
    
    
      Disk Size
      500.00
      320.00
      640.00
    
    
      Main-Memory Size
      6.00
      4.00
      6.00



In [12]:

    
# exercise 9.2.1 

# (d)
M_item.apply(lambda x: x / np.mean(x), axis=1)









    Out[12]:






  
    
      
      A
      B
      C
    
  
  
    
      Processor Speed
      1.060046
      0.928406
      1.011547
    
    
      Disk Size
      1.027397
      0.657534
      1.315068
    
    
      Main-Memory Size
      1.125000
      0.750000
      1.125000



In [13]:

    
# exercise 9.2.2 

# (a)
M_item.apply(lambda x: x - np.mean(x), axis=1)









    Out[13]:






  
    
      
      A
      B
      C
    
  
  
    
      Processor Speed
      0.173333
      -0.206667
      0.033333
    
    
      Disk Size
      13.333333
      -166.666667
      153.333333
    
    
      Main-Memory Size
      0.666667
      -1.333333
      0.666667



In [30]:

    
# exercise 9.2.3

M_uti = pd.DataFrame([[4, 2, 5]], index=['user'], columns=['A', 'B', 'C'])
M_uti



In [31]:

    
# (a)
M_uti_nor = M_uti.apply(lambda x: x - np.mean(x), axis=1)
M_uti_nor



In [32]:

    
# (b)
M_user = M_item.dot(M_uti_nor.T) / 3
M_user









    Out[32]:






  
    
      
      user
    
  
  
    
      Processor Speed
      0.148889
    
    
      Disk Size
      162.222222
    
    
      Main-Memory Size
      1.111111



In [79]:

    
logger.setLevel('WARN')

def create_user_profile(utility_matrix, item_profile):
    """Create user profile by combining utility matrix with item profile in 9.2.5 ."""
    
    assert np.array_equal(utility_matrix.columns, item_profile.columns), \
                      "utility matrix should keep same columns name with item profile."
    
    logger.info('utility_matrix: \n{}\n'.format(utility_matrix))  
    M_uti_notnull = np.ones(utility_matrix.shape)
    M_uti_notnull[utility_matrix.isnull().values] = 0
    logger.info('utility_matrix_isnull: \n{}\n'.format(M_uti_notnull))
    
    
    logger.info('utility_matrix: \n{}\n'.format(item_profile)) 
    M_item_notnull = np.ones(item_profile.shape)
    M_item_notnull[item_profile.isnull().values] = 0
    logger.info('utility_matrix_isnull: \n{}\n'.format(M_item_notnull))
    
    utility_matrix = utility_matrix.fillna(value=0)
    item_profile = item_profile.fillna(value=0)
    M_user = item_profile.dot(utility_matrix.T).values / np.dot(M_item_notnull, M_uti_notnull.T)
    M_user[np.isinf(M_user)] = np.nan   # solve: divide zero
    logger.info('M_user: \n{}\n'.format(M_user))
    
    return pd.DataFrame(M_user, index=item_profile.index, columns=utility_matrix.index)


M_uti = pd.DataFrame([[4, 2, 5], [1, np.nan, 3]], index=['userA', 'userB'], columns=['A', 'B', 'C'])
M_uti_nor = M_uti.apply(lambda x: x - np.mean(x), axis=1)
print('utility matrix: \n{}\n'.format(M_uti_nor))
print('item profile: \n{}\n'.format(M_item))

create_user_profile(M_uti_nor, M_item)









    



utility matrix: 
              A         B         C
userA  0.333333 -1.666667  1.333333
userB -1.000000       NaN  1.000000

item profile: 
                       A       B       C
Processor Speed     3.06    2.68    2.92
Disk Size         500.00  320.00  640.00
Main-Memory Size    6.00    4.00    6.00







    Out[79]:






  
    
      
      userA
      userB
    
  
  
    
      Processor Speed
      0.148889
      -0.07
    
    
      Disk Size
      162.222222
      70.00
    
    
      Main-Memory Size
      1.111111
      0.00

9.3 Collaborative Filtering

identifying similar users and recommending what similar users like.

refine data

rounding the data
eg: rates of 3, 4 and 5 are "1", otherwise "0".
normalizing rates
subtracting from each rating the average rating of that user.



In [68]:

    
# Fig 9.4
M_9_1



In [70]:

    
# rounding the data
M_round = M_9_1.copy()
M_round[M_9_1 <= 2] = np.nan
M_round[M_9_1 > 2] = 1

M_round



In [75]:

    
# normalizing ratings
M_norm = M_9_1.apply(lambda x: x - np.mean(x), axis=1)
M_norm

9.3.2 The Duality of Similarity

We can use information about users to recommend items, whereas even if we find pairs of similar items, it takes an additional step in order to recommend items to users.
- find $n$ similar users $\to$ recommend item $I$ to user $U$.
  normalize the utility matrix first.
  \begin{align}
```
M[U,I] &= Ave(M[U,:]) + Ave(M[0:n,I] - Ave(M[0:n,I])) \\
        &\approx Ave(M[U,:]) + Std(M[0:n,I])
```
  \end{align}
- find $m$ similar items $\to$ recommend item $I$ to user $U$.
  $$M[U,I] = Ave(M[U,0:m])$$
- in order to recommend items to user $U$, we need to find all or most of entry in $M[U,:]$.
  tradeoff:
  1. user-item: find similar users, directly get all predict values of all potent items.
    item-item: find similar items, we need calculate all items one by one (additional step) to fill $M[U,:]$.
  2. item-item similarity often provides more reliable information due to the simplicity of items (genre).
- precompute preferred items for each user.
  utility matrix evolves slowly $\implies$ compute it infrequently and assume that it remains fixed between recomputations.
Items tend to be classifiable in simple terms (eg: genre), whereas the individuals are complex.

9.3.3 Clustering Users and Items

Hierachical approach is prefered:

leave many cluster unmerged at first.
cluster items, and average the corresponding value in utility matrix.
cluster users, and average as well.
repeat several times if we like.

Predict $M[U,I]:

$U \in C$, and $I \in D$.

predict:

\begin{equation}

M[U,I] = \begin{cases}
    M_{revised}[C,D] & \quad \text{if existed.}  \\
    \text{estimate using similar users/items} & \quad \text{otherwise}
\end{cases}

\end{equation}



In [108]:

    
# Fig 9.8

raw_data = [
    [4, 5, np.nan, 5, 1, np.nan, 3, 2],
    [np.nan, 3, 4, 3, 1, 2, 1, np.nan],
    [2, np.nan, 1, 3, np.nan, 4, 5, 3]
]

import string
M_uti = pd.DataFrame(raw_data, index=list(string.uppercase[:3]), columns=list(string.lowercase[:8]))
M_uti



In [140]:

    
logger.setLevel('WARN')
# exercise 9.3.1
from scipy.spatial.distance import jaccard, cosine
from itertools import combinations


def calc_distance_among_matrix(M, func_dis):
    for c in list(combinations(M.index, 2)):
        logger.info('c: {}'.format(c))
        u, v = M.loc[c[0]], M.loc[c[1]]
        logger.info('\n u:{},\n v:{}\n'.format(u.values,v.values))
        print('{} {}: {}'.format(c, func_dis.__name__, func_dis(u,v)))

# (a)
calc_distance_among_matrix(M_uti.notnull(), jaccard)









    



('A', 'B') jaccard: 0.5
('A', 'C') jaccard: 0.5
('B', 'C') jaccard: 0.5



In [141]:

    
# (b)
calc_distance_among_matrix(M_uti.fillna(value=0), cosine)









    



('A', 'B') cosine: 0.398959235991
('A', 'C') cosine: 0.385081306188
('B', 'C') cosine: 0.486129880223



In [142]:

    
# (c)
M_tmp = M_uti.copy()
M_tmp[M_uti < 3] = 0
M_tmp[M_uti >= 3] = 1 
calc_distance_among_matrix(M_tmp, jaccard)









    



('A', 'B') jaccard: 0.714285714286
('A', 'C') jaccard: 0.75
('B', 'C') jaccard: 0.875



In [144]:

    
# (d)
calc_distance_among_matrix(M_tmp.fillna(value=0), cosine)









    



('A', 'B') cosine: 0.42264973081
('A', 'C') cosine: 0.5
('B', 'C') cosine: 0.711324865405



In [146]:

    
# (e)
M_uti_nor = M_uti.apply(lambda x: x - np.mean(x), axis=1)
M_uti_nor



In [148]:

    
# (f)
calc_distance_among_matrix(M_uti_nor.fillna(value=0), cosine)









    



('A', 'B') cosine: 0.415693452532
('A', 'C') cosine: 1.11547005384
('B', 'C') cosine: 1.73957399695



In [149]:

    
# exercise 9.3.2
#todo

9.4 Dimensionality Reduction

UV-decomposition: $$M = U \times V$$

measure: RMSE (root-mean-square error)

Building a Complete UV-Decomposition Algorithm

Preprocessing of the matrix $M$.
normalization:
1. subtract: average rating of user $i$, then average rating of item $j$.
2. subtract: first item, then user.
3. subtract: half of average of item and half of average of user.
Initializing $U$ and $V$.
choice: gives the elements of $UV$ the average of the nonblank elements of $M$.
$\implies$ the element of $U$ and $V$ should be $\sqrt{a/d}$,
where $a$ is the average nonblank element of $M$, $d$ is the lengths of the short sides of $U$ and $V$.

local minima contains global minima:
1. vary the initial values of $U$ and $V$:
  perturb the value $\sqrt{a/d}$ randomly.
2. vary the way we seek the optimum.
Performing the Optimization.
different optimization path:
choose a permutation of the elements and follow that order for every round.

Gradient Descent $\to$ stochastic gradient descent.
Converging to a Minimum.
track the amount of improvement in the RMSE obtained.

stop condition:
1. stop when that improvement in one round falls below a threshold.
2. stop when the maximum improvement during a round is below a threshold.

Avoiding Overfitting

solutions:

optimized by only moving the value of a component a fraction of the way from its current value toward its optimized value.
Stop before the process has converged.
Take several different $UV$ decompositions, and average their predicts.



In [ ]:

    
# exercise 9.4.6

#todo

9.5 The NetFlix Challenge

some facts:

CineMatch was not a very good algorithm.
UV-decomposition alorithm given a 7\% improvement over CineMatch when couped with normalization and a few other tricks.
Combing different algorithms is a preferred strategy.
Genre and other information in IMDB was no useful.
Time of rating turned out to be useful: upward or downward slope with time.

todo: read the papers introduced in the chapter.

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	A	B	C
Processor Speed	3.06	2.68	2.92
Disk Size	500.00	320.00	640.00
Main-Memory Size	6.00	4.00	6.00

	A	B	C
Processor Speed	1.060046	0.928406	1.011547
Disk Size	1.027397	0.657534	1.315068
Main-Memory Size	1.125000	0.750000	1.125000

	A	B	C
Processor Speed	0.173333	-0.206667	0.033333
Disk Size	13.333333	-166.666667	153.333333
Main-Memory Size	0.666667	-1.333333	0.666667

	user
Processor Speed	0.148889
Disk Size	162.222222
Main-Memory Size	1.111111

	userA	userB
Processor Speed	0.148889	-0.07
Disk Size	162.222222	70.00
Main-Memory Size	1.111111	0.00

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	1	NaN	NaN	1	NaN	NaN	NaN
B	1	1	1	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	NaN	1	1	NaN
D	NaN	1	NaN	NaN	NaN	NaN	1

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	0.666667	NaN	NaN	1.666667	-2.333333	NaN	NaN
B	0.333333	0.333333	-0.666667	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	-1.666667	0.333333	1.333333	NaN
D	NaN	0.000000	NaN	NaN	NaN	NaN	0

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	1	NaN	NaN	1	NaN	NaN	NaN
B	1	1	1	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	NaN	1	1	NaN
D	NaN	1	NaN	NaN	NaN	NaN	1

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	4	NaN	NaN	5	1	NaN	NaN
B	5	5	4	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	2	4	5	NaN
D	NaN	3	NaN	NaN	NaN	NaN	3

	HP1	HP2	HP3	TW	SW1	SW2	SW3
A	1	NaN	NaN	1	NaN	NaN	NaN
B	1	1	1	NaN	NaN	NaN	NaN
C	NaN	NaN	NaN	NaN	1	1	NaN
D	NaN	1	NaN	NaN	NaN	NaN	1