In [1]:
import numpy as np
import data
The following functions get the dataset, save it to a local file, and parse it into sparse matrices we can pass into LightFM.
In particular, _build_interaction_matrix constructs the interaction matrix: a (no_users, no_items) matrix with a 1 in place of positive interactions and a -1 in place of negative interactions. For this experiment, any rating lower than 4 is treated as a negative rating.
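As a rough illustration, a minimal sketch of the idea might look like the following (the function name and signature here are hypothetical, not the actual data._build_interaction_matrix implementation, which we print below):
import numpy as np
import scipy.sparse as sp

def build_interaction_matrix_sketch(uids, iids, ratings, no_users, no_items, threshold=4.0):
    # +1 for a positive interaction (rating >= threshold), -1 for a negative one.
    mat = sp.lil_matrix((no_users, no_items), dtype=np.int32)
    for uid, iid, rating in zip(uids, iids, ratings):
        mat[uid, iid] = 1 if rating >= threshold else -1
    return mat.tocoo()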
In [2]:
import inspect
In [3]:
print(inspect.getsource(data._build_interaction_matrix))
Let's run it! The dataset will be automatically downloaded and processed.
In [4]:
train, test = data.get_movielens_data()
Let's check the matrices.
In [5]:
train
Out[5]:
In [6]:
test
Out[6]:
Looks good and ready to go.
In [7]:
from lightfm import LightFM
In [8]:
model = LightFM(no_components=30)
In this case, we set the latent dimensionality of the model to 30. Fitting is straightforward.
In [9]:
model.fit(train, epochs=50)
Out[9]:
Let's try to get a handle on the model accuracy using the ROC AUC score.
In [10]:
from sklearn.metrics import roc_auc_score
train_predictions = model.predict(train.row, train.col)
In [11]:
train_predictions
Out[11]:
In [12]:
roc_auc_score(train.data, train_predictions)
Out[12]:
We've got very high accuracy on the train dataset; let's check the test set.
In [13]:
test_predictions = model.predict(test.row, test.col)
In [14]:
roc_auc_score(test.data, test_predictions)
Out[14]:
The accuracy is much lower on the test data, suggesting a high degree of overfitting. We can combat this by regularizing the model.
In [15]:
model = LightFM(no_components=30, user_alpha=0.0001, item_alpha=0.0001)
model.fit(train, epochs=50)
roc_auc_score(test.data, model.predict(test.row, test.col))
Out[15]:
A modicum of regularization gives much better results.
The promise of lightfm is the possibility of using metadata in cold-start scenarios. The Movielens dataset has genre data for the movies it contains. Let's use that to train the LightFM model.
The get_movielens_item_metadata function constructs a (no_items, no_features) matrix containing features for the movies; if we use genres this will be a (no_items, no_genres) feature matrix.
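To make the shape of that matrix concrete, here is a toy sketch of a genre feature matrix (assumed for illustration only; it is not the get_movielens_item_metadata implementation): one row per movie, one column per genre, with a 1 wherever a movie belongs to a genre.
import numpy as np
import scipy.sparse as sp

# Assumed toy inputs: three genres and two movies.
genres = ['Action', 'Comedy', 'Drama']
movie_genres = {0: ['Action'], 1: ['Comedy', 'Drama']}

toy_features = sp.lil_matrix((len(movie_genres), len(genres)), dtype=np.float32)
for movie_id, genre_list in movie_genres.items():
    for genre in genre_list:
        # Mark the column corresponding to each of the movie's genres.
        toy_features[movie_id, genres.index(genre)] = 1.0
toy_features = toy_features.tocsr()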
In [16]:
item_features = data.get_movielens_item_metadata(use_item_ids=False)
item_features
Out[16]:
We need to pass these to the fit method in order to use them.
In [17]:
model = LightFM(no_components=30, user_alpha=0.0001, item_alpha=0.0001)
model.fit(train, item_features=item_features, epochs=50)
roc_auc_score(test.data, model.predict(test.row, test.col, item_features=item_features))
Out[17]:
This is not as accurate as a pure collaborative filtering solution, but it should enable us to make recommendations for new movies.
If we add item-specific features back, we should recover the original accuracy, as sketched below.
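Conceptually, adding item-specific features amounts to stacking a (no_items, no_items) identity block next to the genre features, so every movie gets its own indicator column. Whether get_movielens_item_metadata(use_item_ids=True) builds the matrix exactly this way is an assumption; the sketch below only illustrates the idea.
import scipy.sparse as sp

# item_features is the (no_items, no_genres) genre matrix from above;
# appending an identity block gives every movie its own indicator feature.
no_items = item_features.shape[0]
features_with_ids = sp.hstack([item_features, sp.identity(no_items)]).tocsr()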
In [18]:
item_features = data.get_movielens_item_metadata(use_item_ids=True)
item_features
model = LightFM(no_components=30, user_alpha=0.0001, item_alpha=0.0001)
model.fit(train, item_features=item_features, epochs=50)
roc_auc_score(test.data, model.predict(test.row, test.col, item_features=item_features))
Out[18]:
So far, we have been treating the signals from the data as binary explicit feedback: either a user likes a movie (score >= 4) or does not. However, in many applications feedback is purely implicit: the items a user interacted with are positive signals, but we have no negative signals.
lightfm implements two models suitable for dealing with this sort of data: BPR [1] and WARP [2].
[1] Rendle, Steffen, et al. "BPR: Bayesian personalized ranking from implicit feedback." Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009.
[2] Weston, Jason, Samy Bengio, and Nicolas Usunier. "Wsabie: Scaling up to large vocabulary image annotation." IJCAI. Vol. 11. 2011.
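Both are selected through the loss argument when constructing the model, as the cells below demonstrate:
bpr_model = LightFM(loss='bpr')    # Bayesian Personalised Ranking [1]
warp_model = LightFM(loss='warp')  # Weighted Approximate-Rank Pairwise [2]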
Before using them, let's first load the data and define some evaluation functions.
In [19]:
train, test = data.get_movielens_data()
train.data = np.ones_like(train.data)
test.data = np.ones_like(test.data)
In [20]:
from sklearn.metrics import roc_auc_score
def precision_at_k(model, ground_truth, k):
    """
    Measure precision at k for model and ground truth.

    Arguments:
    - lightFM instance model
    - sparse matrix ground_truth (no_users, no_items)
    - int k

    Returns:
    - float precision@k
    """

    ground_truth = ground_truth.tocsr()

    no_users, no_items = ground_truth.shape

    pid_array = np.arange(no_items, dtype=np.int32)

    precisions = []

    for user_id, row in enumerate(ground_truth):
        uid_array = np.empty(no_items, dtype=np.int32)
        uid_array.fill(user_id)
        # Score every item for this user.
        predictions = model.predict(uid_array, pid_array, num_threads=4)

        # Keep the k highest-scoring items and compare with the positives.
        top_k = set(np.argsort(-predictions)[:k])
        true_pids = set(row.indices[row.data == 1])

        if true_pids:
            precisions.append(len(top_k & true_pids) / float(k))

    return sum(precisions) / len(precisions)


def full_auc(model, ground_truth):
    """
    Measure AUC for model and ground truth on all items.

    Arguments:
    - lightFM instance model
    - sparse matrix ground_truth (no_users, no_items)

    Returns:
    - float AUC
    """

    ground_truth = ground_truth.tocsr()

    no_users, no_items = ground_truth.shape

    pid_array = np.arange(no_items, dtype=np.int32)

    scores = []

    for user_id, row in enumerate(ground_truth):
        uid_array = np.empty(no_items, dtype=np.int32)
        uid_array.fill(user_id)
        # Score every item for this user.
        predictions = model.predict(uid_array, pid_array, num_threads=4)

        # Mark the user's positive items and compute the per-user AUC.
        true_pids = row.indices[row.data == 1]
        grnd = np.zeros(no_items, dtype=np.int32)
        grnd[true_pids] = 1

        if len(true_pids):
            scores.append(roc_auc_score(grnd, predictions))

    return sum(scores) / len(scores)
Now let's train a BPR model and look at its accuracy.
In [21]:
model = LightFM(learning_rate=0.05, loss='bpr')

model.fit_partial(train, epochs=10)

train_precision = precision_at_k(model, train, 10)
test_precision = precision_at_k(model, test, 10)

train_auc = full_auc(model, train)
test_auc = full_auc(model, test)

print('Precision: %s, %s' % (train_precision, test_precision))
print('AUC: %s, %s' % (train_auc, test_auc))
The WARP model, on the other hand, optimises for precision@k, so we should expect its performance to be better on precision.
In [22]:
model = LightFM(learning_rate=0.05, loss='warp')

model.fit_partial(train, epochs=10)

train_precision = precision_at_k(model, train, 10)
test_precision = precision_at_k(model, test, 10)

train_auc = full_auc(model, train)
test_auc = full_auc(model, test)

print('Precision: %s, %s' % (train_precision, test_precision))
print('AUC: %s, %s' % (train_auc, test_auc))
And that is exactly what we see: we get much higher precision@10, while the AUC metric also improves.