In this notebook we continue with the first example. After running a training session again in SMURFF, we will look more closely at how to use SMURFF to make predictions.
To make predictions we recall that the value of a tensor model is given by a tensor contraction of all latent matrices. Specifically, the prediction for the element $\hat{Y}_{ijk}$ of a rank-3 tensor is given by
$$ \hat{Y}_{ijk} = \sum_{d=1}^D u^{(1)}_{d,i} u^{(2)}_{d,j} u^{(3)}_{d,k} + \mathrm{mean} $$
Since a matrix is a rank-2 tensor, the prediction for a matrix is given by:
$$ \hat{Y}_{ij} = \sum_{d=1}^D u^{(1)}_{d,i} u^{(2)}_{d,j} + \mathrm{mean} $$
These inner products are computed by SMURFF automagically, as we will see below.
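To make the two formulas concrete, here is a minimal sketch that evaluates them with numpy.einsum on small random arrays. The names U1, U2, U3, D and mean, as well as the array sizes, are illustrative assumptions and not SMURFF output; each latent matrix is assumed to be stored as (num_latent x dimension), matching the $u_{d,i}$ indexing above.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 4                          # number of latent dimensions (illustrative)
U1 = rng.normal(size=(D, 5))   # latents for mode 1: D x 5
U2 = rng.normal(size=(D, 3))   # latents for mode 2: D x 3
U3 = rng.normal(size=(D, 2))   # latents for mode 3: D x 2
mean = 0.5                     # global mean offset (illustrative value)

# Rank-2 case: Yhat[i, j] = sum_d U1[d, i] * U2[d, j] + mean
Yhat_matrix = np.einsum("di,dj->ij", U1, U2) + mean

# Rank-3 case: Yhat[i, j, k] = sum_d U1[d, i] * U2[d, j] * U3[d, k] + mean
Yhat_tensor = np.einsum("di,dj,dk->ijk", U1, U2, U3) + mean

# For a matrix the contraction is simply U1^T U2 plus the mean.
assert np.allclose(Yhat_matrix, U1.T @ U2 + mean)

print(Yhat_matrix.shape, Yhat_tensor.shape)
```

The shared index d is summed away, so the result has one axis per mode of the tensor.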
We run a Macau training session using side information (ecfp) from the ChEMBL dataset.
We set save_freq = 1 so that every sample is saved and the model can be loaded afterwards. This run will take a few minutes.
In [ ]:
import smurff
import os
ic50_train, ic50_test, ecfp = smurff.load_chembl()
os.makedirs("ic50-macau", exist_ok=True)
trainSession = smurff.MacauSession(
    Ytrain      = ic50_train,
    Ytest       = ic50_test,
    side_info   = [ecfp, None],
    num_latent  = 16,
    burnin      = 200,
    nsamples    = 10,
    save_freq   = 1,
    save_prefix = "ic50-macau",
    verbose     = 1,
)
predictions = trainSession.run()
The saved files are indexed in a root ini-file; in this case the root ini-file will be ic50-macau/root.ini.
This file lists everything that was saved for this training run. For example:
[options]
options = ic50-save-options.ini
[steps]
sample_step_10 = sample-10-step.ini
sample_step_20 = sample-20-step.ini
sample_step_30 = sample-30-step.ini
sample_step_40 = sample-40-step.ini
Each step ini-file contains the matrices saved in the step:
[models]
num_models = 2
model_0 = sample-50-U0-latents.ddm
model_1 = sample-50-U1-latents.ddm
[predictions]
pred = sample-50-predictions.csv
pred_state = sample-50-predictions-state.ini
[priors]
num_priors = 2
prior_0 = sample-50-F0-link.ddm
prior_1 = sample-50-F1-link.ddm
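Because these files use plain ini syntax, they can be inspected with Python's standard configparser module. The sketch below parses the example root file content from the listing earlier in this section, held in a string, so it runs without a saved session. The variable names are illustrative, and the files written by an actual run may be laid out differently.

```python
import configparser

# Example content copied from the root ini-file listing above;
# the files written by a real run may differ.
root_ini_text = """
[options]
options = ic50-save-options.ini
[steps]
sample_step_10 = sample-10-step.ini
sample_step_20 = sample-20-step.ini
sample_step_30 = sample-30-step.ini
sample_step_40 = sample-40-step.ini
"""

root = configparser.ConfigParser()
root.read_string(root_ini_text)

# Map each saved step to the step ini-file that describes it.
step_files = dict(root["steps"])
for name, filename in step_files.items():
    print(name, "->", filename)
```

To inspect a file on disk instead, root.read(path) can be used in place of read_string.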
In [ ]:
predictor = trainSession.makePredictSession()
print(predictor)
Once we have a PredictSession, there are several ways to make predictions.
We can make predictions for all rows $\times$ columns in our matrix:
In [ ]:
p = predictor.predict_all()
print(p.shape) # p is a numpy array of size: (num samples) x (num rows) x (num columns)
In [ ]:
p = predictor.predict_some(ic50_test)
print(len(p),"predictions") # p is a list of Predictions
print("predictions 1:", p[0])
In [ ]:
from scipy.sparse import find
(i,j,v) = find(ic50_test)
p = predictor.predict_one((i[0],j[0]),v[0])
print(p)
We can also plot the histogram of the predictions for this element.
In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
# Plot a histogram of the samples.
plt.subplot(111)
plt.hist(p.pred_all, bins=10, density=True, label="histogram of predictions")
plt.plot(p.val, 1., 'ro', markersize =5, label = 'actual value')
plt.legend()
plt.title('Histogram of ' + str(len(p.pred_all)) + ' predictions')
plt.show()
In [ ]:
import numpy as np
from scipy.sparse import find
(i,j,v) = find(ic50_test)
row_side_info = ecfp.tocsr().getrow(i[0])
p = predictor.predict_one((row_side_info,j[0]),v[0])
print(p)
In [ ]:
# print the U matrices for all samples
for i, s in enumerate(predictor.samples):
    print("sample", i, ":", [(m, u.shape) for m, u in enumerate(s.latents)])
This allows us to compute predictions for arbitrary slices of the matrix or tensor using numpy.einsum:
In [ ]:
sample1 = predictor.samples[0]
(U1, U2) = sample1.latents
## predict the slice Y[7, : ] from sample 1
Yhat_7x = np.einsum(U1[:,7], [0], U2, [0, 2])
## predict the slice Y[:, 0:10] from sample 1
Yhat_x10 = np.einsum(U1, [0, 1], U2[:,0:10], [0, 2])
The first example above gives a vector (rank-1 tensor) holding one row of the matrix; the second gives a matrix (rank-2 tensor). It is advised to make predictions on all samples and average them.
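The averaging over samples can be sketched as follows. To keep the example self-contained, random arrays stand in for the (U1, U2) latent pairs of the saved samples; the names samples, per_sample, Yhat_mean and Yhat_std and all sizes are illustrative assumptions. With a real PredictSession, the pairs would presumably come from the latents of each entry in predictor.samples, as in the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, D, num_rows, num_cols = 10, 16, 20, 30   # illustrative sizes

# Synthetic stand-ins for the (U1, U2) latent pair of each saved sample.
samples = [
    (rng.normal(size=(D, num_rows)), rng.normal(size=(D, num_cols)))
    for _ in range(num_samples)
]

# Predict the slice Y[:, 0:10] separately for every sample.
# As in the einsum cell above, no mean offset is added here.
per_sample = np.stack([
    np.einsum(U1, [0, 1], U2[:, 0:10], [0, 2]) for U1, U2 in samples
])  # shape: (num_samples, num_rows, 10)

# Average over the sample axis; the spread hints at the uncertainty.
Yhat_mean = per_sample.mean(axis=0)
Yhat_std = per_sample.std(axis=0)

print(per_sample.shape, Yhat_mean.shape, Yhat_std.shape)
```

Keeping the per-sample predictions in one stacked array makes it easy to compute other summaries besides the mean, such as the standard deviation shown here.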
In [ ]:
import smurff
predictor = smurff.PredictSession("ic50-macau/save-root.ini")
print(predictor)