In this notebook we continue with the first example. After running a training session again in SMURFF, we look deeper into how to use SMURFF for making predictions.
To make predictions, recall that the value of a tensor model is given by a tensor contraction of all latent matrices. Specifically, the prediction $\hat{Y}_{ijk}$ for an element of a rank-3 tensor is given by:
$$ \hat{Y}_{ijk} = \sum_{d=1}^D u^{(1)}_{d,i} u^{(2)}_{d,j} u^{(3)}_{d,k} + \mathrm{mean} $$
Since a matrix is a rank-2 tensor, the prediction for a matrix is given by:
$$ \hat{Y}_{ij} = \sum_{d=1}^D u^{(1)}_{d,i} u^{(2)}_{d,j} + \mathrm{mean} $$
These inner products are computed by SMURFF automagically, as we will see below.
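As a minimal sketch of the two formulas above, the contraction can be written with `numpy.einsum`. The latent matrices here are random stand-ins of shape (D, n) rather than SMURFF output, and `mean` is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4        # number of latent dimensions
mean = 0.5   # illustrative global mean (assumption, not a SMURFF value)

# Random stand-ins for the latent matrices, each of shape (D, size of that mode)
U1 = rng.normal(size=(D, 5))
U2 = rng.normal(size=(D, 6))
U3 = rng.normal(size=(D, 7))

# Matrix case: Yhat[i, j] = sum_d U1[d, i] * U2[d, j] + mean
Yhat_matrix = np.einsum("di,dj->ij", U1, U2) + mean

# Rank-3 tensor case: Yhat[i, j, k] = sum_d U1[d, i] * U2[d, j] * U3[d, k] + mean
Yhat_tensor = np.einsum("di,dj,dk->ijk", U1, U2, U3) + mean

# The matrix case is an ordinary matrix product plus the mean
assert np.allclose(Yhat_matrix, U1.T @ U2 + mean)
```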
We run a Macau training session using side information (ecfp) from the chembl dataset.
We make sure samples are saved during the run (controlled by `save_freq`), so that we can load the model afterwards. This run will take a few minutes.
In [ ]:
import smurff
import os
ic50_train, ic50_test, ecfp = smurff.load_chembl()
os.makedirs("ic50-macau", exist_ok=True)
trainSession = smurff.MacauSession(
    Ytrain = ic50_train,
    Ytest = ic50_test,
    side_info = [ecfp, None],
    num_latent = 16,
    burnin = 200,
    nsamples = 10,
    save_freq = 1,
    save_prefix = "ic50-macau",
    verbose = 1,
)
predictions = trainSession.run()
The saved files are indexed in a root ini-file; in this case the root ini-file will be ic50-macau/root.ini.
The content of this file lists all saved info for this training run. For example:
[options]
options = ic50-save-options.ini
[steps]
sample_step_10 = sample-10-step.ini
sample_step_20 = sample-20-step.ini
sample_step_30 = sample-30-step.ini
sample_step_40 = sample-40-step.ini
Each step ini-file contains the matrices saved in the step:
[models]
num_models = 2
model_0 = sample-50-U0-latents.ddm
model_1 = sample-50-U1-latents.ddm
[predictions]
pred = sample-50-predictions.csv
pred_state = sample-50-predictions-state.ini
[priors]
num_priors = 2
prior_0 = sample-50-F0-link.ddm
prior_1 = sample-50-F1-link.ddm
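Since these are plain ini-files, they can be inspected with Python's standard `configparser`. The sketch below parses the example root file content shown above from a string; reading an actual file from disk, and the exact set of keys, are assumptions that may depend on the SMURFF version.

```python
import configparser

# Example content copied from the root ini-file listing above
root_ini = """
[options]
options = ic50-save-options.ini
[steps]
sample_step_10 = sample-10-step.ini
sample_step_20 = sample-20-step.ini
sample_step_30 = sample-30-step.ini
sample_step_40 = sample-40-step.ini
"""

config = configparser.ConfigParser()
config.read_string(root_ini)

# Map each saved step name to its step ini-file
steps = dict(config["steps"])
for name, step_file in steps.items():
    print(name, "->", step_file)
```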
In [ ]:
predictor = trainSession.makePredictSession()
print(predictor)
Once we have a PredictSession, there are several ways to make predictions:
We can make predictions for all rows $\times$ columns in our matrix:
In [ ]:
p = predictor.predict_all()
print(p.shape) # p is a numpy array of size: (num samples) x (num rows) x (num columns)
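Because the first axis of this array indexes the samples, a point estimate and an uncertainty estimate can be obtained by reducing over that axis. The sketch below uses a random array as a stand-in for the output of `predict_all`; the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for predict_all output: (num samples) x (num rows) x (num columns)
p = rng.normal(loc=6.0, scale=1.0, size=(10, 20, 30))

p_mean = p.mean(axis=0)   # average prediction per matrix element
p_std = p.std(axis=0)     # spread across samples per matrix element

print(p_mean.shape, p_std.shape)
```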
In [ ]:
p = predictor.predict_some(ic50_test)
print(len(p),"predictions") # p is a list of Predictions
print("predictions 1:", p[0])
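One common way to judge such predictions is the root-mean-square error (RMSE) between predicted and known test values. The sketch below uses made-up numbers rather than actual SMURFF output.

```python
import numpy as np

# Made-up values standing in for the test values and the averaged predictions
y_true = np.array([5.0, 6.5, 7.2, 4.8])
y_pred = np.array([5.5, 6.0, 7.0, 5.2])

# Root-mean-square error over all test entries
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
print("RMSE:", rmse)
```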
In [ ]:
from scipy.sparse import find
(i,j,v) = find(ic50_test)
p = predictor.predict_one((i[0],j[0]),v[0])
print(p)
We can then plot the histogram of the predictions for this element.
In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
# Plot a histogram of the samples.
plt.subplot(111)
plt.hist(p.pred_all, bins=10, density=True, label = "histogram of predictions")
plt.plot(p.val, 1., 'ro', markersize =5, label = 'actual value')
plt.legend()
plt.title('Histogram of ' + str(len(p.pred_all)) + ' predictions')
plt.show()
In [ ]:
import numpy as np
from scipy.sparse import find
(i,j,v) = find(ic50_test)
row_side_info = ecfp.tocsr().getrow(i[0])
p = predictor.predict_one((row_side_info,j[0]),v[0])
print(p)
In [ ]:
# print the U matrices for all samples
for i,s in enumerate(predictor.samples):
    print("sample", i, ":", [ (m, u.shape) for m,u in enumerate(s.latents) ])
This allows us to compute predictions for arbitrary slices of the matrix or tensor using numpy.einsum:
In [ ]:
sample1 = predictor.samples[0]
(U1, U2) = sample1.latents
## predict the slice Y[7, : ] from sample 1
Yhat_7x = np.einsum(U1[:,7], [0], U2, [0, 2])
## predict the slice Y[:, 0:10] from sample 1
Yhat_x10 = np.einsum(U1, [0, 1], U2[:,0:10], [0, 2])
The first example above gives a vector (rank-1 tensor) and the second a matrix (rank-2 tensor) as a result. It is advised to make predictions on all samples, and average the predictions.
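Averaging over samples can be sketched as follows. The list `samples` holds random (U1, U2) pairs as stand-ins for the latent matrices of each saved sample; with a real PredictSession one would take them from `predictor.samples` instead. The mean term from the formula at the top is left out of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
num_latent, num_rows, num_cols, num_samples = 16, 50, 40, 10

# Random stand-ins for the (U1, U2) latent matrices of each sample
samples = [
    (rng.normal(size=(num_latent, num_rows)),
     rng.normal(size=(num_latent, num_cols)))
    for _ in range(num_samples)
]

# Predict the slice Y[:, 0:10] for every sample, then average over samples
per_sample = np.stack([
    np.einsum(U1, [0, 1], U2[:, 0:10], [0, 2]) for U1, U2 in samples
])
Yhat_x10_avg = per_sample.mean(axis=0)

print(per_sample.shape, Yhat_x10_avg.shape)
```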
In [ ]:
import smurff
predictor = smurff.PredictSession("ic50-macau/save-root.ini")
print(predictor)