The second main algorithm implemented in bestPy
is truncated singular-value decomposition (SVD). Briefly, the underlying assumption of this algorithm is that there is, in fact, a relatively small number of hidden features (or factors) that characterize each article. Each customer is, in turn, characterized by preferences for each of these factors. We do not know what these factors are, and neither do we need to. All we need to do is choose how many of them there are. The number of hidden (or latent) factors is thus the main parameter of the algorithm. Too few and we cannot fully capture the details of customer preferences; too many and we end up recommending only articles that the customer has already bought.
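To make the idea of latent factors a little more concrete, here is a minimal sketch in plain NumPy (an illustration only, not bestPy code) of a rank-k truncated SVD of a small, made-up customer-by-article purchase matrix. The matrix, the value of k, and the variable names are all assumptions chosen for the example.

import numpy as np

# Rows = customers, columns = articles; entries = how often an article was bought.
purchases = np.array([[3, 0, 1, 0],
                      [2, 0, 1, 0],
                      [0, 4, 0, 2],
                      [0, 3, 0, 1]], dtype=float)

k = 2  # the number of hidden (latent) factors we choose to keep

U, s, Vt = np.linalg.svd(purchases, full_matrices=False)
approximation = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The rank-k approximation expresses every customer as a mix of k factors; its
# entries can be read as predicted affinities, even for articles never bought.
print(np.round(approximation, 2))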
But enough theory for now. Let's dive right in and see how it works.
We only need this because the examples folder is a subdirectory of the bestPy
package.
In [1]:
import sys
sys.path.append('../..')
In [2]:
from bestPy import write_log_to
from bestPy.datastructures import Transactions
from bestPy.algorithms import TruncatedSVD # Import TruncatedSVD
logfile = 'logfile.txt'
write_log_to(logfile, 20)
file = 'examples_data.csv'
data = Transactions.from_csv(file)
In [3]:
algorithm = TruncatedSVD()
algorithm.has_data
Out[3]:
In [4]:
algorithm.binarize
Out[4]:
It has the same meaning as in the baseline recommendation and in collaborative filtering: True
means we only care whether or not a customer bought an article and False
means we also take into account how often a customer bought an article.
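Conceptually, the difference is whether the matrix handed to the factorization contains purchase counts or just zeros and ones. A minimal NumPy sketch of that distinction (an illustration, not bestPy's internal code):

import numpy as np

# How often each of two customers bought each of three articles (made-up numbers).
counts = np.array([[3, 0, 1],
                   [0, 2, 0]], dtype=float)

binarized = (counts > 0).astype(float)  # binarize True: only whether an article was bought
print(binarized)
# With binarize set to False, the raw counts above would be used instead.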
The second parameter is the number of latent factors described above.
In [5]:
algorithm.number_of_factors
Out[5]:
If we don't like the default value, we can always set a different one.
In [6]:
algorithm.number_of_factors = 35
algorithm.number_of_factors
Out[6]:
Now, let's attach the data. Again relying on Tab completion, we see that the additional attribute max_number_of_factors
has magically appeared.
In [7]:
recommendation = algorithm.operating_on(data)
recommendation.max_number_of_factors
Out[7]:
As the name implies, this is the maximum number of latent factors that we can set. It turns out that we cannot choose more than the number of customers or the number of articles in our dataset, whichever is smaller.
In [8]:
print(data.user.count)
print(data.item.count)
print(data.matrix.min_shape)
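The reason for this cap is plain linear algebra: an m x n matrix has at most min(m, n) singular values, so there is nothing beyond that number for the factorization to extract. A quick illustration with made-up numbers (not bestPy code):

import numpy as np

m, n = 6, 4  # say, 6 customers and 4 articles
matrix = np.random.rand(m, n)
singular_values = np.linalg.svd(matrix, compute_uv=False)
print(len(singular_values))  # 4, that is, min(m, n)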
Obviously, we do not know this number before we have attached the data to the algorithm. So what happens if we first set it too large, say, to 8300?
In [9]:
algorithm = TruncatedSVD()
algorithm.number_of_factors = 8300
recommendation = algorithm.operating_on(data)
recommendation.number_of_factors
Out[9]:
Without making too much fuss about it, the number of factors has been reset to the maximum allowed. All you see of this behind-the-scenes magic is an additional line in the logfile.
[WARNING]: Requested 8300 latent features, but only 8254 available. Resetting to 8254. (truncatedsvd|__reset)
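The effect is simply a clamp to the maximum. A rough sketch of that logic with a hypothetical helper (the actual bestPy implementation and its exact log format may differ):

import logging

def clamp_factors(requested, maximum):
    # Hypothetical helper, for illustration only.
    if requested > maximum:
        logging.warning('Requested %d latent features, but only %d available. '
                        'Resetting to %d.', requested, maximum, maximum)
        return maximum
    return requested

print(clamp_factors(8300, 8254))  # 8254, plus a warning in the log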
And that's it for the parameters of the truncated SVD algorithm.
Now that everything is set up and we have data attached to the algorithm, its for_one()
method is available and can be called with the internal integer index of the target customer as its argument. Before we do that, however, let's reset the number of latent factors to something meaningful. Left at the maximum, we would just be recommending back to the customer what he or she has bought before.
In [10]:
recommendation.number_of_factors = 35
customer = data.user.index_of['5']
recommendation.for_one(customer)
Out[10]:
And, voilà, your recommendation. Again, a higher score means that the article with the same index is more highly recommended for the target customer. Feel free to play around with the number of factors and see what happens to the recommendation!
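If you would rather have the indices of, say, the top-5 articles than the raw scores, a little NumPy does the trick. This assumes that for_one() returns an array-like of scores, one per internal article index (an illustrative recipe, not an official bestPy helper):

import numpy as np

scores = np.asarray(recommendation.for_one(customer))
top_n = 5
best_articles = np.argsort(scores)[::-1][:top_n]  # indices of the 5 highest scores
print(best_articles)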
This concludes our discussion of the truncated SVD algorithm.
In [ ]: