The second main algorithm implemented in bestPy
is truncated singular-value decomposition (SVD). Briefly, the underlying assumption of this algorithm is that there is, in fact, a relatively small number of hidden features (or factors) that characterize each article. Each customer is, in turn, characterized by preferences for each of these factors. We do not know what these factors are, and neither do we need to. All we need to do is choose how many of them there are. The number of hidden (or latent) factors is thus the main parameter of the algorithm. Too few and we cannot fully capture the details of customer preferences; too many and we end up recommending only articles that the customer has already bought.
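To make the idea of latent factors a little more concrete, here is a minimal sketch in plain NumPy (an illustration only, not bestPy code) of a rank-k truncated SVD of a small, made-up customer-by-article purchase matrix. The matrix, the value of k, and the variable names are all assumptions chosen for the example.

import numpy as np

# Rows = customers, columns = articles; entries = how often an article was bought.
purchases = np.array([[3, 0, 1, 0],
                      [2, 0, 1, 0],
                      [0, 4, 0, 2],
                      [0, 3, 0, 1]], dtype=float)

k = 2  # the number of hidden (latent) factors we choose to keep

U, s, Vt = np.linalg.svd(purchases, full_matrices=False)
approximation = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The rank-k approximation expresses every customer as a mix of k factors; its
# entries can be read as predicted affinities, even for articles never bought.
print(np.round(approximation, 2))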
But enough theory for now. Let's dive right in and see how it works.
We only need this because the examples folder is a subdirectory of the bestPy
package.
In [1]:
import sys
sys.path.append('../..')
In [2]:
from bestPy import write_log_to
from bestPy.datastructures import Transactions
from bestPy.algorithms import TruncatedSVD # Import TruncatedSVD
logfile = 'logfile.txt'
write_log_to(logfile, 20)
file = 'examples_data.csv'
data = Transactions.from_csv(file)
In [3]:
algorithm = TruncatedSVD()
algorithm.has_data
Out[3]:
In [4]:
algorithm.binarize
Out[4]:
It has the same meaning as in the baseline recommendation and in collaborative filtering: True
means we only care whether or not a customer bought an article and False
means we also take into account how often a customer bought an article.
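Conceptually, the difference is whether the matrix handed to the factorization contains purchase counts or just zeros and ones. A minimal NumPy sketch of that distinction (an illustration, not bestPy's internal code):

import numpy as np

# How often each of two customers bought each of three articles (made-up numbers).
counts = np.array([[3, 0, 1],
                   [0, 2, 0]], dtype=float)

binarized = (counts > 0).astype(float)  # binarize True: only whether an article was bought
print(binarized)
# With binarize set to False, the raw counts above would be used instead.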
The second parameter is the number of latent factors described above.
In [5]:
algorithm.number_of_factors
Out[5]:
If we don't like the default value, we can always set a different one.
In [6]:
algorithm.number_of_factors = 35
algorithm.number_of_factors
Out[6]:
Now, let's attach the data. Again relying on Tab completion, we see that the additional attribute max_number_of_factors
has magically appeared.
In [7]:
recommendation = algorithm.operating_on(data)
recommendation.max_number_of_factors
Out[7]:
As the name implies, this is the maximum number of latent factors that we can set. It turns out that we cannot choose more than the number of customers or the number of articles in our dataset, whichever is smaller.
In [8]:
print(data.user.count)
print(data.item.count)
print(data.matrix.min_shape)
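The reason for this cap is plain linear algebra: an m x n matrix has at most min(m, n) singular values, so there is nothing beyond that number for the factorization to extract. A quick illustration with made-up numbers (not bestPy code):

import numpy as np

m, n = 6, 4  # say, 6 customers and 4 articles
matrix = np.random.rand(m, n)
singular_values = np.linalg.svd(matrix, compute_uv=False)
print(len(singular_values))  # 4, that is, min(m, n)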
Obviously, we do not know this number before we have attached the data to the algorithm. So what happens if we first set it too large, say, to 8300?
In [9]:
algorithm = TruncatedSVD()
algorithm.number_of_factors = 8300
recommendation = algorithm.operating_on(data)
recommendation.number_of_factors
Out[9]:
Without making too much fuss about it, the number of factors has been reset to the maximum allowed. All you see of this behind-the-scenes magic is an additional line in the logfile.
[WARNING]: Requested 8300 latent features, but only 8254 available. Resetting to 8254. (truncatedsvd|__reset)
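The effect is simply a clamp to the maximum. A rough sketch of that logic with a hypothetical helper (the actual bestPy implementation and its exact log format may differ):

import logging

def clamp_factors(requested, maximum):
    # Hypothetical helper, for illustration only.
    if requested > maximum:
        logging.warning('Requested %d latent features, but only %d available. '
                        'Resetting to %d.', requested, maximum, maximum)
        return maximum
    return requested

print(clamp_factors(8300, 8254))  # 8254, plus a warning in the log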
And that's it for the parameters of the truncated SVD algorithm.
Now that everything is set up and we have data attached to the algorithm, its for_one()
method is available and can be called with the internal integer index of the target customer as its argument. Before we do that, however, let's reset the number of latent factors to something meaningful. Left at the maximum, we would just be recommending back to the customer what he or she has bought before.
In [10]:
recommendation.number_of_factors = 35
customer = data.user.index_of['5']
recommendation.for_one(customer)
Out[10]:
And, voilà, your recommendation. Again, a higher score means that the article with the same index is more highly recommended for the target customer. Feel free to play around with the number of factors and see what happens to the recommendation!
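If you would rather have the indices of, say, the top-5 articles than the raw scores, a little NumPy does the trick. This assumes that for_one() returns an array-like of scores, one per internal article index (an illustrative recipe, not an official bestPy helper):

import numpy as np

scores = np.asarray(recommendation.for_one(customer))
top_n = 5
best_articles = np.argsort(scores)[::-1][:top_n]  # indices of the 5 highest scores
print(best_articles)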
This concludes our discussion of the truncated SVD algorithm.
In [ ]: