CHAPTER 4

4.1 Algorithms: Baseline

Now that we have a solid grasp of the primary data structure used in bestPy, the question is what to do with it. To arrive at a recommendation for a customer, some sort of algorithm needs to operate on that data. To introduce the basic properties of bestPy's algorithms, we will examine the simplest (and fastest) of them all, the Baseline, before we do anything fancy.

Simple as it is, a baseline algorithm should not be underestimated: it is critical to the recommendation business. Specifically, it is needed to provide a

recommendation to new customers,
fallback if other algorithms fail,
benchmark for other algorithms to beat.

Preliminaries

The following path adjustment is only needed because the examples folder is a subdirectory of the bestPy package.



In [1]:

    
import sys
sys.path.append('../..')

Imports, logging, and data

On top of doing the things we already know, we now also need to import the Baseline algorithm, which is conveniently accessible through the bestPy.algorithms subpackage.



In [2]:

    
from bestPy import write_log_to
from bestPy.datastructures import Transactions
from bestPy.algorithms import Baseline  # Additionally import the baseline algorithm

logfile = 'logfile.txt'
write_log_to(logfile, 20)

file = 'examples_data.csv'
data = Transactions.from_csv(file)

Creating a new `Baseline` object

This is really easy. All you need to do is:



In [3]:

    
algorithm = Baseline()

Inspecting the new algorithm object with Tab completion reveals binarize as its first attribute.



In [4]:

    
algorithm.binarize









    Out[4]:





True

What its default value of True means is that, instead of judging an article's popularity by how many times it was bought, we count each unique customer only once. How often a given customer bought a given article no longer matters. It's 0 or 1. Hence the attribute's name. You can set it to False if you want to take into account multiple buys by the same customer.

This decision depends on the use case. Do I really like an article more because I bought more than one unit of it? If you sell both consumables and more specialized items, the answer is not so clear. Suppose I bought 6 pairs of socks (which I ended up hating) and one copy of a book (which I ended up loving). Does it really make sense to base your recommendation on the assumption that I liked the socks 6 times as much as the book? Probably not. So this is a case where the default value of True for the binarize attribute might make sense.
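
To make the difference concrete, here is a minimal sketch in plain Python. It does not use bestPy and is not how the library computes its scores internally; the purchase counts and the variable names are made up for illustration only.

```python
# Hypothetical purchase counts: one row per customer, one column per article.
# Column 0 is "socks", column 1 is "book".
counts = [
    [6, 1],  # the customer from the example: 6 pairs of socks, 1 book
    [0, 1],
    [0, 1],
]

# Like binarize = False: sum all units bought per article.
total_buys = [sum(column) for column in zip(*counts)]

# Like binarize = True: count each customer at most once per article.
unique_buyers = [sum(1 for n in column if n > 0) for column in zip(*counts)]

print(total_buys)     # [6, 3]
print(unique_buyers)  # [1, 3]
```

Counting units makes the socks look twice as popular as the book, although only a single customer ever bought them. Counting unique buyers puts the book ahead.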

If, on the other hand, you are selling consumables only, then the number of times I buy an item might indeed hint towards me liking that item more than others and setting the binarize attribute to False might be adequate.



In [5]:

    
algorithm.binarize = False

It is up to you to test and act accordingly.

And that's it for setting up the configurable parameters of the Baseline algorithm. Without data, there is nothing else we can do for now, other than convincing ourselves that there is indeed no data associated with the algorithm yet.



In [6]:

    
algorithm.has_data









    Out[6]:





False

Attaching data to the `Baseline` algorithm

To let the algorithm act on our data, we call its operating_on() method, which takes a data object of type Transactions as argument. Inspecting the has_data attribute again tells us whether we were successful or not.



In [7]:

    
recommendation = algorithm.operating_on(data)
recommendation.has_data









    Out[7]:





True

Note: Of course, you can also directly instantiate the algorithm with data attached

recommendation = Baseline().operating_on(data)

and configure its parameters (the binarize attribute) later.

Making a baseline recommendation

Now that we have data attached to our algorithm, Tab completion shows us that an additional method for_one() has mysteriously appeared. This method, which does not make any sense without data and was, therefore, hidden before, returns an array of numbers, one for each article: the first for the article with index 0, the next for the article with index 1, and so on. The highest number indicates the most recommended article and the lowest the least recommended one.



In [8]:

    
recommendation.for_one()









    Out[8]:





array([  1.,   1.,  10., ...,   2.,   1.,   1.])

As discussed above, these numbers correspond to either the count of unique buyers or the count of buys, depending on whether the attribute binarize is set to True or False, respectively.



In [9]:

    
recommendation.binarize = True
recommendation.for_one()









    Out[9]:





array([ 1.,  1.,  8., ...,  1.,  1.,  1.])
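
Since the position of each score is the article index, turning such an array into a ranked list of articles is a matter of sorting indices by score. The following is a minimal sketch in plain Python with made-up scores; the names scores, ranking, and top_two are hypothetical and not part of bestPy.

```python
# Hypothetical scores, one per article index, shaped like the output of for_one().
scores = [1.0, 1.0, 8.0, 3.0, 1.0]

# Sort article indices by descending score; tied articles keep their original order.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# The first entries are the indices of the most recommended articles.
top_two = ranking[:2]
print(top_two)  # [2, 3]
```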

And that's all for the baseline algorithm.

Remark on the side

What actually happens when you try to set the attribute binarize to something other than the boolean values True or False? Let's try!



In [10]:

    
recommendation.binarize = 'foo'









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-36d7da089842> in <module>()
----> 1 recommendation.binarize = 'foo'

/home/georg/Documents/Python/BestPy/bestPy/algorithms/baselines/baseline.py in binarize(self, binarize)
     49     @binarize.setter
     50     def binarize(self, binarize):
---> 51         self.__check_boolean_type_of(binarize)
     52         if binarize != self.binarize:
     53             self.__delete_precomputed()

/home/georg/Documents/Python/BestPy/bestPy/algorithms/baselines/baseline.py in __check_boolean_type_of(binarize)
    118         if not isinstance(binarize, bool):
    119             log.error('Attempt to set "binarize" to non-boolean type.')
--> 120             raise TypeError('Attribute "binarize" must be True or False!')
    121 
    122     @staticmethod

TypeError: Attribute "binarize" must be True or False!

And that's an error! If you examine your logfile, you should find the corresponding line there.

[ERROR   ]: Attempt to set "binarize" to non-boolean type. (baseline|__check_boolean_type_of)

Remember to check your logfile every once in a while to see what's going on!
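
The traceback hints at the pattern behind this behavior: a property setter that checks the type, logs the problem, and discards precomputed results when the value changes. Below is a minimal sketch of that pattern. The class BaselineSketch and its attributes _binarize and _precomputed are hypothetical stand-ins, not bestPy's actual implementation; only the two message strings are copied from the traceback.

```python
import logging

log = logging.getLogger(__name__)


class BaselineSketch:
    """Hypothetical stand-in, not bestPy's actual Baseline class."""

    def __init__(self):
        self._binarize = True
        self._precomputed = None  # placeholder for cached scores

    @property
    def binarize(self):
        return self._binarize

    @binarize.setter
    def binarize(self, value):
        # Reject anything that is not strictly a bool (the integers 1 and 0 fail too).
        if not isinstance(value, bool):
            log.error('Attempt to set "binarize" to non-boolean type.')
            raise TypeError('Attribute "binarize" must be True or False!')
        # A changed setting invalidates previously computed scores.
        if value != self._binarize:
            self._precomputed = None
        self._binarize = value


sketch = BaselineSketch()
try:
    sketch.binarize = 'foo'
except TypeError as error:
    print(error)  # Attribute "binarize" must be True or False!
```

Validating in the setter means a bad value is caught at the moment it is assigned, rather than surfacing later as a confusing result when a recommendation is computed.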


