Now that we have a convenient way to make recommendations and a convenient way to split our data into training and test sets, it is straightforward to benchmark our algorithms and find both the best one and the best settings for it.
For that purpose, bestPy provides the Benchmark class, which brings together a fully configured recommender (i.e., a RecoBasedOn instance) and the test data to provide a score for the algorithm used in the recommender. Let's see how that works.
We only need this because the examples folder is a subdirectory of the bestPy package.
In [1]:
import sys
sys.path.append('../..')
In addition to what we did in the last notebook, we now also import the Benchmark class from the top-level package. As our algorithm, we are going to choose CollaborativeFiltering and also import two similarities for comparison. Importantly, we also need the Baseline to establish a basic score that any algorithm worthy of the name should beat.
In [2]:
from bestPy import write_log_to
from bestPy.datastructures import TrainTest
from bestPy import Benchmark, RecoBasedOn # Additionally import Benchmark
from bestPy.algorithms import Baseline, CollaborativeFiltering # Import also Baseline for score to beat
from bestPy.algorithms.similarities import kulsinski, cosine # Import two similarities for comparison
logfile = 'logfile.txt'
write_log_to(logfile, 20)
file = 'examples_data.csv'
data = TrainTest.from_csv(file)
Split the TrainTest data and set up a recommender with the Baseline algorithm
Let's stick with holding out the last 4 unique purchases from each customer and let's say we also want to recommend articles that customers bought before. After splitting the data accordingly, we are going to set up a recommender with the training data and the Baseline algorithm.
In [3]:
data.split(4, False)
algorithm = Baseline()
recommender = RecoBasedOn(data.train).using(algorithm).keeping_old
In [4]:
benchmark = Benchmark(recommender)
Tab inspection tells us that the newly created object has a single method, against(). In order to provide a benchmark score for our recommender, which was trained on the training data, we also need the held-out test data to test it against. So the argument of against() is the held-out test data, and its return value is the benchmark object, now with the test data attached.
In [5]:
benchmark = benchmark.against(data.test)
The beauty of this peculiar way of calling the against() method is again revealed when combining it with the instantiation of the Benchmark class into a single, elegant line of code that reads like an instruction in natural language.
In [6]:
benchmark = Benchmark(recommender).against(data.test)
In [7]:
benchmark.score
Out[7]:
It tells us that, on average, customers actually bought about 0.12 of the 4 articles we recommended. And that's just the baseline recommendation, which does not take differences between customers into account at all. Any algorithm that does should easily beat that number. Let's try.
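Before we do, to make that number concrete, here is a minimal sketch of how such a hit-based score could be computed. It is purely illustrative and does not show bestPy's actual implementation; held_out and top_4 are hypothetical toy dictionaries mapping customers to their held-out purchases and to the 4 recommended articles.

def hit_score(held_out, top_4):
    """Average number of held-out articles recovered per customer."""
    hits = [len(set(articles) & set(top_4[customer]))
            for customer, articles in held_out.items()]
    return sum(hits) / len(hits)

held_out = {'A': ['x', 'y', 'z', 'w'], 'B': ['p', 'q', 'r', 's']}  # toy test data
top_4 = {'A': ['x', 'm', 'n', 'o'], 'B': ['m', 'n', 'o', 'k']}     # toy recommendations
hit_score(held_out, top_4)   # 0.5 (one hit for customer A, none for B)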
In [8]:
algorithm = CollaborativeFiltering()
algorithm.binarize = False
algorithm.similarity = cosine
recommender = RecoBasedOn(data.train).using(algorithm).pruning_old
benchmark = Benchmark(recommender).against(data.test)
benchmark.score
Out[8]:
Indeed, using collaborative filtering, we could significantly improve over the customer-agnostic baseline and get about 0.46 articles out of 4 right. But maybe we should not count each individual purchase and, instead, just consider whether a customer bought an article or not.
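What that toggle means is easiest to see on a toy example. The matrix below is not bestPy's internal representation, just an illustrative customer-by-article matrix of purchase counts next to its binarized counterpart, which only records whether an article was bought at all.

import numpy as np

counts = np.array([[3, 0, 1],    # customer 1 bought article 1 three times
                   [0, 5, 0]])   # customer 2 bought article 2 five times
binarized = (counts > 0).astype(int)
binarized                        # array([[1, 0, 1],
                                 #        [0, 1, 0]])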
In [9]:
algorithm.binarize = True
benchmark.score
Out[9]:
That improved our score a little. But we also have other knobs to turn. How about trying a different way of measuring similarity between articles?
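To get a feeling for what such a measure does, here is one of them, cosine similarity, computed by hand on two hypothetical binary article vectors (each entry indicating whether a given customer bought the article). This only illustrates the general idea, not bestPy's implementation.

import numpy as np

article_a = np.array([1, 0, 1, 1, 0])
article_b = np.array([1, 1, 1, 0, 0])
article_a @ article_b / (np.linalg.norm(article_a) * np.linalg.norm(article_b))   # ~0.667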
In [10]:
algorithm.similarity = kulsinski
benchmark.score
Out[10]:
Better again! You can clearly see where this is going. How high can you get the score? Happy exploring!
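If you would like a systematic starting point, the sketch below simply loops over the two knobs we just turned, using only the calls already shown in this notebook. Other similarities shipped with bestPy may, of course, do better on your data.

for binarize in (True, False):
    for name, similarity in (('cosine', cosine), ('kulsinski', kulsinski)):
        algorithm = CollaborativeFiltering()
        algorithm.binarize = binarize
        algorithm.similarity = similarity
        recommender = RecoBasedOn(data.train).using(algorithm).pruning_old
        score = Benchmark(recommender).against(data.test).score
        print(binarize, name, score)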
NOTE: You may have realized that, when we split the data, we told bestPy that we would also recommend articles to customers that they had bought before:
data.split(4, False)
When setting up our recommender with collaborative filtering, however, we wrote
recommender = RecoBasedOn(data.train).using(algorithm).pruning_old
meaning that we would not, in fact, recommend previously bought articles. That's a clear contradiction. Which one is it going to be? Because, in this context, the way we have split the data determines the way we have to test it, the value of the only_new attribute of the test data takes precedence over the value of the same attribute of the recommender object.
In [11]:
recommender = RecoBasedOn(data.train).using(algorithm).pruning_old
print('Recommender before:', recommender.only_new)
print('Test data:', data.test.only_new)
benchmark = Benchmark(recommender).against(data.test)
print('Recommender after:', recommender.only_new)
This reset conveniently takes place behind the scenes but, to keep you informed of what's happening at all times, it is of course logged.
[INFO ]: Resetting recommender to "keeping_old" because of test-data preference. (benchmark|against)
And this concludes our discussion of how to benchmark and tune bestPy's algorithms.