This notebook demonstrates the most basic capabilities of the pyvw Python->VW interface. The interface (unlike the rest of VW :P) is extensively documented, so if you are confused, look at the Python docs!

Any pyvw application needs to begin by importing pyvw.


In [1]:
from __future__ import print_function
from vowpalwabbit import pyvw

Once we've imported pyvw, we can initialize VW either by passing a command-line string (e.g., "--quiet -q ab --l2 0.01") or, in a more Python-friendly manner, by providing those options as named arguments. Here we do the latter.


In [2]:
vw = pyvw.vw(quiet=True, q='ab', l2=0.01)
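If you'd rather use the command-line string form mentioned above, a sketch of the equivalent initialization (the variable name vw_from_string is just illustrative) would be:

vw_from_string = pyvw.vw("--quiet -q ab --l2 0.01")

We'll stick with the named-argument version, vw, for the rest of this notebook.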

VW objects can do a lot, but the most important thing they can do is create examples and train/predict on those examples.

One way to create an example is to pass a string. This is the equivalent of a line in a VW data file. For instance:


In [3]:
ex = vw.example('1 |a two features |b more features here')

As promised, there is documentation; for instance:


In [4]:
help(ex.learn)


Help on method learn in module vowpalwabbit.pyvw:

learn() method of vowpalwabbit.pyvw.example instance
    Learn on this example (and before learning, automatically
    call setup_example if the example hasn't yet been setup).

Let's run that learn function and get a prediction:


In [5]:
ex.learn()
print('current prediction =', ex.get_updated_prediction())


current prediction = 0.8230039477348328

Here, get_updated_prediction retrieves the prediction made internally during learning. The "updated" aspect means "if I were to make a prediction on this example after this call to learn, what would that prediction be?"

Okay, so the prediction isn't quite where we want it yet. Let's learn a few more times and then print the prediction.


In [6]:
ex.learn() ; ex.learn() ; ex.learn() ; ex.learn()
print('current prediction =', ex.get_updated_prediction())


current prediction = 0.9992850422859192

This is now quite a bit closer to what is desired.

Now let's create a new example using the other form of example creation: Python dictionaries. Here, you must provide a dictionary that maps namespaces (e.g., 'a' and 'b') to lists of features. Features can either be strings (e.g., "foo") or pairs of a string and a float (e.g., ("foo", 0.5)). We'll create an example that's similar, but not identical, to the previous example to see how well VW has generalized.

Note that in this setup there is no label provided, which means that this will be considered a test example.


In [7]:
ex2 = vw.example({ 'a': ['features'], 'b': ['more', 'features', 'there'] })
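As an aside, if we wanted weighted features, a sketch of the string/float pair form described above (the weights here are illustrative) would look like:

ex_weighted = vw.example({'a': [('two', 1.0), ('features', 0.5)], 'b': ['more', 'features', 'here']})
ex_weighted.finish()  # just an illustration; we won't use this example again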

Given this new example ex2, we call learn. But since it's a test example (it has no label), this will only make a prediction!


In [8]:
ex2.learn()
print('current prediction =', ex2.get_simplelabel_prediction())


current prediction = 0.4984472393989563

Because this is a test example, we can get the raw prediction with get_simplelabel_prediction(). It's "simplelabel" because this is a regression problem. If we were doing, for instance, One-Against-All multiclass prediction, we would use get_multiclass_prediction, etc.; a sketch of that case follows below.
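As an aside, a minimal sketch of that multiclass case might look like the following (this assumes a separate VW instance configured with the oaa reduction; the label, class count, and features are illustrative):

vw_mc = pyvw.vw(quiet=True, oaa=3)           # one-against-all over 3 classes
mc_ex = vw_mc.example('2 |x some features')  # label 2 out of {1, 2, 3}
mc_ex.learn()
print('multiclass prediction =', mc_ex.get_multiclass_prediction())
mc_ex.finish()
vw_mc.finish()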

Coming back to our regression example: this prediction is only about half of what we want, but we're also missing a number of features.

Let's now give this example a label and train on it a few times:


In [9]:
ex2.set_label_string('-2.0')
ex2.learn() ; ex2.learn() ; ex2.learn() ; ex2.learn() ; ex2.learn()
print('current prediction =', ex2.get_simplelabel_prediction())


current prediction = -1.4838640689849854

Now we can go back and see how this has affected the prediction behavior on the original example ex. We do this first by removing the label and then calling learn to make a prediction.


In [10]:
ex.set_label_string('')
ex.learn()
print('current prediction =', ex.get_simplelabel_prediction())


current prediction = -0.5934292078018188

Clearly this has had an impact on the prediction for the first example. Let's put the label back and then iterate between learning on ex and ex2:


In [11]:
ex.set_label_string('1')
for i in range(10):
    ex.learn()
    ex2.learn()
    print('ex prediction =', ex.get_updated_prediction())
    print('ex2 prediction =', ex2.get_updated_prediction())


ex prediction = 0.6259561777114868
ex2 prediction = -1.4387876987457275
ex prediction = 0.7280516624450684
ex2 prediction = -1.52095365524292
ex prediction = 0.7903025150299072
ex2 prediction = -1.5942585468292236
ex prediction = 0.8319660425186157
ex2 prediction = -1.6564759016036987
ex prediction = 0.8618472218513489
ex2 prediction = -1.7080860137939453
ex prediction = 0.8843045234680176
ex2 prediction = -1.7504198551177979
ex prediction = 0.9017017483711243
ex2 prediction = -1.784990668296814
ex prediction = 0.9154171943664551
ex2 prediction = -1.813109278678894
ex prediction = 0.9263675212860107
ex2 prediction = -1.835953950881958
ex prediction = 0.935177206993103
ex2 prediction = -1.8545048236846924

After a handful of updates, we can see that the prediction for ex is going back toward 1.0 and for ex2 back toward -2.0.

Now that we're done, it's safest to tell VW that we're done with these examples so that it can garbage collect them. (This should happen by default when they pass out of scope, thanks to Python's built-in garbage collector, but that may not run soon enough if you're manipulating large numbers of examples at once!)


In [12]:
ex.finish()
ex2.finish()

Finally, when we're done with VW entirely, or perhaps want to start up a new VW instance, it's good practice to close out any old ones. This is especially important if we want to save a model to disk: calling vw.finish() tells VW to write the file. You can add f='mymodel' to the initialization line of the vw object if you want to play around with this (see the sketch below)!


In [13]:
vw.finish()
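For instance, a sketch of saving a model to disk might look like this (the filename 'mymodel' and variable names are illustrative):

vw_saver = pyvw.vw(quiet=True, q='ab', l2=0.01, f='mymodel')
ex3 = vw_saver.example('1 |a two features |b more features here')
ex3.learn()
ex3.finish()
vw_saver.finish()  # writes the model to 'mymodel'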

This is the end of the intro. For more, look at test.py in the python directory of the VW distribution, which contains some more examples. For even more, look at the Python docs in pyvw.py, for instance via help(pyvw.vw) and so on!

Happy VW-Pythoning!