Overengineered Zone Bar Prediction

At our mutual place of employ, a co-worker started keeping track of what flavors of zone bars were being put in the snack basket, and was curious to see if, given the history of bars, he could predict what the future might look like.

I was inspired to try and model this with a Markov chain, which I'd never used in 'real life' before, but had read about.

There are 2 kinds of bars:

FG: Fudge Graham
CPB: Chocolate Peanut Butter

The file data.txt has a list of bars from each day he checked the basket.

To run this, you'll need to grab pykov, a Python Markov chain library. It needs to be installed from git, as per its README.



In [3]:

    
import pykov

We need to know how often each transition happens: given a bar type, what are the odds that the next bar is the other type? pykov may well have a function for estimating this from data, but it's easy enough to calculate by hand.

First step: Load all the bars into a list.



In [4]:

    
with open('data.txt') as f:  # close the file once it has been read
    states = [line.strip() for line in f]
states[0:3]

    Out[4]:

['FG', 'FG', 'FG']

So now we need to calculate how often each kind of transition happens. Thankfully, the itertools documentation has a recipe for this: given an iterable, pairwise yields each consecutive pair of items.



In [5]:

    
from itertools import tee
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one element
    return zip(a, b)  # zip works on Python 2 and 3; izip is Python 2 only
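
To make the pairing concrete, here is a small standalone sketch of the recipe applied to a short list. The sample sequence is made up for illustration and is not taken from data.txt.

```python
from itertools import tee

def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    a, b = tee(iterable)
    next(b, None)      # advance the second copy by one item
    return zip(a, b)   # stops as soon as the shorter copy (b) runs out

sample = ['FG', 'FG', 'CPB', 'CPB']  # hypothetical sequence of bars
pairs = list(pairwise(sample))
print(pairs)  # [('FG', 'FG'), ('FG', 'CPB'), ('CPB', 'CPB')]
```

A list of n bars yields n - 1 pairs, so a list with fewer than two bars produces no transitions at all.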

Now, for each pair of consecutive bars, I keep a raw tally of how many times the first bar type was followed by the second.



In [6]:

    
counts = {}
for start, end in pairwise(states):
    counts[start] = counts.get(start, {})
    counts[start][end] = counts[start].get(end, 0) + 1
counts

    Out[6]:

{'CPB': {'CPB': 11, 'FG': 2}, 'FG': {'CPB': 3, 'FG': 8}}
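
The same kind of tally can also be built with the standard library's collections module. This is only an alternative sketch, not what the notebook ran; the sample list and the names tally and tally_as_dicts are made up for illustration.

```python
from collections import Counter, defaultdict

sample = ['FG', 'FG', 'CPB', 'CPB', 'CPB', 'FG']  # hypothetical data

# Map each starting bar to a Counter of the bars that followed it.
tally = defaultdict(Counter)
for start, end in zip(sample, sample[1:]):
    tally[start][end] += 1

# Convert to plain nested dicts, the same shape as `counts` above.
tally_as_dicts = {start: dict(ends) for start, ends in tally.items()}
print(tally_as_dicts)
```

The defaultdict removes the need for the two .get() calls, at the cost of an extra import.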

pykov doesn't want counts, though; it wants probabilities: "X% of the time state A transitions to state B, Y% of the time it transitions from state A to C," and so on.

So we need to normalize the counts into the probability of each transition, dividing each count by the total number of transitions out of its starting state. An entry like ('CPB', 'CPB'): 0.85 can be read as "85% of the time, if it's a chocolate peanut butter bar, it'll be a chocolate peanut butter bar the next time."



In [7]:

    
transitions = {}
for start in counts:
    total = sum(counts[start].values())
    for end in counts[start]:
        transitions[(start,end)] = float(counts[start][end]) / total
transitions

    Out[7]:

{('CPB', 'CPB'): 0.8461538461538461,
 ('CPB', 'FG'): 0.15384615384615385,
 ('FG', 'CPB'): 0.2727272727272727,
 ('FG', 'FG'): 0.7272727272727273}
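
A quick sanity check on the normalization: the probabilities leaving each state should sum to 1. This sketch redoes the calculation from the counts in Out[6], copied in as a literal so it runs on its own; row_sums is a name introduced here for the check.

```python
# Counts copied from Out[6] above.
counts = {'CPB': {'CPB': 11, 'FG': 2}, 'FG': {'CPB': 3, 'FG': 8}}

transitions = {}
for start, ends in counts.items():
    total = float(sum(ends.values()))  # all transitions out of `start`
    for end, n in ends.items():
        transitions[(start, end)] = n / total

# Add up the outgoing probabilities for each starting state.
row_sums = {}
for (start, end), p in transitions.items():
    row_sums[start] = row_sums.get(start, 0.0) + p

for start, s in row_sums.items():
    assert abs(s - 1.0) < 1e-9, "row for %s sums to %r" % (start, s)
```

Comparing against 1 with a small tolerance, rather than with ==, avoids false alarms from floating-point rounding.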

This is the structure pykov.Chain() wants, so we can just create a new chain from it:



In [8]:

    
t = pykov.Chain(transitions)
t

    Out[8]:

Chain([(('CPB', 'FG'), 0.15384615384615385), (('FG', 'FG'), 0.7272727272727273), (('FG', 'CPB'), 0.2727272727272727), (('CPB', 'CPB'), 0.8461538461538461)])

Now, given the chain and the current state, we can see an example of what the next 15 bars may look like (the output lists the starting bar followed by 15 steps). Because it's random, re-running the cell gives a different walk each time.



In [9]:

    
t.walk(15, 'CPB')

    Out[9]:

['CPB',
 'CPB',
 'CPB',
 'CPB',
 'CPB',
 'CPB',
 'CPB',
 'CPB',
 'CPB',
 'CPB',
 'FG',
 'FG',
 'FG',
 'FG',
 'FG',
 'CPB']
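
To get a feel for what a walk involves, here is a standard-library sketch of sampling one from the transition probabilities. It illustrates the idea only and is not pykov's implementation; sample_walk, its parameters, and the seed are hypothetical choices made for this example.

```python
import random

# Transition probabilities, written as the fractions behind Out[7].
transitions = {
    ('CPB', 'CPB'): 11 / 13.0, ('CPB', 'FG'): 2 / 13.0,
    ('FG', 'CPB'): 3 / 11.0,   ('FG', 'FG'): 8 / 11.0,
}

def sample_walk(transitions, steps, start, rng):
    """Return [start] followed by `steps` randomly sampled states."""
    walk = [start]
    for _ in range(steps):
        current = walk[-1]
        # Outgoing transitions from the current state, in a fixed order.
        options = sorted(
            (nxt, p) for (src, nxt), p in transitions.items() if src == current
        )
        r = rng.random()
        cumulative = 0.0
        chosen = options[-1][0]  # fallback guards against float rounding
        for nxt, p in options:
            cumulative += p
            if r < cumulative:
                chosen = nxt
                break
        walk.append(chosen)
    return walk

rng = random.Random(0)  # fixed seed so the sketch is repeatable
walk = sample_walk(transitions, 15, 'CPB', rng)
print(walk)
```

Each step depends only on the current bar, which is the defining property of a Markov chain.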

If you were really hoping for a rush of the fudge bars, you can model the relative likelihood of various scenarios, too:



In [22]:

    
import math

for scenario in [['CPB'] * 3, ['FG'] * 3, ['FG', 'CPB', 'FG'], ['FG']*7, ['CPB']*7]:
    # walk_probability is passed through math.exp to get a plain number
    print("{}: {}".format(scenario, math.exp(t.walk_probability(scenario))))

['CPB', 'CPB', 'CPB']: 0.715976331361
['FG', 'FG', 'FG']: 0.528925619835
['FG', 'CPB', 'FG']: 0.041958041958
['FG', 'FG', 'FG', 'FG', 'FG', 'FG', 'FG']: 0.14797345392
['CPB', 'CPB', 'CPB', 'CPB', 'CPB', 'CPB', 'CPB']: 0.367025295594

You can see that streaks of CPB of any length are more likely than streaks of FG, but neither is terribly improbable; flip-flops between types of bars, on the other hand, are really unlikely.

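A note on units: each number is conditional on the starting bar. It is the product of the transition probabilities along the walk, given that the walk begins in its first state, so the values don't add up to 1 across scenarios with different lengths or starting bars.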
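
The numbers above can be reproduced by hand by multiplying the transition probabilities along each walk, which suggests each value is the probability of the walk given its starting bar. Here is a sketch of that check, assuming walk_probability returns a natural log (as the math.exp call implies); log_walk_probability is a hypothetical helper, not part of pykov.

```python
import math

# Transition probabilities, written as the fractions behind Out[7].
transitions = {
    ('CPB', 'CPB'): 11 / 13.0, ('CPB', 'FG'): 2 / 13.0,
    ('FG', 'CPB'): 3 / 11.0,   ('FG', 'FG'): 8 / 11.0,
}

def log_walk_probability(transitions, walk):
    """Sum the logs of the transition probabilities along `walk`."""
    total = 0.0
    for step in zip(walk[:-1], walk[1:]):
        p = transitions.get(step, 0.0)
        if p == 0.0:
            return float('-inf')  # an impossible step makes the walk impossible
        total += math.log(p)
    return total

# Two transitions CPB -> CPB: (11/13)**2 = 121/169, about 0.716
p_cpb3 = math.exp(log_walk_probability(transitions, ['CPB'] * 3))
# FG -> CPB then CPB -> FG: (3/11) * (2/13) = 6/143, about 0.042
p_flip = math.exp(log_walk_probability(transitions, ['FG', 'CPB', 'FG']))
print(p_cpb3, p_flip)
```

Because the first bar is taken as given, a walk of n bars involves only n - 1 transitions, and scenarios are directly comparable only when they share a starting bar and a length.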