Coffee Consumption - Exercises

Quentin CAUDRON
Ecology and Evolutionary Biology
qcaudron@princeton.edu
@QuentinCAUDRON

This data describes the consumption of coffee at the espresso machine in the Centre for Complexity Science, at the University of Warwick. The data collection exercise was facilitated by an Android app which automatically timestamped incoming data, and added the submitting user's name. In theory, whenever contributors ( thanks, guys ! ) had a coffee, they would query the machine as to how many coffees it had made to date, and submit the count via the Android app. In practice, the small number of contributers means that the physical process is poorly and irregularly sampled, full of holes, and not of the best quality. Concisely put, this is real data, my worst nightmare. Hey, at least it's strictly monotonic...

The purpose of this notebook is to perform some basic time series analysis on the data using primarily the functionality of Numpy, Scipy, and Matplotlib. A few other cool things might be thrown in for fun.

Exercises

Go to the documentation for any function you're using. The docs are great, and contain examples as well as the different function overloads.

Below is a list of suggested exercises to do, with a little guidance in terms of how to do them. Below that are my solutions to these problems. So, a list of things to try :

Reading the Data

FASTEST : Using Pickle

For speed, I've got a Pickle file ready to be imported. This is for today's session, and, of course, won't be available when you want to actually import your own data. Still, for speed today, if you just want access to the data, call :

with open("pickled_coffee.p") as f :

times, coffees, names, day_of_week, time_seconds, hour_of_day = pickle.load(f)

Note the indent ! The variables are now each column of the .csv. In the code that follows, you'll need to replace the Pandas object with the relevant variable. Here, time_seconds is the timestamp of the datapoint, in seconds from 1 Jan, 1970, 00:00 GMT ( the Epoch ).

BEST : Using Pandas

Use read_csv(filename), where filename is a string. This will generate a Pandas DataFrame object.
Check the .columns variable for a list of column names, or simply call .head() for a preview.
You can access DataFrame columns by using ["Column_Name"] after the object.
With a column, .values returns a Numpy array.

FOR MASOCHISTS : Using native Python, one method

Open the file using myfile = open(filename)
Iterate over the file using for line in myfile, which will give you a string.
Strings have a handy .split(",") method, which splits the line at a comma.
You'll need to ignore the first line, which contains headers.

ALSO MASOCHISTIC : Using native Python, another method :

Use Numpy's genfromtxt method to read the whole file into a variable all at once.
There's a handy skip_header argument to lose the row of headers.
You should also specify the delimiter.
Finally, enforce dtype = None to get Python to guess the type of variable per column.
The result is a list of tuples, not a list of lists.

Plotting the Data

First thing's first : plot the data.
After import matplotlib.pyplot as plt, you can call plt.plot(x, y) to plot a basic graph.
You can use the column entitled "Time_Seconds" if you want easy access to the time variable.
Otherwise, you'll need to play with datetime objects ( oh joy ), see next section.

Datetime Objects - skip if you're just using Time_Seconds

Datetime objects are specially-formatted to contain dates and times. They're typically the integer number of seconds from 1 Jan, 1970.
The datetime module in Python contains a datetime class, so if you're using import datetime as dt, then you can call dt.datetime
This class has the method strptime, which produces a datetime object from a string of text. You provide the format. For converting something like "17/03/2012 18:23", you have the format string "%d/%m/%Y %H:%M". If you give strptime the datetime string and the format string, it can understand that string and produce a datetime object.
From that, you can call that datetime object's strftime method, which asks for a formatting string like the one above. If you just give it "%H", for example, it will just give you the hour of that datetime. Using these, you can extract a list of days of the week and of hours of the day from each time entry. See http://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior for a table of formatting strings.

Histograms

How is the data distributed ? Plot some histograms. How many coffees are consumed in each day of the week, and in each hour of the day ?
plt.hist() takes a list or array, and produces a histogram.
If you also give it another argument ( plt.hist(mylist, mybins) ), then you've provided the boundaries between histogram bins, assuming mybins is a list or array. If it's a scalar, then you're telling it how many bins to use, and it calculates the boundaries itself.

Critical Error - Days without Coffee ?!

Use the function diff() in Numpy to look at the difference in time between measurements. Plot these differences.
If you spot a large number of days without a measurement, then you've found a time when the coffee machine was broken. This causes critical failure in Warwick Complexity. How many such failures were there ? Look into np.diff(t) to find out.
Generate a histogram to check out the distribution of days between measurements.

Always Acknowledge your Coworkers

Which Complexity heros contributed data ? Let's make a pie chart.
plt.pie() takes a list or array, and produces a pie chart.
The second argument you feed it contains the labels of each slice of pie - so plt.pie(mydata, mylabels) will put labels next to each slice. Generate a pie chart, showing who contributed to this time series.

Computing an Average Sampling Rate

If you plot time on the y axis ( just against indices ), and on the same plot, have a straight line of gradient 1, you'll note that most of our time measurements ( the times at which we took measurements ) exist under this line. That means that the average derivative of the time curve is less than one. This is good news - it means that, on average, less than one day goes by between measurements. The reciprocal is the sampling rate - the number of measurements taken per unit time.
Calculate this sampling rate by using scipy.stats.linregress to perform a linear regression. Catch the gradient and intercept, and plot it on top of the time plot, so we can visually check the fit. Also catch the $R^2$, standard error, and the p-value of the regression. You can do this for the full plot, or for any region that looks particularly nice ( remember, this is real data... ).
Command : gradient, intercept, r2, pval, stderr = stats.linregress(indices, time)

Regularisation of the Sampling

Irregular sampling is not fun. The series is fairly well-behaved, so we can interpolate over it fairly comfortably.
Use scipy.interpolate.interp1d(x, y) to create an interpolant. You can then pass an array of times that you want the interpolated values at. Create a variable regular_coffees that is the interpolation over the regularly-spaced time vector regular_time, and plot it on top of the normal t vs coffees to ensure that the interpolation was correct.

Daily Average Over the Week

Using your shiny new regularly-spaced coffee consumption time series, compute the average number of coffees devoured on each day of the week. Plot it with plt.errorbar and approximate 95% confidence intervals from the standard error of the mean using scipy.stats.sem.



In [4]:

    
# Basic numbercrunching
import numpy as np

# Interpolation, FFTs
from scipy import interpolate, stats

# Plotting
import matplotlib.pyplot as plt
import seaborn
figsize(10, 8)

# Datetime manipulation
# Be warned : functions in classes in modules are BADLY named...
import datetime as dt

# Data stuff I don't know how to use to its full potential, but read_csv() is nice
import pandas as pd

# Pickle, for saving variables
import pickle



In [5]:

    
# Let's read the data using Pandas, and look at what it contains.
data = pd.read_csv("CoffeeTimeSeries.csv")
data.head(20)









    Out[5]:






  
    
      
      Timestamp
      Coffees
      Name
      Day_of_week
      Time_Seconds
      Hour_of_day
    
  
  
    
      0 
       03/10/2011 08:22
       397
            Q
       1
       1317644520
        8
    
    
      1 
       04/10/2011 11:48
       410
            Q
       2
       1317743280
       11
    
    
      2 
       05/10/2011 07:02
       439
          Ant
       3
       1317812520
        7
    
    
      3 
       05/10/2011 08:25
       444
            Q
       3
       1317817500
        8
    
    
      4 
       05/10/2011 10:47
       464
            Q
       3
       1317826020
       10
    
    
      5 
       05/10/2011 13:15
       481
            Q
       3
       1317834900
       13
    
    
      6 
       06/10/2011 07:21
       503
          Ant
       4
       1317900060
        7
    
    
      7 
       06/10/2011 10:04
       513
            Q
       4
       1317909840
       10
    
    
      8 
       06/10/2011 12:14
       539
         Mike
       4
       1317917640
       12
    
    
      9 
       06/10/2011 12:49
       540
            Q
       4
       1317919740
       12
    
    
      10
       06/10/2011 14:52
       563
          Ben
       4
       1317927120
       14
    
    
      11
       07/10/2011 07:34
       581
          Ant
       5
       1317987240
        7
    
    
      12
       07/10/2011 08:37
       587
            Q
       5
       1317991020
        8
    
    
      13
       07/10/2011 11:09
       605
            Q
       5
       1318000140
       11
    
    
      14
       07/10/2011 13:14
       616
         Mike
       5
       1318007640
       13
    
    
      15
       07/10/2011 14:10
       620
          Ben
       5
       1318011000
       14
    
    
      16
       07/10/2011 15:20
       626
       Mike M
       5
       1318015200
       15
    
    
      17
       07/10/2011 16:50
       635
       Mike M
       5
       1318020600
       16
    
    
      18
       09/10/2011 16:53
       650
         Colm
       0
       1318193580
       16
    
    
      19
       10/10/2011 07:29
       656
          Ant
       1
       1318246140
        7
    
  

20 rows × 6 columns



In [6]:

    
# Let's just plot the time-series first. 
# The easiest way to plot against the date variable is to go via
# a datetime object, which can later be formatted as we want it

# strptime creates a datetime object from a string :
times = [dt.datetime.strptime(i, "%d/%m/%Y %H:%M") for i in data["Timestamp"].values]

# We now have a list of datetime objects, which we can format as we like.
# Let's create a few items : 
# a list of time values for plotting ( t )
# a list of strings of the day of the week ( weekday )
# a list of hours during the day ( dayhours )
# http://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior

t = np.array([int(i.strftime("%s")) for i in times]) / 86400. # in days
weekday = [int(i.strftime("%w")) for i in times] # weekday as number
weekdaylabels = np.unique([i.strftime("%A") for i in times]) # string of weekday
dayhours = [int(i.strftime("%H")) for i in times]

print t









    



[ 15250.51527778  15251.65833333  15252.45972222  15252.51736111
  15252.61597222  15252.71875     15253.47291667  15253.58611111
  15253.67638889  15253.70069444  15253.78611111  15254.48194444
  15254.52569444  15254.63125     15254.71805556  15254.75694444
  15254.80555556  15254.86805556  15256.87013889  15257.47847222
  15257.59236111  15257.73680556  15257.75138889  15257.80763889
  15258.75625     15259.50763889  15259.58125     15259.5875
  15259.66736111  15259.6875      15259.74722222  15260.54861111
  15260.61944444  15260.62777778  15260.77847222  15260.79583333
  15260.82291667  15260.83194444  15261.54305556  15261.58472222
  15261.75486111  15264.55833333  15264.63819444  15264.75347222
  15265.55555556  15265.56875     15265.64722222  15265.70416667
  15266.55208333  15266.56319444  15266.75555556  15266.79930556
  15267.70902778  15267.83263889  15268.61041667  15268.71875
  15268.79791667  15271.57708333  15271.59375     15271.59791667
  15271.77222222  15272.49583333  15272.56736111  15272.77083333
  15272.91111111  15273.55208333  15273.57083333  15273.66597222
  15273.73472222  15274.50208333  15274.63472222  15274.74375
  15275.62013889  15275.73125     15278.62361111  15278.625       15278.74861111
  15278.77430556  15279.51458333  15279.56458333  15279.59236111
  15279.79652778  15279.80486111  15280.62222222  15281.57777778
  15282.61180556  15282.72361111  15286.77222222  15286.83888889
  15287.71458333  15287.78958333  15288.64305556  15288.85833333
  15288.86944444  15289.64375     15289.67569444  15289.72152778
  15289.77013889  15292.66180556  15293.01458333  15293.64722222
  15293.83611111  15293.875       15294.6         15294.62361111
  15294.74791667  15295.68333333  15295.79097222  15296.67708333
  15296.87847222  15298.80555556  15299.59791667  15299.62916667
  15299.69027778  15299.69444444  15299.79652778  15301.60625
  15301.61458333  15301.72986111  15302.68541667  15302.77291667
  15302.80069444  15302.83333333  15306.61180556  15308.76388889  15309.675
  15311.91805556  15313.82847222  15314.65        15314.68680556
  15314.78958333  15315.65069444  15315.68958333  15315.83472222
  15316.78888889  15317.64722222  15317.86597222  15317.93680556
  15320.60416667  15320.77916667  15321.63055556  15321.77222222  15321.775
  15321.83958333  15327.71458333  15350.82083333  15350.92569444
  15352.64652778  15352.83263889  15355.62361111  15355.70694444
  15355.78541667  15355.85277778  15357.67638889  15357.81111111
  15357.95972222  15359.84097222  15360.78333333  15362.70208333
  15362.76041667  15363.64513889  15364.61388889  15364.62708333
  15365.6375      15365.75138889  15366.64791667  15366.66736111
  15366.80763889  15367.77430556  15369.58472222  15369.62847222
  15369.77083333  15371.59791667  15371.72986111  15371.77083333
  15371.88958333  15372.63194444  15372.775       15373.725       15373.85972222
  15374.7         15374.82986111  15375.77777778  15375.85625     15376.53125
  15376.66319444  15376.69027778  15376.7375      15376.95138889
  15378.67847222  15380.53055556  15380.74444444  15380.89722222
  15381.73263889  15382.78125     15383.89652778  15387.77638889
  15389.85347222  15390.71736111  15390.925       15391.57569444
  15392.62083333  15392.76597222  15392.87847222  15392.88055556
  15393.87291667  15394.78055556  15395.90208333  15397.57083333
  15397.85763889  15398.67569444  15399.60208333  15399.74791667
  15400.62986111  15400.80833333  15401.84236111  15403.83819444
  15405.67569444  15406.74305556  15407.74930556  15408.73958333
  15411.59513889  15411.67152778  15411.89444444  15434.575       15434.63333333
  15434.73125     15434.875       15434.87638889  15435.61041667
  15435.66597222  15440.57708333  15440.61458333  15440.74652778
  15441.65069444  15456.86458333  15456.91597222  15457.62708333
  15457.71944444  15460.54027778  15460.71875     15460.7875
  15461.75763889  15462.66666667  15462.72569444  15463.67638889
  15463.71736111  15464.63055556  15467.59513889  15467.70555556
  15467.72986111  15467.93055556  15468.64861111  15468.65763889
  15469.70416667  15469.75694444  15469.87152778  15470.675       15470.82222222
  15470.91597222  15471.81944444  15472.57638889  15474.58472222
  15474.68263889  15474.81041667  15475.54652778  15475.55486111
  15475.7125      15475.7375      15476.55347222  15476.60069444  15476.725
  15476.83680556  15476.87291667  15477.55694444  15477.70972222
  15477.76736111  15477.95        15478.55486111  15478.70625
  15478.71805556  15480.75        15481.55208333  15481.77777778
  15482.58888889  15482.72430556  15484.55763889  15484.72430556
  15484.82708333  15485.55486111  15485.63194444  15485.66319444
  15485.69027778  15486.81944444  15488.55277778  15488.70625     15488.975
  15489.62986111  15489.72847222  15491.54791667  15491.69513889
  15491.89166667  15492.55555556  15492.72708333  15492.95138889
  15495.52916667  15495.68333333  15495.72569444  15495.79791667
  15496.81180556  15496.87222222  15497.55416667  15497.59722222
  15497.73125     15497.82083333  15498.53055556  15498.5625
  15498.73333333  15498.89722222  15498.96597222  15499.57847222
  15499.58611111  15499.66319444  15501.84027778  15502.55069444
  15502.69513889  15502.72569444  15502.79097222  15502.83402778
  15502.96388889  15503.51041667  15506.69444444  15506.69652778
  15508.73819444  15508.90555556  15509.55        15509.70972222
  15509.91319444  15510.64444444  15511.70069444  15517.6375
  15517.79513889  15517.98263889  15518.67083333  15519.65694444
  15519.90416667  15520.55347222  15520.86597222  15523.675       15523.88125
  15524.67847222  15524.83958333  15525.58194444  15525.70069444
  15525.7375      15526.55277778  15526.67222222  15526.84166667  15527.725
  15527.91875     15528.88333333  15529.66875     15530.55555556
  15530.67708333  15530.76388889  15531.55        15531.73333333
  15532.54930556  15533.55625     15534.55486111  15534.74930556
  15537.70347222  15537.87291667  15538.55277778  15541.55555556
  15546.55277778  15546.61041667  15546.62083333  15546.77222222
  15547.58194444  15547.65208333  15547.73472222  15548.55277778
  15548.63611111  15550.00555556  15551.63541667  15551.73263889
  15551.84027778  15552.72638889  15553.71736111  15553.76597222
  15554.55277778  15554.72083333  15554.82361111  15555.77430556
  15557.85763889  15558.63055556  15558.63888889  15558.72083333
  15558.73402778  15559.60972222  15559.70972222  15560.56111111
  15560.68472222  15560.72152778  15560.73194444  15560.82569444
  15561.61388889  15561.65277778  15561.71527778  15561.72569444
  15561.79861111  15565.56666667  15565.74027778  15566.55763889
  15566.61527778  15566.76597222  15566.96180556  15567.56041667
  15567.76111111  15568.55277778  15568.82083333  15569.55972222
  15569.71875     15572.56111111  15572.75694444  15573.5875
  15573.70902778  15573.83680556  15574.58125     15574.74444444
  15575.55347222  15575.70833333  15575.71944444  15576.81388889
  15577.79930556  15579.57986111  15579.74513889  15580.61666667
  15581.73194444  15582.56458333  15582.73472222  15582.9         15583.55833333
  15583.63958333  15583.74513889  15583.80555556  15584.75902778
  15584.90486111  15586.67986111  15586.84375     15587.65486111
  15587.78611111  15588.56111111  15588.75416667  15588.78333333
  15589.71875     15590.55555556  15590.57569444  15590.73611111
  15590.87013889  15593.5625      15593.72916667  15593.88958333
  15594.71597222  15595.87847222  15596.62986111  15596.71875
  15596.78541667  15597.48472222  15597.64930556  15598.72777778
  15598.95347222  15600.56736111  15600.71458333  15601.59861111
  15602.78333333  15603.57291667  15603.71111111  15604.59375
  15604.66041667  15604.70555556  15607.71736111  15607.73541667
  15607.80625     15608.63194444  15608.81805556  15609.56111111
  15609.69166667  15609.83472222  15610.58333333  15610.65972222
  15610.73055556  15611.55694444  15611.59722222  15611.72013889
  15612.54513889  15612.80277778  15614.55763889  15614.58194444
  15614.67430556  15614.71388889  15616.87916667  15617.51388889
  15617.61111111  15617.69652778  15617.77291667  15618.51527778
  15618.69930556  15618.70972222  15618.79861111  15618.90277778
  15621.72847222  15621.74166667  15621.79375     15622.57222222
  15622.63611111  15622.68402778  15623.58541667  15623.62708333
  15623.66180556  15623.71527778  15623.73958333  15623.82638889
  15624.59583333  15624.70694444  15625.57430556  15625.61805556
  15625.73194444  15626.69236111  15626.81388889  15627.86388889
  15628.53055556  15628.60763889  15628.725       15628.78958333
  15629.69236111  15630.57638889  15630.60069444  15630.70833333
  15630.70902778  15630.74305556  15631.57222222  15631.58611111
  15631.65277778  15631.73680556  15632.59722222  15632.84583333
  15633.76388889  15633.85763889  15633.85833333  15635.61527778
  15635.67222222  15635.72777778  15635.86041667  15636.69652778
  15638.57986111  15639.83541667  15643.56666667  15644.70208333
  15644.75763889  15644.91805556  15645.57291667  15645.63819444
  15645.80555556  15646.65138889  15646.74027778  15649.67847222
  15649.76736111  15650.60416667  15651.62013889  15651.66597222
  15651.73958333  15652.69305556  15652.78472222  15653.62986111
  15653.64305556  15653.71597222  15653.74166667  15653.75972222
  15656.60694444  15656.60833333  15656.76458333  15658.74305556
  15659.81805556  15660.64027778  15660.78125     15663.68541667
  15663.7625      15663.89375     15664.65902778  15664.71875
  15664.78055556  15665.62361111  15665.89375     15667.69305556
  15667.73958333  15667.76527778  15667.78611111  15668.69722222
  15668.74236111  15670.625       15670.65972222  15670.77361111
  15672.62361111  15672.74861111  15674.75763889  15674.87986111
  15677.68125     15677.91666667  15678.65763889  15678.75208333
  15678.79722222  15679.67569444  15680.81666667  15681.68194444
  15682.72847222  15682.80555556  15682.92708333  15684.71666667
  15685.64513889  15685.80763889  15686.61319444  15686.75486111
  15686.84513889  15688.64166667  15691.81666667  15692.66944444
  15692.78472222  15695.84930556  15713.79027778  15714.85069444
  15715.67013889  15716.66458333  15716.775       15716.92083333
  15720.55902778  15721.73472222  15721.96458333  15722.78125
  15726.79652778  15733.65486111  15733.75902778  15733.79236111
  15734.77986111  15734.8375      15740.76736111  15742.93958333
  15743.77083333  15748.56666667  15748.69375     15749.79027778
  15752.70486111  15754.71111111  15754.78194444  15757.78055556
  15757.83472222  15761.77291667  15761.93402778  15763.60625
  15768.65694444  15768.675       15768.90486111  15769.65416667
  15769.77013889  15772.64444444  15776.51944444  15800.63888889
  15822.62777778  15960.81805556  15961.60277778]



In [7]:

    
# OK, we've extracted some useful time data. Let's grab the number of coffees and plot stuff.
coffees = data["Coffees"].values

plt.subplot2grid((2, 2), (0, 0), colspan=2)
plt.plot(t, coffees, linewidth=3)
plt.title("Coffee Consumption Time-Series")
plt.xlabel("Time (datetime seconds)")
plt.ylabel("Number of coffees")

plt.subplot(2, 2, 3)
plt.hist(weekday, 7)
plt.title("Consumption Per Day")
plt.xticks(range(7), ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]);
plt.ylabel("Frequency")

plt.subplot(2, 2, 4)
plt.hist(dayhours, np.unique(dayhours))
plt.title("Hours of Consumption During the Day")
plt.xlabel("Hour of Day")
plt.ylabel("Frequency")

plt.tight_layout()



In [8]:

    
# So, fairly uniform in terms of quantity consumed per day ( Monday blues ? ).
# Hourly is more interesting : it drops during the day, with an after-lunch spike.
# We can also infer a little about what time people arrive at work !

# Let's go back to the time-series itself. Was the machine broken at any point ?
# Consider over ten days a critical emergency. 

plt.subplot(121)
plt.plot(np.diff(t), linewidth=3)
plt.axhline(10, color = seaborn.color_palette("deep", 3)[2], linewidth=3)
plt.title("Days between coffees")
plt.xlabel("Time (days)")
plt.ylabel("Days between coffees")

plt.subplot(122)
plt.hist(np.diff(t), np.arange(np.diff(t).max()))
plt.axvline(10, color = seaborn.color_palette("deep", 3)[2], linewidth=3)
plt.title("Distribution of Number of Days Between Coffees")
plt.xlabel("Days between coffees")
plt.ylabel("Frequency")

plt.tight_layout()



In [9]:

    
# Yes, it's been broken ! We note five occasions :
# five continguous measurement of over ten days between coffees.
# The last spike requires a little context. It's got a spike with three, not one, value
# above the 10-mark. This was because data had more or less stopped being collected
# ( you can see a few days between measurements after the last coffee machine breakdown
# and prior to this big spike ).

# So, if, until that point, measurements were *fairly* regular, we can ask who took them.
names = data["Name"].values
contributors = np.unique(names)
contributions = [np.sum(np.array(names) == i) for i in contributors]

plt.figure(figsize=(12,12))
plt.pie(contributions, labels=contributors, colors=seaborn.color_palette("deep", len(contributors)));



In [10]:

    
# So mostly me, with great contributions from Mike Irvine and Sergio Morales - thanks guys !

# Let's take a step back and consider the sampling of the time series. We saw earlier how
# the time between measurements varied a great deal. Let's look at raw sampling times first.
plt.plot(t, linewidth=3)
plt.plot(np.arange(t[0], t[-1]), linewidth=3)
plt.title("Sampling Times")
plt.xlabel("Index")
plt.ylabel("Sampling time")
plt.legend(["Measurements taken", "Linear time"], loc=2)









    Out[10]:





<matplotlib.legend.Legend at 0x115306fd0>



In [11]:

    
# In general, the blue line is under the green line, so we sampled more than once a day. 
# Let's calculate the average sampling frequency by finding the gradient of the line of best fit.
# We'll use the long, middle section, between 250 and 650
gradient, intercept, r2, pval, stderr = stats.linregress(range(250, 600), t[250:600])

plt.plot(t, linewidth=3)
plt.plot(np.arange(700) * gradient + intercept, linewidth=3)
plt.title("Approximate Rate of Sampling : %f per day; $R^2$ = %f, Standard Error = %f" % (1/gradient, r2, stderr))
plt.xlabel("Index")
plt.ylabel("Time at measurement (days)")









    Out[11]:





<matplotlib.text.Text at 0x114784b10>



In [12]:

    
# The sampling is not very regular. Let's interpolate to fix that.
# We'll use a standard linear interpolator.

# First, create a list of floats, starting at the first sample, and 
# ending at or just before the last sample, incrementing in one day intervals
regular_time = np.arange(t[0], t[-1])

# Now let's interpolate the series
coffee_interpolator = interpolate.interp1d(t, coffees)
regular_coffees = coffee_interpolator(regular_time)

# Plot both time series against their respective times to confirm
# and only show every tenth interpolated point
plt.plot(t, coffees)
plt.plot(regular_time[::10], regular_coffees[::10], 'o')
plt.title("Interpolation of the Time-Series")
plt.xlabel("Time (days)")
plt.ylabel("Number of coffees devoured")









    Out[12]:





<matplotlib.text.Text at 0x1146c1990>



In [13]:

    
# Now that we have a uniformly-sampled time series, we can start to look at regularities during the week.
# Let's compute the average number of coffees for each day of the week. First, let's look at the series itself.
plt.plot(np.diff(regular_coffees))









    Out[13]:





[<matplotlib.lines.Line2D at 0x115303b50>]



In [14]:

    
# So we see a few issues. Firstly, the constants are due to interpolating over missing data. 
# We could be clever about it, by reinterpolating after inserting a zero-change point just after long breaks,
# but in the interest of time, we'll just skip them for now. Let's start at 210, and finish at 420.
# The big dip is due to people being away. The time series started in October, which puts the dip
# squarely in the summer.

# So, let's compute that average.

# Declare empty arrays
weekly_average = np.zeros(7)
weekly_stderr = np.zeros(7)

# Iterating over each day of the week
for i in range(7) :
    weekly_average[i] = np.mean(np.diff(regular_coffees[210:420])[i::7])
    weekly_stderr[i] = stats.sem(np.diff(regular_coffees[210:420])[i::7])
    
# Plot with approximate 95% CI
plt.errorbar(range(7), weekly_average, yerr=weekly_stderr * 1.96, linewidth=3)









    Out[14]:





<Container object of 3 artists>



In [15]:

    
# If you got this far, well done.
# You probably work as hard as Warwick Complexity, whose caffeine consumption doesn't drop to zero over the weekends...

	Timestamp	Coffees	Name	Day_of_week	Time_Seconds	Hour_of_day
0	03/10/2011 08:22	397	Q	1	1317644520	8
1	04/10/2011 11:48	410	Q	2	1317743280	11
2	05/10/2011 07:02	439	Ant	3	1317812520	7
3	05/10/2011 08:25	444	Q	3	1317817500	8
4	05/10/2011 10:47	464	Q	3	1317826020	10
5	05/10/2011 13:15	481	Q	3	1317834900	13
6	06/10/2011 07:21	503	Ant	4	1317900060	7
7	06/10/2011 10:04	513	Q	4	1317909840	10
8	06/10/2011 12:14	539	Mike	4	1317917640	12
9	06/10/2011 12:49	540	Q	4	1317919740	12
10	06/10/2011 14:52	563	Ben	4	1317927120	14
11	07/10/2011 07:34	581	Ant	5	1317987240	7
12	07/10/2011 08:37	587	Q	5	1317991020	8
13	07/10/2011 11:09	605	Q	5	1318000140	11
14	07/10/2011 13:14	616	Mike	5	1318007640	13
15	07/10/2011 14:10	620	Ben	5	1318011000	14
16	07/10/2011 15:20	626	Mike M	5	1318015200	15
17	07/10/2011 16:50	635	Mike M	5	1318020600	16
18	09/10/2011 16:53	650	Colm	0	1318193580	16
19	10/10/2011 07:29	656	Ant	1	1318246140	7

Coffee Consumption - Exercises

Quentin CAUDRON Ecology and Evolutionary Biology qcaudron@princeton.edu @QuentinCAUDRON

Exercises

Quentin CAUDRON
Ecology and Evolutionary Biology
qcaudron@princeton.edu
@QuentinCAUDRON