In [1]:
# These commands control inline plotting
%config InlineBackend.figure_format = 'retina'
%matplotlib inline
import numpy as np # Useful numeric package
import scipy as sp # Scientific computing package
import scipy.stats # Statistics subpackage (used below as sp.stats)
import matplotlib.pyplot as plt # Plotting package
In [58]:
import pandas as pd # Dataframe package
filename = './burrito_bootcamp.csv'
df = pd.read_csv(filename)
In [59]:
df
Out[59]:
In [60]:
print('Number of burritos:', df.shape[0])
print('Average burrito rating:', df['overall'].mean())
print('Reviewers: ')
print(np.array(df['Reviewer']))
In [61]:
def burritotypes(x, types={'California': 'cali', 'Carnitas': 'carnita', 'Carne asada': 'carne asada',
                           'Soyrizo': 'soyrizo', 'Shredded chicken': 'chicken'}):
    """Count how many burrito names in x match each burrito type (case-insensitive)."""
    import re
    Nmatches = {}
    for b in x:
        matched = False
        for t in types.keys():
            re4str = re.compile('.*' + types[t] + '.*', re.IGNORECASE)
            if re4str.match(b) is not None and not matched:
                Nmatches[t] = Nmatches.get(t, 0) + 1
                matched = True
        # Burritos that match no known type are counted as 'other'
        if not matched:
            Nmatches['other'] = Nmatches.get('other', 0) + 1
    return Nmatches
typecounts = burritotypes(df.Burrito)
In [62]:
plt.figure(figsize=(6,6))
ax = plt.axes([0.1, 0.1, 0.65, 0.65])
# The slices will be ordered and plotted counter-clockwise,
# starting from the positive x-axis (startangle=0).
labels = list(typecounts.keys())
fracs = list(typecounts.values())
explode = [.1] * len(typecounts)
# autopct converts each slice's percentage back into a raw count for display
patches, texts, autotexts = plt.pie(fracs, explode=explode, labels=labels,
                                    autopct=lambda p: '{:.0f}'.format(p * np.sum(fracs) / 100),
                                    shadow=False, startangle=0)
plt.title('Types of burritos', size=30)
for t in texts:
    t.set_size(20)
for t in autotexts:
    t.set_size(20)
autotexts[0].set_color('w')
In [66]:
dfCali = df[df.Burrito=='California']
dfCarnitas = df[df.Burrito=='Carnitas']
print(sp.stats.ttest_ind(dfCali.overall, dfCarnitas.overall))
In [67]:
dfdc1 = df[df.Location=='Don Carlos 1']
dfdc2 = df[df.Location=='Don Carlos 2']
print(sp.stats.ttest_ind(dfdc1.overall, dfdc2.overall))
In [77]:
# Compare every dimension for California vs. Carnitas
dims_Cali_v_Carni = df.keys()[5:15]
Ndim = len(dims_Cali_v_Carni)
print('Measure, p-value')
for d in dims_Cali_v_Carni:
    print(d, sp.stats.ttest_ind(dfCali[d].dropna(), dfCarnitas[d].dropna())[1])
print('Bonferroni-corrected significance threshold:', .05 / float(Ndim))
In [44]:
import math
def metrichist(metricname):
    # Choose histogram bins and axis limits appropriate for each metric
    if metricname == 'Volume':
        bins = np.arange(.375, 1.225, .05)
        xticks = np.arange(.4, 1.2, .1)
        xlim = (.4, 1.2)
    elif metricname == 'Length':
        bins = np.arange(10, 30, 1)
        xticks = np.arange(10, 30, 5)
        xlim = (10, 30)
    elif metricname == 'Circum':
        bins = np.arange(10, 30, 1)
        xticks = np.arange(10, 30, 5)
        xlim = (10, 30)
    else:
        bins = np.arange(-.25, 5.5, .5)
        xticks = np.arange(0, 5.5, .5)
        xlim = (-.25, 5.25)
    plt.figure(figsize=(5,5))
    n, _, _ = plt.hist(df[metricname].dropna(), bins, color='k')
    plt.xlabel(metricname + ' rating', size=20)
    plt.xticks(xticks, size=15)
    plt.xlim(xlim)
    plt.ylabel('Count', size=20)
    # Round the top y-tick up to the nearest multiple of 5
    plt.yticks((0, int(math.ceil(np.max(n) / 5.)) * 5), size=15)
    plt.tight_layout()
In [45]:
for m in df.keys()[2:-2]:
    metrichist(m)
Note that even though the distributions do not look very normal, most of them do not fall below the p = .05 significance level, so normality cannot formally be rejected. Still, a normality assumption may not be reasonable for analyzing this data, so we should prefer nonparametric tests and measures (like the Mann-Whitney U test or Spearman correlation) over parametric ones that assume a normal distribution (the t-test) or a linear relationship (the Pearson correlation); a sketch of these alternatives follows the normality tests below.
In [49]:
for m in df.keys()[2:-2]:
    print(m, ' Normal test p-value: ', sp.stats.mstats.normaltest(df[m].dropna())[1])
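As a minimal sketch (not part of the original analysis), the nonparametric alternatives mentioned above can be run directly with scipy, reusing the dfCali and dfCarnitas subsets defined earlier; the exact keyword arguments accepted depend on the scipy version.
In [ ]:
# Sketch of nonparametric alternatives (assumed, not from the original notebook):
# Mann-Whitney U in place of the t-test, Spearman in place of Pearson correlation.
u_stat, u_p = sp.stats.mannwhitneyu(dfCali.overall.dropna(), dfCarnitas.overall.dropna(),
                                    alternative='two-sided')
print('Mann-Whitney U, California vs. Carnitas overall rating: U =', u_stat, ', p =', u_p)
rho, rho_p = sp.stats.spearmanr(df.Hunger, df.overall, nan_policy='omit')
print('Spearman correlation, Hunger vs. overall rating: rho =', rho, ', p =', rho_p)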
In [31]:
plt.figure(figsize=(4,4))
ax = plt.gca()
df.plot(kind='scatter',x='Hunger',y='overall',ax=ax,**{'s':40,'color':'k'})
plt.xlabel('Hunger',size=20)
plt.ylabel('Overall rating',size=20)
plt.xticks(np.arange(0,6),size=15)
plt.yticks(np.arange(0,6),size=15)
r, p = sp.stats.pearsonr(df.Hunger, df.overall)
print('Pearson correlation coefficient: ', r)
print('p-value: ', p)
In [38]:
dfcorr = df.corr()
In [37]:
from matplotlib import cm
M = len(dfcorr)
clim1 = (-1,1)
plt.figure(figsize=(12,10))
cax = plt.pcolor(range(M+1), range(M+1), dfcorr, cmap=cm.bwr)
cbar = plt.colorbar(cax, ticks=(-1,-.5,0,.5,1))
cbar.ax.set_ylabel('Pearson correlation (r)', size=30)
plt.clim(clim1)
cbar.ax.set_yticklabels((-1,-.5,0,.5,1),size=20)
ax = plt.gca()
ax.set_yticks(np.arange(M)+.5)
ax.set_yticklabels(dfcorr.keys(),size=25)
ax.set_xticks(np.arange(M)+.5)
ax.set_xticklabels(dfcorr.keys(),size=25)
plt.xticks(rotation='vertical')
plt.tight_layout()
plt.xlim((0,M))
plt.ylim((0,M))
Out[37]:
Statistical tests often carry assumptions about how the data were collected and about the distribution of values in the data. One very important assumption inherent in most statistical tests is independence of the samples. Naively, we could claim that each sample is independent because each person ate a physically distinct burrito.
For example, consider our test of whether the California burritos at Don Carlos were better than the Carnitas burritos. For the samples to be truly independent, each burrito should have been consumed at a randomly chosen time; this matters if we want to make a claim about how the burritos at Don Carlos fare in general. That assumption is violated here because all of the burritos were purchased at the same time. Perhaps, for instance, the pork that Don Carlos bought that week was unusually stale. This could have spuriously lowered the ratings of the Carnitas burritos, and our conclusion would likely be wrong because that batch of pork was not representative of an average Don Carlos burrito.