San Diego Burrito Analytics: Bootcamp 2016

Scott Cole

15 Sept 2016

This notebook characterizes data collected while eating burritos from Don Carlos Taco Shop during Neuro bootcamp.

Outline

  1. Load data into python
    • Use a Pandas dataframe
    • View data
    • Print some metadata
  2. Hypothesis tests
    • California burritos vs. Carnitas burritos
    • Don Carlos 1 vs. Don Carlos 2
    • Bonferroni correction
  3. Distributions
    • Distributions of each burrito quality
    • Tests for normal distribution
  4. Correlations
    • Hunger vs. Overall rating
    • Correlation matrix
  5. Assumptions discussion

0. Import libraries into Python


In [1]:
# These commands control inline plotting
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

import numpy as np # Useful numeric package
import scipy as sp # Useful statistics package
import matplotlib.pyplot as plt # Plotting package

1. Load data into a Pandas dataframe


In [25]:
import pandas as pd # Dataframe package
filename = './burrito_bootcamp.csv'
df = pd.read_csv(filename)

View raw data


In [21]:
df


Out[21]:
    Location              Burrito           Hunger  Length  Circum  Volume  Tortilla  Temp  Meat  Fillings  Meat:filling  Uniformity  Salsa  Synergy  Wrap  overall  Rec  Reviewer
 0  Don Carlos Taco Shop  Shredded chicken     3.0    23.5   21.50    0.86       3.0   5.0  3.00       3.5           4.0         4.0    4.0      4.0   4.0     3.80  Yes  Scott
 1  Don Carlos Taco Shop  Carne asada          3.5    22.5   22.00    0.87       2.0   3.5  2.50       2.5           2.0         4.0    3.5      2.5   5.0     3.00  Yes  Scott
 2  Don Carlos Taco Shop  Soyrizo              1.5    22.5   22.00    0.87       3.0   2.0  2.50       3.0           4.5         4.0    3.0      3.0   5.0     3.00  Yes  Emily
 3  Don Carlos Taco Shop  Soyrizo              2.0    23.0   22.50    0.93       3.0   2.0  3.50       3.0           4.0         5.0    4.0      4.0   5.0     3.75  Yes  Ricardo
 4  Don Carlos Taco Shop  Soyrizo              4.0     NaN     NaN     NaN       4.0   5.0  4.00       3.5           4.5         5.0    2.5      4.5   4.0     4.20  Yes  Scott
 5  Don Carlos Taco Shop  Soyrizo              4.0    21.5   20.00    0.68       3.0   4.0  5.00       3.5           2.5         2.5    2.5      4.0   1.0     3.20  Yes  Emily
 6  Don Carlos Taco Shop  Soyrizo              1.5    23.0   23.00    0.97       2.0   3.0  3.00       2.0           2.5         2.5    NaN      2.0   3.0     2.60  Yes  Scott
 7  Don Carlos Taco Shop  California           4.0    21.5   20.50    0.72       2.5   3.0  3.00       2.5           3.0         3.5    NaN      2.5   3.0     3.00  Yes  Emily
 8  Don Carlos Taco Shop  California           3.5    23.0   21.50    0.85       2.0   4.5  4.50       3.5           1.5         3.0    3.5      4.0   2.0     3.90  Yes  Scott
 9  Don Carlos Taco Shop  California           3.5    22.0   20.80    0.76       2.5   1.5  1.50       3.0           4.5         3.0    1.5      2.0   4.5     2.00  Yes  Scott
10  Don Carlos Taco Shop  California           2.0    22.5   21.50    0.83       2.5   2.5  2.75       2.5           2.5         2.0    0.5      3.0   3.5     2.50  Yes  Emily
11  Don Carlos Taco Shop  California           2.0    23.0   21.50    0.85       3.0   4.0  4.00       3.0           4.0         4.0    1.0      2.0   1.0     3.00  Yes  Marc
12  Don Carlos Taco Shop  California           3.5    21.0   22.50    0.85       3.0   3.5  3.50       4.0           2.0         3.5    1.0      4.0   4.0     3.90  Yes  Scott
13  Don Carlos Taco Shop  California           3.0    22.5   21.00    0.79       3.0   1.0  1.50       2.5           4.0         4.0    3.0      4.5   5.0     2.00  Yes  Nicole
14  Don Carlos Taco Shop  California           3.0     NaN     NaN     NaN       4.0   NaN  2.00       2.0           4.0         4.0    NaN      3.0   4.0     2.75  Yes  Cris
15  Don Carlos Taco Shop  California           4.0    22.0   20.00    0.70       3.0   2.5  4.00       4.0           3.5         2.5    3.5      5.0   4.5     4.20  Yes  Emily
16  Don Carlos Taco Shop  California           2.5    20.5   21.75    0.77       4.0   4.0  4.50       4.0           5.0         4.5    3.5      4.0   2.0     4.10  Yes  Scott
17  Don Carlos Taco Shop  California           3.0    20.5   20.00    0.65       4.0   4.0  3.00       3.5           4.0         4.5    4.0      4.0   4.5     4.00  No   Scott
18  Don Carlos Taco Shop  California           2.0    18.5   20.50    0.62       3.5   4.0  3.50       NaN           4.0         NaN    4.0      4.0   1.5     4.00  No   Emily
19  Don Carlos Taco Shop  California           4.0    21.5   20.00    0.68       3.0   4.0  2.75       3.0           4.0         2.0    2.0      NaN   5.0     3.00  No   Leo
20  Don Carlos Taco Shop  California           2.5    20.0   20.00    0.64       3.5   3.0  3.00       3.0           4.0         4.0    1.5      NaN   4.5     3.50  No   Scott
21  Don Carlos Taco Shop  Carnitas             3.5    19.0   21.00    0.67       1.5   2.0  3.00       3.5           4.0         1.0    3.5      4.5   4.0     4.00  No   Scott
22  Don Carlos Taco Shop  Carnitas             2.5    21.5   22.50    0.87       1.5   2.5  3.50       3.0           4.0         1.5    2.5      3.5   4.5     3.50  No   Emily
23  Don Carlos Taco Shop  Carnitas             2.5    18.5   21.00    0.65       4.0   3.0  4.00       4.0           4.0         4.0    4.0      4.0   3.0     4.60  No   Scott
24  Don Carlos Taco Shop  Carnitas             2.5    23.0   19.00    0.66       3.0   2.0  4.50       4.0           4.0         3.5    3.0      4.5   4.5     4.50  No   Emily
25  Don Carlos Taco Shop  Carnitas             3.5    22.5   20.50    0.75       2.5   2.5  3.00       4.0           4.0         4.0    3.0      3.5   1.5     3.80  No   Emily
26  Don Carlos Taco Shop  Carnitas             3.5    18.5   21.50    0.68       2.5   3.0  3.00       4.0           2.0         2.0    3.0      3.5   3.5     3.00  No   Scott
27  Don Carlos Taco Shop  Carnitas             3.5    16.5   23.50    0.73       3.5   5.0  4.00       4.0           3.5         4.5    3.5      3.5   4.0     4.00  No   Scott
28  Don Carlos Taco Shop  Carnitas             4.5    20.5   20.50    0.69       3.0   5.0  4.00       4.0           5.0         4.0    2.5      4.5   4.0     4.00  No   Sage

Brief metadata


In [24]:
print('Number of burritos:', df.shape[0])
# TODO: print the average burrito rating
print('Reviewers: ')
print(np.array(df['Reviewer']))


Number of burritos: 29
Reviewers: 
['Scott' 'Scott' 'Emily' 'Ricardo' 'Scott' 'Emily' 'Scott' 'Emily' 'Scott'
 'Scott' 'Emily' 'Marc' 'Scott' 'Nicole' 'Cris' 'Emily' 'Scott' 'Scott'
 'Emily' 'Leo' 'Scott' 'Scott' 'Emily' 'Scott' 'Emily' 'Emily' 'Scott'
 'Scott' 'Sage']
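
The metadata cell leaves the average rating unfilled. A minimal sketch of how it could be computed is below. To keep it runnable without the CSV file, it rebuilds only the Burrito and overall columns from the table printed above; the names `ratings`, `mean_overall` and `by_type` are introduced here for illustration. In the notebook itself, `df['overall'].mean()` would presumably do the same job.

```python
import pandas as pd

# 'Burrito' and 'overall' columns copied from the raw data table above
burrito = (['Shredded chicken', 'Carne asada'] + ['Soyrizo'] * 5 +
           ['California'] * 14 + ['Carnitas'] * 8)
overall = [3.80, 3.00, 3.00, 3.75, 4.20, 3.20, 2.60,
           3.00, 3.90, 2.00, 2.50, 3.00, 3.90, 2.00,
           2.75, 4.20, 4.10, 4.00, 4.00, 3.00, 3.50,
           4.00, 3.50, 4.60, 4.50, 3.80, 3.00, 4.00, 4.00]
ratings = pd.DataFrame({'Burrito': burrito, 'overall': overall})

# Mean overall rating across all burritos (Series.mean skips NaN by default)
mean_overall = ratings['overall'].mean()

# Count and mean overall rating for each burrito type
by_type = ratings.groupby('Burrito')['overall'].agg(['count', 'mean'])

print('Average burrito rating:', round(mean_overall, 2))
print(by_type)
```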

What types of burritos have been rated?


In [10]:
import re

def burritotypes(x, types = {'California':'cali', 'Carnitas':'carnita', 'Carne asada':'carne asada',
                             'Soyrizo':'soyrizo', 'Shredded chicken':'chicken'}):
    """Count burritos of each type.

    A burrito name is assigned to the first type whose pattern it contains
    (case-insensitive); names that match no pattern are counted as 'other'.
    """
    Nmatches = {}
    for b in x:
        matched = False
        for t, pattern in types.items():
            if re.search(pattern, b, re.IGNORECASE):
                Nmatches[t] = Nmatches.get(t, 0) + 1
                matched = True
                break  # count each burrito only once
        if not matched:
            Nmatches['other'] = Nmatches.get('other', 0) + 1
    return Nmatches

typecounts = burritotypes(df.Burrito)

In [12]:
plt.figure(figsize=(6,6))
ax = plt.axes([0.1, 0.1, 0.65, 0.65])

# The slices are ordered and plotted counter-clockwise.
labels = list(typecounts.keys())
fracs = list(typecounts.values())
explode = [.1] * len(typecounts)

# autopct receives each slice's percentage; convert it back to a burrito count.
# With startangle=0, the first slice starts on the positive x-axis.
patches, texts, autotexts = plt.pie(fracs, explode=explode, labels=labels,
                autopct=lambda p: '{:.0f}'.format(p * np.sum(fracs) / 100), shadow=False, startangle=0)

plt.title('Types of burritos',size=30)
for t in texts:
    t.set_size(20)
for t in autotexts:
    t.set_size(20)
autotexts[0].set_color('w')


2. Hypothesis tests


In [ ]:
#California burritos vs. Carnitas burritos
TODO
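
One way this comparison might be filled in is sketched below. It compares the overall ratings of California and Carnitas burritos using Welch's t-test and, as a rank-based alternative, the Mann-Whitney U test. The choice of tests and the variable names are assumptions, not the author's method. The ratings are copied from the table above so the block runs on its own; in the notebook they could be selected with `df.loc[df.Burrito == 'California', 'overall']`.

```python
import numpy as np
from scipy import stats

# 'overall' ratings copied from the raw data table (rows 7-20 and 21-28)
cali = np.array([3.00, 3.90, 2.00, 2.50, 3.00, 3.90, 2.00,
                 2.75, 4.20, 4.10, 4.00, 4.00, 3.00, 3.50])
carnitas = np.array([4.00, 3.50, 4.60, 4.50, 3.80, 3.00, 4.00, 4.00])

# Welch's t-test: two-sided, does not assume equal variances
t_stat, p_ttest = stats.ttest_ind(cali, carnitas, equal_var=False)

# Mann-Whitney U test: rank-based, does not assume normality
u_stat, p_mwu = stats.mannwhitneyu(cali, carnitas, alternative='two-sided')

print('Mean overall: California = %.3f, Carnitas = %.3f'
      % (cali.mean(), carnitas.mean()))
print('Welch t-test: t = %.3f, p = %.4f' % (t_stat, p_ttest))
print('Mann-Whitney U: U = %.1f, p = %.4f' % (u_stat, p_mwu))
```

Many reviewers rated more than one burrito, so the ratings are not fully independent, and both tests assume independence. Treat the p-values as rough.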

In [ ]:
# Don Carlos 1 vs. Don Carlos 2
TODO

In [ ]:
# Bonferroni correction
TODO
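
A minimal sketch of a Bonferroni correction for this section follows. With m tests and a family-wise significance level alpha, each test is compared against alpha / m (equivalently, each p-value is multiplied by m and capped at 1). The function name `bonferroni` and the p-values passed to it are hypothetical placeholders, not results from this dataset.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for a list of p-values.

    Returns the per-test significance threshold, the adjusted p-values,
    and a list of booleans marking which tests remain significant.
    """
    m = len(p_values)
    alpha_per_test = alpha / m
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p < alpha_per_test for p in p_values]
    return alpha_per_test, adjusted, reject

# Hypothetical placeholder p-values, one per hypothesis test in this section
p_values = [0.01, 0.04]
alpha_per_test, adjusted, reject = bonferroni(p_values)

print('Per-test threshold:', alpha_per_test)
print('Adjusted p-values:', adjusted)
print('Reject null hypothesis:', reject)
```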

3. Burrito dimension distributions

Distribution of each burrito quality


In [18]:
import math
def metrichist(metricname):
    if metricname == 'Volume':
        bins = np.arange(.375,1.225,.05)
        xticks = np.arange(.4,1.2,.1)
        xlim = (.4,1.2)
    else:
        bins = np.arange(-.25,5.5,.5)
        xticks = np.arange(0,5.5,.5)
        xlim = (-.25,5.25)
        
    plt.figure(figsize=(5,5))
    n, _, _ = plt.hist(df[metricname].dropna(),bins,color='k')
    plt.xlabel(metricname + ' rating',size=20)
    plt.xticks(xticks,size=15)
    plt.xlim(xlim)
    plt.ylabel('Count',size=20)
    plt.yticks((0,int(math.ceil(np.max(n) / 5.)) * 5),size=15)
    plt.tight_layout()

In [19]:
m_Hist = ['Hunger','Volume','Tortilla','Temp','Meat','Fillings',
          'Meat:filling','Uniformity','Salsa','Synergy','Wrap','overall']
for m in m_Hist:
    metrichist(m)


Test for normal distribution


In [ ]:
TODO
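
One possible way to fill in this cell is the Shapiro-Wilk test from `scipy.stats`, sketched below on the overall ratings. The choice of test is an assumption. The ratings are copied from the table above so the block runs on its own; in the notebook each metric could be passed as `df[m].dropna()`, since several columns contain NaN.

```python
import numpy as np
from scipy import stats

# 'overall' ratings for all 29 burritos, copied from the raw data table
overall = np.array([3.80, 3.00, 3.00, 3.75, 4.20, 3.20, 2.60,
                    3.00, 3.90, 2.00, 2.50, 3.00, 3.90, 2.00,
                    2.75, 4.20, 4.10, 4.00, 4.00, 3.00, 3.50,
                    4.00, 3.50, 4.60, 4.50, 3.80, 3.00, 4.00, 4.00])

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
w_stat, p_val = stats.shapiro(overall)

print('Shapiro-Wilk: W = %.3f, p = %.4f' % (w_stat, p_val))
if p_val < 0.05:
    print('Normality is rejected at the 0.05 level')
else:
    print('Normality is not rejected at the 0.05 level')
```

Ratings are recorded in coarse steps, which produces many ties, so the result is only a rough guide.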