San Diego Burrito Analytics: Data characterization

Scott Cole

2 July 2016

This notebook analyzes the dimensions along which burritos are rated.

Which dimension are people most critical of (smallest mean)? Least critical of (largest mean)? Most sensitive to (largest variance)? Least sensitive to (smallest variance)?

Default imports



In [1]:

    
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import pandas as pd
import pandasql

import seaborn as sns
sns.set_style("white")

Load data



In [2]:

    
import util
df = util.load_burritos()
N = df.shape[0]

Calculate mean and variance of dimension ratings



In [3]:

    
# Column-wise mean and sample variance; pandas skips missing values (NaN) by default
means = df.mean()
variances = df.var()
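
As a quick check of what these two calls compute: by default pandas' `var()` returns the unbiased sample variance (`ddof=1`), dividing by n - 1 rather than n. The sketch below reproduces that by hand using only the standard library. The ratings are made up for illustration and are not taken from the burrito data set.

```python
import statistics

# Hypothetical ratings for a single dimension (illustrative only, not real data)
ratings = [3.0, 4.0, 3.5, 2.5, 4.5]

n = len(ratings)
mean = sum(ratings) / n

# Sample variance: sum of squared deviations divided by (n - 1),
# the same convention as pandas' default ddof=1
sample_var = sum((r - mean) ** 2 for r in ratings) / (n - 1)

# statistics.variance also uses the (n - 1) denominator, so the two should agree
matches_stdlib = abs(sample_var - statistics.variance(ratings)) < 1e-12

print(mean)
print(sample_var)
print(matches_stdlib)
```

A small variance here means raters mostly agree on that dimension; a large one means scores swing widely from burrito to burrito.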



In [4]:

    
print(means)
print(variances)









    



Yelp             3.971831
Google           4.223944
Hunger           3.254930
Cost             6.903592
Length          20.036250
Circum          22.344355
Volume           0.799839
Tortilla         3.358099
Temp             3.573723
Meat             3.479433
Fillings         3.494681
Meat:filling     3.433333
Uniformity       3.283571
Salsa            3.166418
Synergy          3.497143
Wrap             3.864286
overall          3.481221
Unreliable       0.000000
dtype: float64
Yelp            0.243882
Google          0.111196
Hunger          0.729550
Cost            1.211169
Length          5.729760
Circum          1.864168
Volume          0.018044
Tortilla        0.608391
Temp            1.115334
Meat            0.698753
Fillings        0.647989
Meat:filling    1.128381
Uniformity      1.389584
Salsa           0.846270
Synergy         0.816934
Wrap            1.467061
overall         0.583140
Unreliable      0.000000
dtype: float64

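People are most critical of salsa (lowest mean rating, 3.17) and least critical of wrap integrity (highest mean, 3.86). Ratings are most sensitive to wrap integrity (highest variance, 1.47). Among the individual rating dimensions, they are least sensitive to tortilla (variance 0.61), with fillings close behind (0.65); the overall rating varies even less (0.58).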
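
The ranking can be checked programmatically. The sketch below copies the means and variances of the subjective rating dimensions from the printed output above into plain dictionaries, so it runs without the data set. Treating only Tortilla through Wrap as "dimensions" (and leaving out Yelp, Google, Hunger, Cost, the physical measurements, and overall) is an assumption made here, not something the notebook states.

```python
# Values copied from the printed means and variances above.
# Assumption: only the subjective rating columns count as "dimensions".
means = {
    'Tortilla': 3.358099, 'Temp': 3.573723, 'Meat': 3.479433,
    'Fillings': 3.494681, 'Meat:filling': 3.433333, 'Uniformity': 3.283571,
    'Salsa': 3.166418, 'Synergy': 3.497143, 'Wrap': 3.864286,
}
variances = {
    'Tortilla': 0.608391, 'Temp': 1.115334, 'Meat': 0.698753,
    'Fillings': 0.647989, 'Meat:filling': 1.128381, 'Uniformity': 1.389584,
    'Salsa': 0.846270, 'Synergy': 0.816934, 'Wrap': 1.467061,
}

# Smallest mean = most critical; largest variance = most sensitive
most_critical = min(means, key=means.get)
least_critical = max(means, key=means.get)
most_sensitive = max(variances, key=variances.get)
least_sensitive = min(variances, key=variances.get)

print(most_critical)    # Salsa
print(least_critical)   # Wrap
print(most_sensitive)   # Wrap
print(least_sensitive)  # Tortilla
```

With the pandas objects from the cells above, the same lookups could be written with `idxmin()` and `idxmax()` on the relevant slice of `means` and `variances`.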

Play with ggplot



In [30]:

    
from ggplot import *
# Scatter of Meat vs. Fillings ratings, colored by the overall rating
print(ggplot(df, aes('Meat', 'Fillings', color='overall')) +
      geom_point(size=120, alpha=.2) +
      xlab('Meat rating') + ylab('Fillings rating') +
      scale_color_gradient(low='red', high='blue'))









    












    



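[Figure: scatter plot of Meat rating (x) vs. Fillings rating (y), points colored by overall rating from red (low) to blue (high)]

<ggplot: (33004040)>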



In [32]:

    
import re
s = "string. With. Punctuation?"
# Remove every character that is neither a word character nor whitespace
s = re.sub(r'[^\w\s]', '', s)
print(s)









    



string With Punctuation



In [34]:

    
x = ['x','y','z']
a = 'x'
if a in x:
    print('yes')
else:
    print('no')

yes



In [35]:

    
w = {'e':2,'f':4}
w.keys()









    Out[35]:





['e', 'f']


