San Diego Burrito Analytics: Data characterization

Scott Cole

2 July 2016

This notebook analyzes the rating dimensions of burritos.

  1. Which dimension are people most critical of (smallest mean)? Least critical of (largest mean)? Most sensitive to (largest variance)? Least sensitive to (smallest variance)?

Default imports


In [1]:
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import pandas as pd
import pandasql

import seaborn as sns
sns.set_style("white")

Load data


In [2]:
import util
df = util.load_burritos()
N = df.shape[0]

Calculate mean and variance of dimension ratings


In [3]:
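# numeric_only=True explicitly skips any non-numeric columns (needed on recent pandas)
means = df.mean(numeric_only=True)
variances = df.var(numeric_only=True)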
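
For reading the variances, note that pandas' `var()` defaults to the sample variance (it divides by N - 1, not N). The sketch below works this through by hand on a handful of made-up ratings; the `ratings` list is hypothetical and is not taken from the burrito data set.

```python
# Hypothetical ratings for a single dimension (illustrative values only,
# not taken from the burrito data)
ratings = [3.0, 4.0, 3.5, 2.5, 4.5]

n = len(ratings)
mean = sum(ratings) / n

# Sample variance: squared deviations from the mean, divided by N - 1.
# This is the same convention pandas uses by default (ddof=1).
sample_var = sum((r - mean) ** 2 for r in ratings) / (n - 1)

# Population variance (divide by N) for comparison; always the smaller one
population_var = sum((r - mean) ** 2 for r in ratings) / n

print(mean)
print(sample_var)
print(population_var)
```

With only a few ratings per dimension the two conventions differ noticeably; with many burritos the difference becomes negligible.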

In [4]:
print(means)
print(variances)


Yelp             3.971831
Google           4.223944
Hunger           3.254930
Cost             6.903592
Length          20.036250
Circum          22.344355
Volume           0.799839
Tortilla         3.358099
Temp             3.573723
Meat             3.479433
Fillings         3.494681
Meat:filling     3.433333
Uniformity       3.283571
Salsa            3.166418
Synergy          3.497143
Wrap             3.864286
overall          3.481221
Unreliable       0.000000
dtype: float64
Yelp            0.243882
Google          0.111196
Hunger          0.729550
Cost            1.211169
Length          5.729760
Circum          1.864168
Volume          0.018044
Tortilla        0.608391
Temp            1.115334
Meat            0.698753
Fillings        0.647989
Meat:filling    1.128381
Uniformity      1.389584
Salsa           0.846270
Synergy         0.816934
Wrap            1.467061
overall         0.583140
Unreliable      0.000000
dtype: float64

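Among the rating dimensions, people are most critical of salsa (lowest mean, 3.17) and least critical of wrap integrity (highest mean, 3.86). Ratings are most sensitive to wrap integrity (largest variance, 1.47) and least sensitive to the tortilla (smallest variance, 0.61), with fillings close behind (0.65). The overall rating varies even less (0.58), but it is a summary score, not a single dimension.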
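
The ranking can be checked directly from the printed values. The sketch below copies the means and variances from the output above into plain dictionaries and picks out the extremes. Treating only these nine columns as "rating dimensions" (leaving out Yelp, Google, Hunger, Cost, the size measurements, and the overall score) is an assumption made here, not something the notebook states.

```python
# Means and variances copied from the printed output above.
# Assumption: only these nine columns count as rating dimensions.
means = {
    'Tortilla': 3.358099, 'Temp': 3.573723, 'Meat': 3.479433,
    'Fillings': 3.494681, 'Meat:filling': 3.433333,
    'Uniformity': 3.283571, 'Salsa': 3.166418, 'Synergy': 3.497143,
    'Wrap': 3.864286,
}
variances = {
    'Tortilla': 0.608391, 'Temp': 1.115334, 'Meat': 0.698753,
    'Fillings': 0.647989, 'Meat:filling': 1.128381,
    'Uniformity': 1.389584, 'Salsa': 0.846270, 'Synergy': 0.816934,
    'Wrap': 1.467061,
}

# Smallest mean = most critical; largest mean = least critical
most_critical = min(means, key=means.get)
least_critical = max(means, key=means.get)

# Largest variance = most sensitive; smallest variance = least sensitive
most_sensitive = max(variances, key=variances.get)
least_sensitive = min(variances, key=variances.get)

print(most_critical)    # Salsa
print(least_critical)   # Wrap
print(most_sensitive)   # Wrap
print(least_sensitive)  # Tortilla
```

Inside the notebook, the same lookup can be done on the `means` and `variances` Series with `idxmin()` and `idxmax()` after selecting the rating columns.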

Play with ggplot


In [30]:
from ggplot import *
# Scatter of meat vs. fillings ratings, colored by the overall rating
p = ggplot(df, aes('Meat', 'Fillings', color='overall')) +\
    geom_point(size=120, alpha=.2) +\
    xlab('Meat rating') + ylab('Fillings rating') +\
    scale_color_gradient(low='red', high='blue')
print(p)


<ggplot: (33004040)>

In [32]:
import re
s = "string. With. Punctuation?"
# Drop every character that is neither a word character nor whitespace
s = re.sub(r'[^\w\s]', '', s)
print(s)


string With Punctuation

In [34]:
x = ['x','y','z']
a = 'x'
if a in x:
    print('yes')
else:
    print('no')


yes

In [35]:
w = {'e':2,'f':4}
w.keys()


Out[35]:
['e', 'f']
