Correlation between Starbucks and Chain Density for Census Tracts

I wanted to see whether, based on Yelp data alone, there is any correlation between the number of Starbucks locations in a Census tract and the percentage of businesses in that tract that are chains.

Methodology

Look at the value counts for business names in the Yelp data set to develop a criterion for a business being a chain/big-box, based on the number of occurrences of the business name.

I settled on a minimum name count of 5 for a business to be considered a chain, even though some businesses that are clearly chains had fewer than 5 instances in the Yelp data set.
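
To make the criterion concrete, here is a minimal sketch on a toy table. The names are borrowed from the output below, but the counts are made up for illustration, and `toy` and `MIN_CHAIN_COUNT` are hypothetical names that the notebook itself does not use.

```python
import pandas as pd

# Toy stand-in for the Yelp business table; the counts are made up.
toy = pd.DataFrame({
    "name": ["Starbucks"] * 6 + ["Subway"] * 5 + ["Wolfman Pizza"] * 4 + ["Bike Den"],
})

MIN_CHAIN_COUNT = 5  # same cutoff as in the text

# Count occurrences of each name and flag names that meet the cutoff.
name_counts = toy["name"].value_counts()
is_chain = (name_counts >= MIN_CHAIN_COUNT).astype(int)

# Attach the flag to every business row by looking up its name.
toy["chain"] = toy["name"].map(is_chain)
print(toy.drop_duplicates("name"))
```

With this cutoff a name that occurs 4 times is not flagged, which is the kind of miss described above for real chains with few Yelp entries.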

I plotted the chain fraction vs. the Starbucks count for each tract. (I excluded Starbucks from the businesses when computing the Yelp chain fraction.)

Results

As can be seen from the figure at the end of this notebook, there is no apparent correlation between the Starbucks count in a Census tract and the fraction of businesses in that tract that are chains.

Note that I tried a number of chain criteria, and different cutoffs did not affect the lack of correlation.
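
One way to put a number on "no apparent correlation" across cutoffs is to compute a correlation coefficient for each cutoff. The sketch below does this on made-up data. The helper `chain_fraction_corr`, the toy frames `biz` and `sb_counts`, and the choice of Pearson correlation are all assumptions for illustration and are not part of the original analysis, which relied on the scatter plot alone.

```python
import pandas as pd

# Hypothetical toy data: one row per business, with its tract ID.
biz = pd.DataFrame({
    "name":  ["A", "A", "A", "B", "B", "C", "D", "A", "B", "E"],
    "tract": ["t1", "t1", "t2", "t2", "t3", "t3", "t1", "t3", "t1", "t2"],
})
# Hypothetical Starbucks count per tract.
sb_counts = pd.Series({"t1": 0, "t2": 2, "t3": 5}, name="starbucks")


def chain_fraction_corr(biz, sb_counts, cutoff):
    """Pearson correlation between per-tract chain fraction and Starbucks count."""
    counts = biz["name"].value_counts()
    # Flag each business whose name occurs at least `cutoff` times.
    is_chain = biz["name"].map(counts >= cutoff).astype(float)
    # Mean of the flag within a tract is the chain fraction for that tract.
    frac = is_chain.groupby(biz["tract"]).mean().rename("chain_frac")
    # Keep only tracts that appear in both inputs.
    joined = pd.concat([frac, sb_counts], axis=1, join="inner")
    return joined["chain_frac"].corr(joined["starbucks"])


for cutoff in (2, 3, 4):
    print(cutoff, chain_fraction_corr(biz, sb_counts, cutoff))
```

The toy numbers say nothing about the real data. Also note that a cutoff high enough to flag no business at all gives a constant chain fraction, for which the correlation is undefined (NaN).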


In [2]:
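%pylab inline
import pandas as pd
import json
# Load the Yelp business records (one JSON object per line).
with open('yelp/business.json') as f:
    bus = pd.DataFrame(json.loads(l) for l in f)
# Count how many times each business name occurs.
bus_counts = pd.DataFrame(bus.name.value_counts())
bus_counts.columns = ['counts']
# Show the names that fall below the chain cutoff of 5.
bus_counts[bus_counts.counts < 5]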


Populating the interactive namespace from numpy and matplotlib
Out[2]:
counts
Mojo Yogurt 4
SKECHERS Factory Outlet 4
Fairfield Inn by Marriott 4
Wolfman Pizza 4
Arizona Federal Credit Union 4
Rock Bottom Restaurant & Brewery 4
CenturyLink Store 4
Thrifty Car Rental 4
Tacos Los Toritos 4
Community Tire Pros & Auto Repair 4
Lumber Liquidators 4
Sanrio 4
Ghost Armor 4
The Roomstore 4
Tortas Paquime 4
Culinary Dropout 4
Bahama Buck's 4
Color Me Mine 4
Mimis Cafe 4
Kangaroo Express 4
Cactus Flower Florists 4
Garcia's Mexican Restaurant 4
Chevys Fresh Mex 4
Roy's Restaurant 4
Tory Burch 4
Papa Murphy's Take 'n' Bake Pizza 4
Avis 4
Quick Trip 4
Budget Car Rental 4
Mecklenburg ABC Liquor Store 4
... ...
Mardi Gras Costume Shop 1
TK Service Center 1
Fonda Mexicana El Paraiso 1
Airbridge Tours 1
Chuparosa Park 1
Hamra Jewelers 1
Charleston Swapmeet 1
Automall Autobody 1
Spa Uptown 1
San Portella Apartments 1
MGM Grand Race & Sports Book 1
Petite Chateau 1
European Auto Salon 1
Massage Envy Spa Dobson 1
J's Barber Shop 1
Maximum Pilates 1
Cafe Assisi 1
Hawaiian Shave Ice 1
Lane Bryant The Shoppes At Gilbert Commons 1
Kim Alterations 1
Marquee Theatre 1
Children's Museum Of Phoenix 1
Little Dumpling Thai & Chinese Cuisine 1
Friomio 1
Hans Gulyas 1
Le Meridien Versailles 1
Superior School of Real Estate 1
Bike Den 1
Eyecare Center 1
Satara 1

44960 rows × 1 columns


In [3]:
# Flag a name as a chain (1) if it occurs at least 5 times, else 0.
bus_counts['chain'] = (bus_counts['counts'] >= 5).astype(int)
bus_counts


Out[3]:
counts chain
Starbucks 413 1
McDonald's 293 1
Subway 274 1
Walgreens 161 1
Taco Bell 154 1
Wendy's 123 1
Pizza Hut 119 1
Burger King 113 1
Panda Express 112 1
The UPS Store 107 1
Dunkin' Donuts 101 1
Chipotle Mexican Grill 86 1
Domino's Pizza 85 1
Great Clips 84 1
Wells Fargo Bank 83 1
Bank of America 74 1
Jack in the Box 71 1
Jimmy John's 71 1
Enterprise Rent-A-Car 70 1
Dairy Queen 68 1
Walmart Supercenter 68 1
Papa John's Pizza 67 1
QuikTrip 66 1
Jiffy Lube 66 1
Cvs Pharmacy 62 1
The Home Depot 61 1
Sonic Drive-In 59 1
Supercuts 57 1
Del Taco 55 1
Albertsons 55 1
... ... ...
Mardi Gras Costume Shop 1 0
TK Service Center 1 0
Fonda Mexicana El Paraiso 1 0
Airbridge Tours 1 0
Chuparosa Park 1 0
Hamra Jewelers 1 0
Charleston Swapmeet 1 0
Automall Autobody 1 0
Spa Uptown 1 0
San Portella Apartments 1 0
MGM Grand Race & Sports Book 1 0
Petite Chateau 1 0
European Auto Salon 1 0
Massage Envy Spa Dobson 1 0
J's Barber Shop 1 0
Maximum Pilates 1 0
Cafe Assisi 1 0
Hawaiian Shave Ice 1 0
Lane Bryant The Shoppes At Gilbert Commons 1 0
Kim Alterations 1 0
Marquee Theatre 1 0
Children's Museum Of Phoenix 1 0
Little Dumpling Thai & Chinese Cuisine 1 0
Friomio 1 0
Hans Gulyas 1 0
Le Meridien Versailles 1 0
Superior School of Real Estate 1 0
Bike Den 1 0
Eyecare Center 1 0
Satara 1 0

45694 rows × 2 columns


In [4]:
# Attach the name count and chain flag to each business.
buswc = bus.merge(bus_counts, left_on='name', right_index=True)
# Add each business's Census tract ID (GISJOIN) by merging on the shared columns.
join = pd.DataFrame(json.loads(l) for l in open('business_track.json'))
busg = buswc.merge(join)
# Starbucks locations per tract according to the Yelp data.
yelp_starbucks = busg[busg['name'] == 'Starbucks'].GISJOIN.value_counts()
yelp_starbucks = yelp_starbucks.to_frame('yelp_starbucks')
# Starbucks locations per tract according to the Starbucks data set.
starbucks = pd.DataFrame(json.loads(l) for l in open('sb_track.json')).GISJOIN.value_counts()
starbucks = starbucks.to_frame('starbucks')
# Keep only tracts present in both data sets (inner join on the tract ID).
yelp_starbucks = yelp_starbucks.merge(starbucks, left_index=True, right_index=True)
from matplotlib import pyplot
pyplot.scatter(yelp_starbucks.starbucks, yelp_starbucks.yelp_starbucks)
pyplot.plot([0,30], [0,30])  # y = x reference line: the two counts agree
ax = pyplot.gca()
ax.set_xlabel("starbucks location count in tract")
ax.set_ylabel("yelp starbucks location count in tract")
ax.set_title("Compare starbucks location counts for starbucks and yelp data sets")


Out[4]:
<matplotlib.text.Text at 0x12a680650>

Evidently, not all Starbucks locations appear in the Yelp data set.
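
The gap between the two sources could also be summarized as a single coverage ratio. The sketch below uses made-up per-tract counts. The names `official`, `yelp`, `both`, and `coverage` are hypothetical, and the outer join is a deliberate difference from the cell above: that cell uses `merge` with its default inner join, so a tract with Starbucks locations but no Yelp Starbucks entry does not appear in the scatter plot at all.

```python
import pandas as pd

# Hypothetical per-tract Starbucks counts from the two sources.
official = pd.Series({"t1": 3, "t2": 1, "t3": 4}, name="starbucks")
yelp = pd.Series({"t1": 2, "t3": 4}, name="yelp_starbucks")

# Outer join on tract ID; a tract missing from one source counts as 0 there.
both = pd.concat([official, yelp], axis=1).fillna(0)

# Share of locations in the Starbucks data that also show up in Yelp.
coverage = both["yelp_starbucks"].sum() / both["starbucks"].sum()

# Tracts where Yelp lists fewer locations than the Starbucks data.
undercounted = both[both["yelp_starbucks"] < both["starbucks"]].index.tolist()
print(coverage, undercounted)
```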


In [7]:
# Per-tract fraction of non-Starbucks businesses that are flagged as chains.
chain = busg[busg['name'] != 'Starbucks'][['chain']].groupby(busg.GISJOIN).mean()
# Add the Starbucks count for each tract (inner join on the tract ID).
chain = chain.merge(starbucks, left_index=True, right_index=True)
pyplot.scatter(chain.starbucks, chain['chain'])
ax = pyplot.gca()
ax.set_xlabel("starbucks location count in tract")
ax.set_ylabel("fraction chain business in tract")


Out[7]:
<matplotlib.text.Text at 0x129677050>
