In 2010, the United States spent 40 billion dollars on coffee and imported over 2.7 billion pounds of it.National Coffee Association. Since then, Americans have imported well over 3 billion pounds of coffee. As coffee-obsessed college students designing this project, we were interested to find out more about this black gold; where it's from, how our consumption compares to other countries as well as some interesting correlations that have to do with the growing trends of US coffee sales.
In this project we are looking at coffee trend in US: consumption growths of coffee in US versus other countries, coffee imports to US, revenue growth of Starbucks, sales of coffee in US, and the correlation of a few interesting factors that might be the drivers of coffee sales in the US. Coffee consumption has been growing in the past few years. Because we are not researching coffee in depth, we won't be able to determine the exact causes of coffee sales, but we would like to point your attention to some things that may affect it (eg. Keurig sales).
After this report, we hope you not only find our graphs useful, but also become a better educated student in coffee!
According to the International Coffee Organization, coffee consumption worldwide increased 1.4% in 2014. This growth rate might seem small and insignificant, but it must be taken in context of a slowdown in consumption growth in a global market that has been growing vigorously over the past decade. Since 1990, the compound annual growth rate in global coffee consumption has been roughly 2%, largely driven by increasing purchasing power and changing tastes in emerging middle-class economies. Against a backdrop of globalization, economic liberalization and the unfettered movement of international capital over the past 25 years, emerging market economies in Latin America, North Africa, Eastern Europe and non-Japan Asia have witnessed a burgeoning middle class. As more of the worlds population dive into the middle class, they have become exposed to the previously unavailable luxuries of a working class lifestyle. Coffee is one of these creature comforts. Coffee demand from Europe and the U.S. in 2007 was roughly 5.1 million tons, vs. about 4.3 million tons from the rest of the world. By contrast, in 2011, demand from EU and US was roughly 5.3 million tons, compared with approximately 5.0 million tons from the rest of world, comprised of coffee-producing countries, tea-drinking regions and so on.International Coffee Organization
U.S. consumer. U.S. coffee consumption increased 3.25% year over year in 2013 and another 2.1% in 2014.4 According to the National Coffee Association’s 2016 “National Coffee Drinking Trends” study, 57% of U.S. adults drink coffee daily, which makes it the country’s second favorite beverage after soft drinks.
How We Acquired Our Data
All our data if not specified otherwise, is from Statista.com, who pulled its data from respectable and reliable data sources. We searched for the data by the title (eg. “Keurig Green Mountain Sales Growth” or “Starbucks Revenue”) and used the excel file provided by Statista. For the purpose of convenience and time saving, we cut empty rows and deleted Statista’s overview tab in each sheet and uploaded them (as csv files) into our own GitHub, coffee data 2016.
For our data on Coffee Imports to the United States, we accessed the USDA website searching up product imports. This is the weblink: https://apps.fas.usda.gov/gats/default.aspx.Data
-Data source:FAS US TRADE
-Product Type: Imports- Consumption
-Product Groups: FAS
-Partners: All Partners
-Products: Coffee & Coffee Pdts
-And clicked on “Download as CSV file.” We uploaded this file to our github.
For every data, we will look at 2005-2014 time period. We adjusted the csv sheets to only reflect these years.
In [1]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import sys
import datetime as dt
from pandas_datareader import wb, data as web
from plotly.offline import iplot, iplot_mpl # plotting functions
import plotly.graph_objs as go # ditto
import plotly # just to print version and init notebook
import cufflinks as cf # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)
# these lines make our graphics show up in the notebook
%matplotlib inline
plotly.offline.init_notebook_mode(connected=True)
%matplotlib inline
Coffee is believed to be ultra popular in the US and less popular in Asian countries (who have a higher preference for tea.) We decided to compare annual coffee consumption in USA, one European country (Sweden) and one Asian country (China). The data is annual per capita coffee consumption inpounds for each of the three countries from 2005-2014. It was surprising to find that that Sweden consumes more coffee per capita bases than USA. China, on the other hand, consumes negligible amount of coffee. (The green line that almost seems like the x axis actually represents Chinese comsumption).
In [2]:
## Coffee consumption per capita in US, Sweden and China
url1='https://raw.githubusercontent.com/coffeedata2016/coffee/master/us-per-capita-consumption-of-coffee-2005-2014.csv'
USconsumption = pd.read_csv(url1, names=['Year' , 'Consumption in pounds'])
USconsumption=USconsumption.set_index('Year')
url2='https://raw.githubusercontent.com/coffeedata2016/coffee/master/per-capita-consumption-of-coffee-in-sweden-2006-2014.csv'
Sweden = pd.read_csv(url2, names=['Year' , 'Consumption in pounds'])
Sweden=Sweden.set_index('Year')
url4='https://raw.githubusercontent.com/coffeedata2016/coffee/master/per-capita-coffee-consumption-in-china-2005-2014.csv'
China = pd.read_csv(url4, names=['Year' , 'Consumption in pounds'])
China=China.set_index('Year')
S = pd.read_csv(url2, names=['Year' , 'Consumption in pounds in Sweden'])
C = pd.read_csv(url4, names=['Year' , 'Consumption in pounds in China'])
U=pd.read_csv(url1, names=['Year' , 'Consumption in pounds in USA'])
Merged=S.merge(C, on='Year').merge(U, on='Year') #merging 3 dataframes that represent consumptions in 3 countries
Merged=Merged.set_index('Year')
ax1=Merged.plot(xticks=Merged.index, legend=True, alpha=5
,
ylim=(0,20), grid=False, title='Coffee consumption in USA, Sweden and China', figsize=(20,15), linewidth=5 )
ax1.set_xticklabels(Merged.index) #setting Years as ticks for x-axis
ax1.set_axis_bgcolor('w')
ax1.grid(color='black', linestyle='-', linewidth=2, alpha=0.1)
ax1.grid('on', which='minor', axis='x' )
ax1.grid('off', which='major', axis='x' )
ax1
Out[2]:
We want to see how coffee retail market is growing in the US. The stereotype that Americans drink a lot of coffee is partially true. As we can see in the graph below the value of sales is constantly increasing. It has more than doubled in 10 years from 4 trillion USD to almost 9 trillion USD.
In [3]:
#Retail sales in US in Billion dollars
url5='https://raw.githubusercontent.com/coffeedata2016/coffee/master/us-coffee-retail-sales-2005-2014.csv'
USsales = pd.read_csv(url5, names=['Year', 'Sales' ])
USsales=USsales.set_index('Year')
USsales.plot(kind='bar', title='Coffee Retail Sales in US in $Bn', legend=True, fontsize=15, color='red', linewidth=7, figsize=(18,5)).title.set_size(30)
In [4]:
# import packages
import numpy as np
import plotly.plotly as py
import pandas as pd
#Define dataframe
df = pd.read_csv('https://raw.githubusercontent.com/coffeedata2016/coffee/master/Coffee%20Imports%20Dataset.csv')
data = [ dict(
type = 'choropleth',
locations = df['ISO3166.1.Alpha.3'],
z = df['2014'],
text = df['Country'],
colorscale = [[0,"rgb(5, 10, 172)"],[0.35,"rgb(40, 60, 190)"],[0.5,"rgb(70, 100, 245)"],\
[0.6,"rgb(90, 120, 245)"],[0.7,"rgb(106, 137, 247)"],[1,"rgb(220, 220, 220)"]],
autocolorscale = False,
reversescale = True,
marker = dict(
line = dict (
color = 'rgb(180,180,180)',
width = 0.5
) ),
colorbar = dict(
autotick = False,
tickprefix = '$',
title = 'Coffee<br>Imports<br>(USD millions)'),
) ]
#Define Map layout
layout = dict(
title = 'Top US Coffee Imports by Country, 2014<br>Source: USDA.gov',
geo = dict(
showframe = False,
showcoastlines = False,
projection = dict(
type = 'Mercator'
)
)
)
fig = dict( data=data, layout=layout )
py.iplot( fig, validate=False, filename='d3-world-map' )
Out[4]:
In [5]:
#Retail sales in US growth
url50='https://raw.githubusercontent.com/coffeedata2016/coffee/master/US%20Coffee%20sales%20growths%202005-2014.csv'
USsalesgrowth = pd.read_csv(url50, names=['Year' , 'Sales growth'])
USsalesgrowth=USsalesgrowth.set_index('Year')
url6 = "https://raw.githubusercontent.com/coffeedata2016/coffee/master/Keurig%20Green%20Mountains%20Sales%20Growth%20Percentages%202005-2014.csv"
keurig_sales_growth = pd.read_csv(url6, names=['Year' , 'Growth'])
keurig_sales_growth=keurig_sales_growth.set_index('Year')
# Keurig Green Mountain Global Net sales in millions USD - this is proxy for office coffee consumption
url8 = 'https://raw.githubusercontent.com/coffeedata2016/coffee/master/keurig-green-mountains-global-net-sales-2005-2014.csv'
KCupSales = pd.read_csv(url8, names=['Year' , 'Global Net Sales'])
KCupSales=KCupSales.set_index('Year')
Data: percentage change in number of US college students and percentage changes in US retail coffee sales. We conducted regression where college students variable is explanatory variable and sales growth is dependednt variable. We see from the table that the result is statistically signicant.
In [6]:
import scipy.stats as stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
data= pd.DataFrame ({
'Keurig' :[31.32, 39.5, 49.18, 46.53, 59.61, 72.59, 95.39, 45.58, 12.93, 8.2]
,
'Sales':[16.9, 7.1, 4.5, 10, 7, 7.3, 2.6, 12.4, 5.3, 4.5]})
results = smf.ols('Sales~ Keurig', data).fit() #running regression
results.summary() #displaying the results
Out[6]:
However, when we plot the regression line and data points we see that the points are dispersed and based on the plot we can conclude that the relationship seems to be weak.
In [7]:
import numpy as np
slope = results.params
intercept, slope = results.params
r2 = results.rsquared
slope, intercept, r2 #getting slope and intercept from previous results
plt.plot(data['Keurig'], data['Sales'], 'bo') #plotting the data points
plt.hold(True)
x = np.array([min(data['Keurig']), max(data['Keurig'])])
y = intercept + slope * x
plt.plot( x, y, 'r-') #plotting the line
plt.show()
We took Keurig US sales growth percentages as the proxy for office coffee consumption and sales. Keurig is the foremost office coffee supplier, therefore we believe it would be appropriate to use it as a proxy. We regressed percentage growths of Keurig Sales on percentage growthsof US retail coffee sales. The result is significant if we look at the table.
In [8]:
data2 = pd.DataFrame ({
'StudentsGrowth' :[1.3, 1.5, 2.8, 4.7, 6.3, 3.5, 0.1, -1.8, -1.3, -0.8]
,
'RevenueGrowth':[16.9, 7.1, 4.5, 10, 7, 7.3, 2.6, 12.4, 5.3, 4.5]})
results3 = smf.ols('RevenueGrowth ~ StudentsGrowth', data2).fit()
results3.summary()
Out[8]:
However, if we look at the regression line and the plot of points, we see that the points are dispersed and it is hard to infer a strong relationship between variables.
In [9]:
slope2 = results3.params
slope2
intercept2, slope2 = results3.params
r20 = results3.rsquared
slope2, intercept2, r20
plt.plot(data2['StudentsGrowth'], data2['RevenueGrowth'], 'bo')
plt.hold(True)
x3 = np.array([min(data2['StudentsGrowth']), max(data2['StudentsGrowth'])])
y3 = intercept2 + slope2 * x3
plt.plot(x3, y3, 'r-')
plt.show()
We want to see how specialty coffee sales contribute to the overall retail sales. Data: US specialty coffee sales percentage growths and US retail coffee sales growths. As we can see from the graph, the relationship is not stable. Specialty coffee sales are growing and this is obviously contributing to the overall retail sales. However, since the growth rates of specialty coffee sales are not stable, there are many other factors affecting overall US coffee sales.
In [10]:
#import plotly package toolset
import plotly
plotly.tools.set_credentials_file(username="kdg777", api_key="y6WgfYZUp59UUizb4484")
In [11]:
url200='https://raw.githubusercontent.com/coffeedata2016/coffee/master/US%20Coffee%20sales%20growth%20rates%202005-2014.csv'
url201='https://raw.githubusercontent.com/coffeedata2016/coffee/master/Specialty%20Coffee%20Growth.csv'
In [12]:
import plotly.plotly as py
import plotly.graph_objs as go
df2 = pd.read_csv(url200)
ds2= pd.read_csv(url201)
trace1 = go.Bar(
x=[2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015],
y= df2["16.9"],
name='US Coffee Sales Growth rates',
marker=dict(
color='rgb(14, 71, 182)'
)
)
trace2 = go.Bar(
x=[2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015],
y=ds2["8.5"],
name='US Specialty Coffee Sales Growth Rates',
marker=dict(
color='rgb(192, 11, 25)'
)
)
data = [trace1, trace2]
layout = go.Layout(
title='US Coffee Sales vs US Specialty Coffee Sales',
xaxis=dict(
tickfont=dict(
size=14,
color='rgb(107, 107, 107)'
)
),
yaxis=dict(
title='Percentage',
titlefont=dict(
size=16,
color='rgb(107, 107, 107)'
),
tickfont=dict(
size=14,
color='rgb(107, 107, 107)'
)
),
legend=dict(
x=0,
y=1.0,
bgcolor='rgba(255, 255, 255, 0)',
bordercolor='rgba(255, 255, 255, 0)'
),
barmode='group',
bargap=0.15,
bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='style-bar')
Out[12]:
By the way, this graph is interactive! Hover your mouse over the bars.
There is one interesting hypothesis that we came accross in the internet hypothesizing that coffee consumption influences stress levels. We decided to check that assumption. Data: stress level in the US for two demographics: Families with less than 50k income and families with more than 50k income. On the graph below we see that stress levels are not very different accross these two demographic. Families with less than 50k income show a little higher stress levels in some of the years. The difference in some of the years is shown in blue.
In [13]:
# Stress levels in US by two income classes
url20 = 'https://raw.githubusercontent.com/coffeedata2016/coffee/master/Level-of-stress-among-us-adults-by-household-income-2005-2014.csv'
stress = pd.read_csv(url20)
stress=stress.set_index('Year')
In [14]:
fig_mpl, ax = plt.subplots(figsize=(21, 10))
stress.plot.barh(ax=ax, y="Household income less than 50,000 U.S. dollars", color="Blue", grid=False)
stress.plot.barh(ax=ax, y="Household income greater or equal to 50,000 U.S. dollars", color="Green", grid=False);
We regressed the US coffee lb consumption per capita on stress levels of families with higher than 50k income. We chose this demographic since people with lower than 50k income might have many other reasons affecting their stress levels. From the regresison we see that the results are not even statistically significant. Therefore, we reject this hypothesis.
In [15]:
from pandas.stats.api import ols
res5 = ols(y=stress['Household income greater or equal to 50,000 U.S. dollars'], x=USconsumption['Consumption in pounds'])
res5
Out[15]:
What would a project on coffee be if we left out the mother of coffee culture in America? No matter how refined your coffee palette is and how #basic you think Starbucks drinkers are, we all have Starbucks to thank for implanting coffee culture in America. As our lasting tribute to this project, the following is data from Starbucks to remind us of its impact on illustrate its prescence in our society.
In [16]:
#Starbucks Advertising expense in million USD
url11 = 'https://raw.githubusercontent.com/coffeedata2016/coffee/master/starbuckss-advertising-spending-in-the-us-2005-2014.csv'
SB_Spend = pd.read_csv(url11, names=['Year' , 'Ad Spending'])
SB_Spend=SB_Spend.set_index('Year')
#Starbucks Annual Revenue in billion USD
url10 = 'https://raw.githubusercontent.com/coffeedata2016/coffee/master/starbucks-annual-revenue-worldwide-2005-2014.csv'
SB_Revenue = pd.read_csv(url10, names=['Year' , 'Annual Revenue'])
SB_Revenue=SB_Revenue.set_index('Year')
In [ ]:
In [17]:
plotly.tools.set_credentials_file(username='kdg777', api_key='y6WgfYZUp59UUizb4484')
df = pd.read_csv('https://raw.githubusercontent.com/coffeedata2016/coffee/master/Starbuckslocations.csv') #dataframe object
df.head()
df['text'] = df['name'] + '<br>Starbucks Store Count ' + (df['stores']).astype(str)+' stores' #this is for the text box appearing upon hover
colors = ["rgb(0,116,217)","rgb(255,65,55)","rgb(240,211,67)","rgb(119,225,122)","rgb(197,39,202)", "rgb(0,0,0)"] #colors for the legend
legend = ['601+','201-600','51-200','21-50','11-20','<10'] #legend
stores = [] #final formatted data of data sent to plotly (see last line)
linda = [] #new loop created by yours truly
df = df.sort_values(['stores'], ascending=False) #sort data by stores values
df = df.reset_index(drop=True)
#variables needed for sorting loop (not all may be needed to be instantiated outside the loop)
i3=0
i2=-1
previous=0
x=1
thecase = True
numStates = 50
#loop to goes through all States then allocates them to catagories based on store ranges in linda but changes df variable name for each catagory.
while i3<numStates: #logic to create limits based on store number ranges stored in array linda
df_sub = df.iloc[i3]
if( df_sub['stores']>0 and df_sub['stores']<11):
i2=5
print('i2 5')
if(df_sub['stores']>10 and df_sub['stores']<21):
i2=4
if(df_sub['stores']>20 and df_sub['stores']<51):
i2=3
if(df_sub['stores']>50 and df_sub['stores']<201):
i2=2
if(df_sub['stores']>200 and df_sub['stores']<601):
i2=1
if(df_sub['stores']>600):
i2=0
if(not(i2==previous)): #if the next limit is reached, log it by initially changing the bool value
thecase=True
if(thecase or i3==(numStates-1)):
if(x==1): #if x is 1, you are setting the lower limit. ex (x,y) you are setting x
print('i3: '+str(i3))
if(i3==0):
therange=0
else:
therange=i3-1
thecase = False
if(x==2): #setting y and log the limit
linda.append((therange,i3))
x=0
therange=i3
if(i3==numStates-1): #special case for the last value in the list
if(thecase):
linda.append((numStates-1,500)) #500 is an arbitarary value because the next loop does some weird things if the limit is, for example, (49,49)
if(not(thecase)):
linda.append((therange,500))
break #exit the loop
x=x+1
previous=i2
i3=i3+1
for y in range(len(linda)): #Go through the list of limits and
lim = linda[y] #make a variable for the limit range per the loop
df_sub2 = df[lim[0]:lim[1]] #Make a sub list based on the current limit of the loop
state = dict( #create dictionary values for each row (state) in the sub list
type = 'scattergeo',
locationmode = 'USA-states',
lon = df_sub2['long'],
lat = df_sub2['lat'],
text = df_sub2['text'],
marker = dict(
size = df_sub2['stores'],
color = colors[y],
line = dict(width=1, color='rgb(40,40,40)'),
sizemode = 'area'
),
name = legend[y] )
stores.append(state)
layout = dict( #Set the layout for the textbox
title = 'Starbucks Store Count by State <br>(Click legend to toggle traces)',
showlegend = True,
geo = dict(
scope='usa',
projection=dict( type='albers usa' ),
showland = True,
landcolor = 'rgb(217, 217, 217)',
subunitwidth=1.5,
countrywidth=1,
subunitcolor="rgb(255, 255, 255)",
countrycolor="rgb(255, 255, 255)",
showlakes= True, lakecolor= "72D0EC",
showframe= True
),
)
fig = dict( data=stores, layout=layout )
py.iplot( fig, validate=False, filename='IKE-TEST' )
Out[17]:
Starbuck's sphere of influence within the United States is enormous. The following graph compares the past revenue percentage growth with the growth of Starbucks' share of Ad Spending between 2005 and 2013. As it is displayed, Starbucks's original ad spending campaign was fairly reactive to the overall revenue stream and was appropriately allocated as revenue streams would change.
In [18]:
# import appropriate packages
import plotly.plotly as py
import plotly.graph_objs as go
revgrowth_url = "https://raw.githubusercontent.com/coffeedata2016/coffee/master/Starbucks%20Revenue%20growth%202005-2014.csv"
adspend_url = "https://raw.githubusercontent.com/coffeedata2016/coffee/master/starbuckss-advertising-spending-growth-in-the-us-2005-2014.csv"
#create dataframes for different sets of data
df = pd.read_csv(revgrowth_url)
ds = pd.read_csv(adspend_url)
#define datasets (trace1 being Revenue, trace2 being Ad Spending Growth)
trace1 = go.Bar(
x=[2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015],
y= df["18.4"],
name='Starbucks Revenue Growth',
marker=dict(
color='rgb(55, 83, 109)'
)
)
trace2 = go.Bar(
x=[2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015],
y=ds["11"],
name='Starbucks Ad Spending Growth',
marker=dict(
color='rgb(26, 118, 255)'
)
)
data = [trace1, trace2]
#define graph layout
layout = go.Layout(
title='Starbucks Revenue vs Ad Spending Percentage Growth',
xaxis=dict(
tickfont=dict(
size=14,
color='rgb(107, 107, 107)'
)
),
yaxis=dict(
title='% Growth',
titlefont=dict(
size=16,
color='rgb(107, 107, 107)'
),
tickfont=dict(
size=14,
color='rgb(107, 107, 107)'
)
),
legend=dict(
x=0,
y=2.0,
bgcolor='rgba(255, 255, 255, 0)',
bordercolor='rgba(255, 255, 255, 0)'
),
barmode='group',
bargap=0.15,
bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='style-bar')
Out[18]:
In the two previous graphs, we have seen the prescence of Starbucks in the US (from number of stores in the US from the above map and the increasing revenue of US Starbucks from above bar charts.) Now, if we look at its share price in the span of 10 years we can see that the stock price has almost doubled and is continuing to rise on an upward trend.
In [19]:
from plotly.tools import FigureFactory as FF
from plotly.graph_objs import Line, Marker
from datetime import datetime
df = web.DataReader("sbux", 'yahoo', datetime(2004, 1, 1), datetime(2014, 1, 1))
fig = FF.create_candlestick(df.Open, df.High, df.Low, df.Close, dates=df.index)
# Make increasing ohlc sticks and customize their color and name
fig_increasing = FF.create_candlestick(df.Open, df.High, df.Low, df.Close, dates=df.index,
direction='increasing', name='Starbucks',
marker=Marker(color='rgb(150, 100, 250)'),
line=Line(color='rgb(150, 200, 250)'))
# Make decreasing ohlc sticks and customize their color and name
fig_decreasing = FF.create_candlestick(df.Open, df.High, df.Low, df.Close, dates=df.index,
direction='decreasing',
marker=Marker(color='rgb(128, 128, 128)'),
line=Line(color='rgb(128, 128, 128)'))
# Initialize the figure
fig = fig_increasing
# Add decreasing data with .extend()
fig['data'].extend(fig_decreasing['data'])
iplot(fig)
Through this project, we've taken you around the world, and then around the states in terms of coffee consumption. You've seen that neither Keurig sales nor the growth rate of college students (who we thought would largely impact the coffee market) are signficiant drivers of total coffee sales in America. You saw some graphs on the relationship between coffee and stress levels, as well as if stress affected coffee consumption when measured against people with household income more than 50k.
Finally we gave you a glimpse of the current state of Starbucks in America and its prescence continues to increase globally. From this data, we see that coffee consumption in America is only rising in coming years. We hope that this project has awakened your insight on coffee in America, and we hope that you appreciate all the economic data involved behind your next cup of joe.
In [ ]: