Most of what we do in this course is find some data that interests us, read it into Python, organize it in a convenient form, and produce a graph. The details vary, but that's the general plan. The best-case scenario: The graph tells us something we don't know, and points to other things we'd like to follow up on. As for the Python code: You'll be able to do all this and more by the end of the term.
Most of this runs with packages that come pre-installed with the Anaconda distribution. The exceptions are the options example, which uses the pandas-datareader package, and the heatmap of economic indicators, which uses the seaborn package. We'll talk more about this later, but you can install them from the terminal or command line with conda install followed by the package name.
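For example, from the terminal or command line (a sketch; these are the two packages used later in this notebook, and channel availability may vary):
conda install pandas-datareader
conda install seaborn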
This IPython notebook was created by Dave Backus, Chase Coleman, and Spencer Lyon for the NYU Stern course Data Bootcamp.
In [7]:
# import packages
import pandas as pd # data management
import matplotlib.pyplot as plt # graphics
import matplotlib as mpl # graphics parameters
import numpy as np # numerical calculations
# IPython command, puts plots in notebook
%matplotlib inline
# check Python version
import datetime as dt
import sys
print('Today is', dt.date.today())
print('What version of Python are we running? \n', sys.version, sep='')
We see a lot of variation in GDP growth rates in all countries. It's not obvious why, but it's a fact of life, and one that investors and other business analysts track closely. Basic numbers like GDP only come out quarterly, and with at least a one month lag, so we're often in the position of not knowing how the economy is doing now, much less how it will do in the future. To get a clearer sense of current conditions, analysts typically look at a wide range of indicators. Many of these indicators are available more quickly than GDP. Since they typically move up and down with GDP, we can use their movements to infer what GDP might be doing.
Of the hundreds of economic indicators, some of the most popular (with their FRED codes) are industrial production (INDPRO), nonfarm employment (PAYEMS), average weekly hours in manufacturing (AWHMAN), new residential building permits (PERMIT), the ISM manufacturing index (NAPM), and the University of Michigan's consumer sentiment index (UMCSENT).
All of them are monthly. You can find more about this kind of thing in our book, chapter 11.
In [8]:
# get data from FRED
import pandas as pd
import pandas.io.data as web # web interface with FRED
import datetime as dt # handles dates
# get data
indicators = ['INDPRO', 'PAYEMS', 'AWHMAN', 'PERMIT', 'NAPM', 'UMCSENT']
start_date = dt.datetime(1970, 1, 1)
inds = web.DataReader(indicators, "fred", start_date)
end = inds.index[-1]
# yoy growth rates
g = inds.pct_change(periods=12).dropna()
# standardize
g_std = (g - g.mean()) / g.std()
gs = g_std
# correlations
g_std.corr()
Out[8]:
In [9]:
# plot
fig, ax = plt.subplots()
g_std.plot(ax=ax)
ax.set_title('Economic indicators', fontsize=14, loc='left')
ax.set_ylabel('Standard deviations from mean')
ax.set_xlabel('')
ax.hlines(y=0, xmin=start_date, xmax=end, linestyles='dashed')
ax.legend().set_visible(False)
In [10]:
# focus on recent past
recent_date = dt.datetime(2011, 1, 1)
g_std = g_std[g_std.index>=recent_date]
In [11]:
fig, ax = plt.subplots()
g_std.plot(ax=ax)
ax.set_title('Zoom in on recent past', fontsize=14, loc='left')
ax.set_ylabel('Standard deviations from mean')
ax.set_xlabel('')
ax.hlines(y=0, xmin=recent_date, xmax=end, linestyles='dashed')
ax.legend(loc='upper left', fontsize=10, handlelength=2, labelspacing=0.15)
Out[11]:
Comment. Chase likes heatmaps. The one that follows needs work, but it's an attempt to reproduce the previous graph as a heatmap, where the value of the variable is represented by the color and its intensity. Dark red: large positive value. Dark blue: large negative value. Light colors: small values.
In [12]:
import seaborn as sns # graphics package
fig, ax = plt.subplots()
sns.heatmap(g_std, linewidths=.5, ax=ax)
ax.set_yticklabels(g_std.index, visible=False)
Out[12]:
Comment. This needs some work on the dates.
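One way to tidy the dates, sketched below, is to place a tick at the first row of each year and label it with the year. This assumes g_std still has the monthly DatetimeIndex built above; the step and label format are our choices.
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
sns.heatmap(g_std, linewidths=0.5, ax=ax)
# one label per year, centered on that year's first row
step = 12
ax.set_yticks([i + 0.5 for i in range(0, len(g_std), step)])
ax.set_yticklabels([d.strftime('%Y') for d in g_std.index[::step]], rotation=0)
ax.set_ylabel('')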
The following two lines reset the graphics to the pyplot standard. If we leave them commented out, we keep the seaborn standard, which most people think looks better. If you don't have seaborn installed, none of this matters.
In [7]:
# reset graphics (or not)
#mpl.rcParams.update(mpl.rcParamsDefault)
#%matplotlib inline
In [ ]:
Not long ago, the US had more people working (a higher fraction of the adult population) than many other developed countries. Over the past 15 years things have flipped. The deep question is why, but here we simply report what we know.
The key variables are the employment rate (the fraction of people aged 25-54 who are working) and the participation rate (the fraction who are either working or unemployed). They're similar, but the latter irons out some of the cyclical fluctuations. We get the data from FRED, which takes it from the OECD's Main Economic Indicators, a source that covers mostly rich countries.
In [5]:
import pandas as pd # data management
import datetime as dt # time and date tool
import pandas.io.data as web # data interface (soon to be moved, see warning)
In [6]:
# countries = AU, CA, CH, DE, ES, EU, EZ, FR, GB, JP, KR, NL, O1 (OECD), SE, US, ZA
countries = ['CA', 'DE', 'GB', 'JP', 'US']
emcodes = ['LREM25TT' + c + 'Q156S' for c in countries]
lrcodes = ['LRAC25TT' + c + 'Q156S' for c in countries]
start_date = dt.datetime(2000, 1, 1)
em = web.DataReader(emcodes, 'fred', start_date)
lr = web.DataReader(lrcodes, 'fred', start_date)
em.columns = countries
lr.columns = countries
em.head(3)
Out[6]:
In [10]:
def emplot(em, title):
    fig, ax = plt.subplots()
    em.plot(ax=ax, lw=2)
    ax.set_title(title, fontsize=14, loc='left')
    ax.set_xlabel('')
    ax.set_ylabel('Percent')
    ax.legend(loc='best', fontsize=8, handlelength=2, labelspacing=0.1)
emplot(em, 'Employment rates, age 25-54')
emplot(lr, 'Participation rates, age 25-54')
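The gap between the two rates is, roughly, the unemployed share of the 25-54 population, which makes the cyclical component easier to see. A quick sketch, reusing the emplot function above (gap is our name for the difference):
# participation minus employment: roughly the unemployed share of the 25-54 population
gap = lr - em
emplot(gap, 'Participation minus employment, age 25-54')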
In [ ]:
One of the traditional ways for a country to get into trouble is to issue so much debt that investors worry about getting paid back. How much is too much? Hard to say, but we'll look at some debt numbers from the IMF's World Economic Outlook (WEO), a popular source of international data on debt, deficits, and other macroeconomic indicators. We use the ratio of government debt to GDP, a standard indicator of public indebtedness.
Given recent events, we pay special attention to Argentina and Greece.
In [11]:
%%time
import pandas as pd
import matplotlib.pyplot as plt
url1 = 'http://www.imf.org/external/pubs/ft/weo/2015/01/weodata/'
url2 = 'WEOApr2015all.xls'
url = url1 + url2
weo = pd.read_csv(url, sep='\t', thousands=',', na_values=['n/a', '--'])
weo.shape
In [12]:
list(weo.columns[:12])  # names of the first twelve columns
Out[12]:
In [13]:
country_guide = weo[['ISO', 'Country']].drop_duplicates().set_index('ISO')
variable_guide = weo[['WEO Subject Code', 'Subject Descriptor', 'Subject Notes']].drop_duplicates().set_index('WEO Subject Code')
variable_guide.head(3)
Out[13]:
In [14]:
variables = ['GGXWDG_NGDP']
countries = ['ARG', 'DEU', 'FRA', 'GRC', 'USA']
debt = weo[weo['ISO'].isin(countries) & weo['WEO Subject Code'].isin(variables)]
some = [3] + list(range(9,44))  # country name plus the year columns
debt = debt.iloc[:, some].set_index('Country').T.dropna()
debt.head(3)
Out[14]:
In [15]:
fig, ax = plt.subplots()
debt.plot(ax=ax, lw=2)
ax.set_title('Ratio of government debt to GDP', fontsize=14, loc='left')
ax.set_ylabel('Government Debt (Percent of GDP)')
ax.legend(loc='best', fontsize=10, handlelength=2, labelspacing=0.15)
Out[15]:
In [ ]:
You can run commands on the command line from within Jupyter by putting a ! at the beginning of the line. To run the install commands mentioned above, enter the following in a code cell:
!conda install -y pandas-datareader html5lib
We'll use the Options tool, which reads in prices of stock options from Yahoo Finance. The options come with a variety of contract specifications: an expiration date, a strike price, and a type (a call, the right to buy the stock at the strike price, or a put, the right to sell it).
We give an illustration using options on Amazon stock.
The first version of this module was written by our very own Spencer Lyon.
In [2]:
import pandas_datareader.yahoo.options as yho
"""
supply ticker, get option prices
"""
ticker = 'amzn'
otk = yho.Options(ticker)
exp = otk.expiry_dates # get expiration dates
exp
Out[2]:
In [3]:
# get option prices
cols = [0, 1, 2, 7]
opexp = 2
print('Options with expiry', exp[opexp])
calls = otk.get_call_data(expiry=exp[opexp]).iloc[:, cols]
puts = otk.get_put_data(expiry=exp[opexp]).iloc[:, cols]
calls.head()
Out[3]:
In [4]:
# drop extra index levels
calls = calls.reset_index(level=[1,2,3], drop=True)
puts = puts.reset_index(level=[1,2,3], drop=True)
# cut off extremes
spot = otk.underlying_price
print('Spot price', spot)
delta = 0.25
calls = calls[(calls.index >= (1-delta)*spot) & (calls.index <= (1+delta)*spot)]
puts = puts[(puts.index >= (1-delta)*spot) & (puts.index <= (1+delta)*spot)]
# compute avg of bid and ask
calls['Mid'] = (calls['Bid'] + calls['Ask'])/2
puts['Mid'] = (puts['Bid'] + puts['Ask'])/2
In [1]:
# plot put and call prices
which = 'Mid'
fig, ax = plt.subplots()
calls[which].plot(lw=2, color='blue', alpha=0.6, ax=ax)
puts[which].plot(lw=2, color='magenta', alpha=0.6, ax=ax)
ymin, ymax = ax.get_ylim()
xmin, xmax = ax.get_xlim()
ax.set_title('Prices of Amazon options (bid-ask avg)', fontsize=14, loc='left')
ax.set_ylabel('Option Prices')
ax.set_xlabel('Strike Price')
ax.vlines(x=spot, ymin=ymin, ymax=ymax, linestyle='dashed')
ax.text(1.01*spot, 0.9*ymax, 'Stock price', horizontalalignment='left')
ax.text(0.875*spot, 0.13*ymax, 'Put prices', horizontalalignment='right', color='m')
ax.text(1.125*spot, 0.13*ymax, 'Call prices', horizontalalignment='left', color='b')
In [ ]:
Data from the UN's Population Division. One of our favorite quotes:
Last year, for the first time, sales of adult diapers in Japan exceeded those for babies.
This is what the numbers look like. They're UN projections, what they call the "medium variant."
In [28]:
url1 = 'http://esa.un.org/unpd/wpp/DVD/Files/'
url2 = '1_Indicators%20(Standard)/EXCEL_FILES/1_Population/'
url3 = 'WPP2015_POP_F07_1_POPULATION_BY_AGE_BOTH_SEXES.XLS'
url = url1 + url2 + url3
cols = [2, 4, 5] + list(range(6,28))
#est = pd.read_excel(url, sheetname=0, skiprows=16, parse_cols=cols, na_values=['…'])
prj = pd.read_excel(url, sheetname=1, skiprows=16, parse_cols=cols, na_values=['…'])
"""
for later: change cols for the two sources, rename 80+ to 80-84, then concat
#pop = pd.concat([est, prj], axis=0, join='outer')
"""
pop = prj
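A sketch of the step described in the docstring above, assuming the estimates sheet has also been read into est (the commented-out line) and that its open-ended age column is labeled '80+', as the docstring suggests:
# sketch only: align the age columns of estimates (est) and projections (prj), then stack
est = est.rename(columns={'80+': '80-84'})
pop = pd.concat([est, prj], axis=0, join='outer')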
In [30]:
pop.head()
Out[30]:
In [31]:
# rename some variables
pop = pop.rename(columns={'Reference date (as of 1 July)': 'Year',
                          'Major area, region, country or area *': 'Country',
                          'Country code': 'Code'})
# select Japan and years
countries = ['Japan']
years = [2015, 2025, 2035, 2045, 2055, 2065]
pop = pop[pop['Country'].isin(countries) & pop['Year'].isin(years)]
pop = pop.drop(['Country', 'Code'], axis=1)
pop.head()
Out[31]:
In [32]:
pop = pop.set_index('Year').T
pop.head()
Out[32]:
In [33]:
ax = pop.plot(kind='bar',
alpha=0.5, subplots=True, sharey=True, figsize=(6, 8))
for axnum in range(len(ax)):
    ax[axnum].set_title('')
ax[0].set_title('Japanese population by age', fontsize=14, loc='left')
Out[33]:
In [ ]:
We might wonder: why is the population falling in Japan? And in other countries? One reason is that birth rates are falling. Demographers call this fertility. Here we look at fertility using the same UN source as the previous example. We look at two variables: total fertility and fertility by age of mother. In both cases we explore the numbers to date, but the same files contain projections of future fertility.
In [34]:
%%time
import pandas as pd
# fertility overall
uft = 'http://esa.un.org/unpd/wpp/DVD/Files/'
uft += '1_Indicators%20(Standard)/EXCEL_FILES/'
uft += '2_Fertility/WPP2015_FERT_F04_TOTAL_FERTILITY.XLS'
cols = [2, 4] + list(range(5,18))
ftot = pd.read_excel(uft, sheetname=0, skiprows=16, parse_cols=cols, na_values=['…'])
# fertility by age
ufa = 'http://esa.un.org/unpd/wpp/DVD/Files/'
ufa += '1_Indicators%20(Standard)/EXCEL_FILES/'
ufa += '2_Fertility/WPP2015_FERT_F07_AGE_SPECIFIC_FERTILITY.XLS'
#cols = [2, 4, 5] + list(range(6,13))
#fage = pd.read_excel(ufa, sheetname=0, skiprows=16, parse_cols=cols, na_values=['…'])
In [35]:
ftot.head(3).iloc[:, :10]
Out[35]:
In [36]:
# rename some variables
ftot = ftot.rename(columns={'Major area, region, country or area *': 'Country',
                            'Country code': 'Code'})
In [37]:
# drop code
f = ftot.drop(['Code'], axis=1)
# select countries
countries = ['China', 'Japan', 'Germany', 'United States of America']
f = f[f['Country'].isin(countries)]
# shape
f = f.set_index('Country').T
f = f.rename(columns={'United States of America': 'United States'})
f.tail()
Out[37]:
In [38]:
fig, ax = plt.subplots()
f.plot(ax=ax, kind='line', alpha=0.5, lw=3, figsize=(6.5, 4))
ax.set_title('Fertility rate (births per woman, lifetime)', fontsize=14, loc='left')
ax.legend(loc='best', fontsize=10, handlelength=2, labelspacing=0.15)
ax.set_ylim(ymin=0)
ax.hlines(2.1, -1, 13, linestyles='dashed')
ax.text(8.5, 2.4, 'Replacement = 2.1')
Out[38]:
In [ ]:
People often refer to interest rates moving up and down, but in fact the yields (as we call them) move up and down differently at different maturities. From 2010 to 2015, for example, yields on bonds with maturities under 2 years were essentially fixed at zero, while yields at longer maturities moved up and down quite a bit.
One challenge with bond yields is getting good data. Bonds are typically traded over the counter, and they differ in maturity, coupon, and often other features. Analysts therefore focus on the interest rates of pure-discount "zero-coupon" bonds, which they infer from the prices of coupon bonds -- the instruments that are actually traded. This isn't something to go into unless you have a strong interest. Suffice it to say that we have taken these yields from estimates supplied by the Fed.
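To fix ideas, the price of a zero-coupon bond is just the discounted value of its face value. A small sketch with annual compounding (zero_price is a hypothetical helper; the Fed estimates may use a different compounding convention):
# price of a zero-coupon bond paying 100 in n years at annually compounded yield y (percent)
def zero_price(y, n):
    return 100 / (1 + y/100)**n
zero_price(2.5, 10)   # about 78.12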
This is still a work in progress, but our goal is to produce a movie of bond yields over time.
In [27]:
url = 'http://pages.stern.nyu.edu/~dbackus/Data/GSW_nominal_yields.csv'
y = pd.read_csv(url, index_col=0, parse_dates=True)
y.iloc[:, :10].tail()
Out[27]:
In [28]:
# compute mean yields
ybar = y.mean(axis=0)
ystd = y.std(axis=0)
maturities = list(range(1,122))
ybar.index = maturities
ystd.index = maturities
In [45]:
fig, ax = plt.subplots()
ybar.plot(ax=ax)
ystd.plot(ax=ax)
ax.set_xlabel('Maturity in Months')
ax.set_ylabel('Mean and Std Dev of Yield (Annual Percent)')
ax.set_ylim(0)
ax.legend(["Mean", "std. dev."])
Out[45]:
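Toward the movie mentioned above, here is a minimal sketch using matplotlib.animation. It assumes y is the DataFrame of monthly yields loaded earlier, with one column per maturity; update and anim are our names, and saving the file requires ffmpeg.
from matplotlib import animation
fig, ax = plt.subplots()
line, = ax.plot([], [], lw=2)
ax.set_xlim(1, y.shape[1])
ax.set_ylim(0, float(y.max().max()) * 1.1)
ax.set_xlabel('Maturity in Months')
ax.set_ylabel('Yield (Annual Percent)')
def update(i):
    # redraw the yield curve for the i-th date in the sample
    line.set_data(range(1, y.shape[1] + 1), y.iloc[i].values)
    ax.set_title(str(y.index[i].date()), fontsize=14, loc='left')
    return (line,)
anim = animation.FuncAnimation(fig, update, frames=len(y), interval=50)
# anim.save('yields.mp4')   # uncomment to write the movie (needs ffmpeg)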
In [ ]: