This notebook is based on course notes from Lamoureux's course Math 651 at the University of Calgary, Winter 2016.
This was an exercise to try out some resources in Python. Specifically, we want to scrape some stock price data from the web, display it in a pandas DataFrame, and then do some basic data analysis on it.
We take advantage of the fact that there is a lot of financial data freely accessible on the web, and lots of people post information about how to use it.
I am using the book Python for Data Analysis by Wes McKinney as a reference for this section.
The point of using Python for this is that a lot of people have created good code to do this.
The pandas name comes from Panel Data, an econometrics term for multidimensional structured data sets, as well as from Python Data Analysis.
The dataframe objects that appear in pandas originated in R, but apparently they have more functionality in Python than in R.
I will be using pylab as well in this section, so we can make use of NumPy and Matplotlib.
For free historical data on commodities like oil, you can try the site http://www.databank.rbs.com, which will download data directly into spreadsheets for you, plot graphs of historical data, etc. Here is an example of oil prices (West Texas Intermediate) over the last 15 years. Look how low it goes...
Yahoo supplies current stock and commodity prices. Here is an interesting site that tells you how to download loads of data into a CSV file: http://www.financialwisdomforum.org/gummy-stuff/Yahoo-data.htm
Here is another site that discusses accessing various financial data sources. http://quant.stackexchange.com/questions/141/what-data-sources-are-available-online
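Once a price history has been saved as a CSV file, it can be loaded into a pandas DataFrame. Here is a minimal sketch of that step. The column names (Date, Adj Close) and the numbers are made up for illustration and are not real market data; a real download may use a different layout.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a downloaded file
csv_text = """Date,Adj Close
2017-01-03,100.0
2017-01-04,101.5
2017-01-05,99.8
"""

# Parse the Date column into a DatetimeIndex so the rows are ordered in time
prices = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# Pull out one column as a Series, as the cells below do with 'Adj Close'
adj_close = prices["Adj Close"]
print(adj_close)
```

For a file on disk, the io.StringIO(...) argument would be replaced by the file name.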
In [17]:
# Get some basic tools
%pylab inline
from pandas import Series, DataFrame
import pandas as pd
#import pandas.io.data as web
#from pandas_datareader import data, web
#import pandas_datareader as pdr
from pandas_datareader import data as pdr
import fix_yahoo_finance
In [19]:
# Here are Apple and Microsoft adjusted closing prices since 2016
start = datetime.datetime(2016,1,1)
end = datetime.date.today()
# The same call also accepts a list of tickers, e.g. two ETFs over a fixed date range
data = pdr.get_data_yahoo(["SPY", "IWM"], start="2017-01-01", end="2017-04-30")
# aapl and msft are used in the cells below, so they must be defined here
aapl = pdr.get_data_yahoo('AAPL', start, end)['Adj Close']
msft = pdr.get_data_yahoo('MSFT', start, end)['Adj Close']
subplot(2,1,1)
plot(aapl)
subplot(2,1,2)
plot(msft)
In [3]:
aapl
In [4]:
# Let's look at the changes in the stock prices, normalized as a percentage
aapl_rets = aapl.pct_change()
msft_rets = msft.pct_change()
subplot(2,1,1)
plot(aapl_rets)
subplot(2,1,2)
plot(msft_rets)
Out[4]:
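To see what pct_change actually computes, here is a small sketch on a made-up price series (the numbers are hypothetical, not stock data). Each entry is the change from the previous entry, divided by the previous entry.

```python
import pandas as pd

# Hypothetical prices, chosen only to make the arithmetic easy to follow
prices = pd.Series([100.0, 110.0, 99.0, 99.0])

# pct_change gives (p[t] - p[t-1]) / p[t-1]; the first entry has no
# predecessor, so it comes out as NaN
rets = prices.pct_change()

# The same quantity written out by hand using shift
manual = prices / prices.shift(1) - 1

print(rets)
```

Going from 100 to 110 is a 10% gain, and going from 110 to 99 is a 10% loss, so equal percentage moves up and down do not bring the price back to where it started.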
In [5]:
# Let's look at the correlation between these two series
# pd.rolling_corr was removed from pandas; the rolling method is the replacement
aapl_rets.rolling(250).corr(msft_rets).plot()
Out[5]:
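A rolling correlation is just the ordinary correlation recomputed over a sliding window. Here is a minimal sketch on synthetic data, where the two series a and b are random numbers sharing a common component (they are stand-ins, not the Apple and Microsoft returns).

```python
import numpy as np
import pandas as pd

# Synthetic series: a shared component plus independent noise in each
rng = np.random.default_rng(0)
n = 500
common = rng.normal(size=n)
a = pd.Series(common + 0.5 * rng.normal(size=n))
b = pd.Series(common + 0.5 * rng.normal(size=n))

# Correlation over a sliding window of 250 observations
window = 250
roll = a.rolling(window).corr(b)

# The first window-1 entries are NaN, since a full window is not yet available
print(roll.isna().sum())

# The last rolling value matches the plain correlation of the last 250 points
last_direct = a.iloc[-window:].corr(b.iloc[-window:])
print(roll.iloc[-1], last_direct)
```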
Now, we can use some more sophisticated statistical tools, like least squares regression. However, I had to do some work to get Python to recognize these items. But I didn't work too hard, I just followed the error messages.
It became clear that I needed to go back to a terminal window to load in some packages. The two commands I had to type in were pip install statsmodels and pip install patsy.
'pip' is the Python package installer, which installs packages of code onto your computer (or whatever machine is running your Python). The package 'statsmodels' provides statistical models and tests, and 'patsy' describes statistical models using formulas. I don't know much about them, but they are easy to find on the web.
In [6]:
# We may also try a least squares regression, which was built into older pandas as pd.ols
# (pd.ols was removed in pandas 0.20, so this cell needs an older version)
model = pd.ols(y=aapl_rets, x={'MSFT': msft_rets},window=256)
In [7]:
model.beta
Out[7]:
In [8]:
model.beta['MSFT'].plot()
Out[8]:
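For a single regressor with an intercept, the least squares slope is Cov(x, y) / Var(x), so a rolling beta can also be sketched with plain rolling covariance and variance, without pd.ols. The code below is a sketch under that assumption, on synthetic data: x and y are made-up stand-ins for msft_rets and aapl_rets, and the true slope of 1.5 is chosen arbitrarily.

```python
import numpy as np
import pandas as pd

# Synthetic returns: y depends linearly on x, plus noise
rng = np.random.default_rng(1)
n = 600
x = pd.Series(rng.normal(scale=0.01, size=n))              # stand-in for msft_rets
y = 1.5 * x + pd.Series(rng.normal(scale=0.005, size=n))   # stand-in for aapl_rets

# Rolling OLS slope for one regressor: Cov(x, y) / Var(x) in each window
window = 256
beta = x.rolling(window).cov(y) / x.rolling(window).var()

# Check the last window against an ordinary straight-line fit
slope, intercept = np.polyfit(x.iloc[-window:], y.iloc[-window:], 1)
print(beta.iloc[-1], slope)
```

With the real data, x and y would be replaced by msft_rets and aapl_rets, and beta.plot() would play the role of model.beta['MSFT'].plot(). The statsmodels package also has a RollingOLS class that may be worth a look for regressions with more than one regressor.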
In [9]:
# Those two graphs looked similar. Let's plot them together
subplot(2,1,1)
aapl_rets.rolling(250).corr(msft_rets).plot()
title('Rolling correlations')
subplot(2,1,2)
model.beta['MSFT'].plot()
title('Least squares model')
Out[9]:
In [10]:
# 'web' is never imported above; use the pdr alias from the first cell
px = pdr.get_data_yahoo('SPY')['Adj Close']*10
px
Out[10]:
In [11]:
plot(px)
Out[11]: