In this notebook we get started with using Python for financial computation. You first need to download and install IPython, an entire system for interactive use of Python. This is easily done by visiting the following URL and downloading the version applicable to your machine (Python runs on all major platforms).
Visit Continuum Analytics, download Python from http://continuum.io/downloads, and install it. Test that it works by using the Launcher (which will be on your desktop) to start up the IPython Notebook.
Also install the R programming language, which is a very useful tool for machine learning (see http://en.wikipedia.org/wiki/Machine_learning). Get R from http://www.r-project.org/ (download and install it).
To allow Python to interface with R, you also need to install the "rpy2" package from http://rpy.sourceforge.net/. You do not need R and can get by mostly with Python, but R is a useful language because it offers many finance packages. As you will see, we can run both within the IPython Notebook. (Indeed, these lecture notes have been prepared in the Notebook.)
If you want to use R in IDE mode, download RStudio: http://www.rstudio.com/
Particularly useful are the linear algebra capabilities in Python and R. In Python, the NumPy package makes these easy to use. Also useful are graphical libraries; in Python, these come in the matplotlib package.
In [1]:
#Invoke NumPy and matplotlib in one command
%pylab inline
In [2]:
#IMPORTING STOCK DATA USING PANDAS
from pandas.io.data import DataReader
from datetime import datetime
goog = DataReader("GOOG", "yahoo", datetime(2014,4,1), datetime(2015,3,31))
stkp = goog["Adj Close"]
print(stkp)
In [3]:
goog.head()
Out[3]:
In [4]:
goog.index
Out[4]:
In [5]:
t = goog.index
plot(t,stkp)
xlabel("Date")
ylabel("Stock Price")
Out[5]:
In [6]:
n = len(t)-1
rets = zeros(n)
for j in range(n):
    rets[j] = log(stkp[j+1]/stkp[j])
plot(rets)
ylabel("Returns")
Out[6]:
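The loop in the cell above can also be written as a single vectorized operation. The following is a minimal sketch, assuming a short made-up price series (the values in `prices` are illustrative, not GOOG data); it checks that the loop and the difference of log prices give the same continuously compounded returns.

```python
import numpy as np

# Hypothetical prices, for illustration only (not real market data)
prices = np.array([100.0, 102.0, 101.0, 105.0, 104.0])

# Loop version, mirroring the notebook cell
n = len(prices) - 1
rets_loop = np.zeros(n)
for j in range(n):
    rets_loop[j] = np.log(prices[j + 1] / prices[j])

# Vectorized version: first difference of log prices
rets_vec = np.diff(np.log(prices))

print(np.allclose(rets_loop, rets_vec))
```

Log returns add up over time, so their sum equals the log of the last price over the first price.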
In [7]:
hist(rets,25)
Out[7]:
In [8]:
goog.describe()
Out[8]:
In [9]:
import scipy.stats as ss
print("Skewness = ",ss.skew(rets))
print("Kurtosis = ",ss.kurtosis(rets))
#CHECK IF THIS IS EXCESS KURTOSIS or PLAIN KURTOSIS
x = randn(1000000)
print(ss.kurtosis(x))
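The check above works because a normal sample has plain kurtosis of about 3 and excess kurtosis of about 0. By default `scipy.stats.kurtosis` uses `fisher=True` and so reports excess kurtosis; passing `fisher=False` returns plain kurtosis. The sketch below computes both quantities directly from central moments with NumPy, on a simulated sample with an arbitrarily chosen seed and sample size, to make the definitions explicit.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
x = rng.standard_normal(200000)         # simulated standard normal sample

d = x - x.mean()
m2 = np.mean(d**2)                      # second central moment
m4 = np.mean(d**4)                      # fourth central moment

plain_kurtosis = m4 / m2**2             # close to 3 for normal data
excess_kurtosis = plain_kurtosis - 3.0  # close to 0 for normal data

print(plain_kurtosis, excess_kurtosis)
```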
In [10]:
#SENDING DATA VARIABLES TO R
%load_ext rpy2.ipython
#THIS ALLOWS US TO USE R INSIDE THE NOTEBOOK
In [11]:
#SENDS DATA FROM PYTHON TO R
%Rpush stkp
In [12]:
#PREFIX NEEDED TO CALL R INSTEAD OF PYTHON
%R plot(stkp,type="l",col="red",lwd=2)
In [13]:
#GETTING DATA BACK FROM R TO PYTHON
%R ret = diff(log(stkp))
#GET DATA BACK FROM R TO PYTHON
ret = %Rget ret
plot(ret)
%R print(summary(ret))
In [14]:
%%R
library(quantmod)
getSymbols(c("C","AAPL","CSCO","YHOO","IBM"))
In [15]:
%%R
citi = as.matrix(C$C.Adjusted)
aapl = as.matrix(AAPL$AAPL.Adjusted)
csco = as.matrix(CSCO$CSCO.Adjusted)
yhoo = as.matrix(YHOO$YHOO.Adjusted)
ibm = as.matrix(IBM$IBM.Adjusted)
In [16]:
%%R
stkdata = data.frame(cbind(citi,aapl,csco,yhoo,ibm))
rn = rownames(stkdata)
head(stkdata)
In [17]:
stkdata = %Rget stkdata
rn = %Rget rn
In [18]:
stkdata
Out[18]:
In [19]:
rn
Out[19]:
In [20]:
import pandas as pd
stk = pd.DataFrame(stkdata)
stk = stk.T
stk.head()
Out[20]:
In [21]:
stk.columns=["C","AAPL","CSCO","YHOO","IBM"]
stk.index = rn
stk.head()
Out[21]:
In [22]:
plot(stk["AAPL"])
Out[22]:
In [23]:
stk.loc['2007-01-03']
Out[23]:
In [24]:
stk.loc['2007-01-03']["AAPL"]
Out[24]:
In [25]:
stk["extra"] = 1.0
stk.head()
Out[25]:
In [26]:
sort(stk["AAPL"])
Out[26]:
In [27]:
stk.head()
Out[27]:
In [28]:
stk = stk.drop("extra",axis=1) #IF AXIS=0 (default), THEN ROW IS DROPPED
stk.head()
Out[28]:
In [29]:
stk[["AAPL","IBM"]].head()
Out[29]:
In [30]:
stk[stk["AAPL"]<11]
Out[30]:
In [32]:
stk[stk["AAPL"]<11]["IBM"]
Out[32]:
In [33]:
(stk < 50).head()
Out[33]:
In [34]:
sum(stk)
Out[34]:
In [35]:
#USING FUNCTIONS ON DATA FRAMES
f = lambda x: x.max() - x.min()
stk.apply(f)
Out[35]:
In [36]:
def g(x):
    return pd.Series([x.mean(),x.std(),x.min(),x.max()], index=['mean','stdev','min','max'])
stk.apply(g)
Out[36]:
In [37]:
stk.sort_index(axis=1,ascending=False).head()
Out[37]:
In [38]:
stk.sum()
Out[38]:
In [39]:
stk.mean()
Out[39]:
In [40]:
stk.describe()
Out[40]:
In [41]:
stk.diff().head()
Out[41]:
In [42]:
stk.pct_change().head()
Out[42]:
In [43]:
stk.pct_change().mean()*252.0
Out[43]:
In [44]:
stk.pct_change().std()*sqrt(252.0)
Out[44]:
In [45]:
rets = stk.pct_change()
rets.corr()
Out[45]:
In [46]:
rets.cov()
Out[46]:
In [47]:
sqrt(diag(rets.cov())*252.0)
Out[47]:
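Cells In [44] and In [47] compute the same annualized volatilities in two ways: once from the standard deviations and once from the diagonal of the covariance matrix. The following sketch illustrates the equivalence with NumPy on simulated daily returns; the three assets, the seed, and the 1% daily volatility are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated daily returns for 3 hypothetical assets (not real data)
daily = rng.normal(0.0, 0.01, size=(500, 3))

# Route 1: sample standard deviation, scaled by sqrt(252 trading days)
vol_from_std = daily.std(axis=0, ddof=1) * np.sqrt(252.0)

# Route 2: diagonal of the sample covariance matrix holds the variances
cov = np.cov(daily, rowvar=False)
vol_from_cov = np.sqrt(np.diag(cov) * 252.0)

print(vol_from_std)
print(vol_from_cov)
```

Both routes use the sample (n-1) denominator, which is also the pandas default for `std()` and `cov()`.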
In [48]:
rets.corrwith(rets.AAPL)
Out[48]:
In [49]:
import pandas.io.data as pid
panel = pd.Panel(dict((stock, pid.get_data_yahoo(stock,'1/1/2014','2/28/2015')) for stock in ['C','AAPL','CSCO','YHOO','IBM']))
panel
Out[49]:
In [50]:
panel = panel.swapaxes('items','minor')
panel
Out[50]:
In [51]:
panel['Adj Close'].head()
Out[51]:
In [52]:
panel.ix[:,'1/3/2014',:]
Out[52]:
In [53]:
import pandas as pd
data = pd.read_table("markowitzdata.txt")
In [54]:
data.head()
Out[54]:
In [55]:
gdata = pd.read_csv("goog.csv")
In [56]:
gdata.head()
Out[56]:
In [57]:
scatter(data["mktrf"],data["IBM"])
xlabel("Market return")
ylabel("IBM return")
grid(True)
In [58]:
from scipy import stats
y = data["IBM"]
x = data["mktrf"]
b, a, r_value, p_value, std_err = stats.linregress(x,y)
print("Intercept = ",a)
print("slope (beta) = ",b)
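The slope of this regression is the stock's beta, which equals the covariance of the stock and market returns divided by the variance of the market return. The sketch below checks this on simulated data; the series `x` and `y`, the true beta of 1.2, and the noise levels are all assumptions made for illustration and are unrelated to `markowitzdata.txt`.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 0.04, 240)                     # simulated market returns
y = 0.002 + 1.2 * x + rng.normal(0.0, 0.03, 240)   # simulated stock returns

# Beta as covariance over variance (both with the n-1 denominator)
beta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
alpha = y.mean() - beta * x.mean()

# Cross-check against a least-squares line fit
slope, intercept = np.polyfit(x, y, 1)

print(beta, slope)
print(alpha, intercept)
```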
In [59]:
import pandas as pd
import pandas.io.data as web
aapl = web.DataReader('AAPL',data_source='google',start='1/1/2014',end='4/1/2015')
aapl.head()
Out[59]:
In [60]:
aapl.tail()
Out[60]:
In [61]:
aapl['cont_ret'] = log(aapl['Close']/aapl['Close'].shift(1))
aapl.head()
Out[61]:
In [62]:
aapl['Vols'] = pd.rolling_std(aapl['cont_ret'],window=5)*sqrt(252.0)
aapl.tail()
Out[62]:
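In more recent pandas versions the module-level `pd.rolling_std` function has been replaced by the `.rolling()` method on a Series. The following is a minimal sketch of the same 5-day annualized volatility using that method; the price path is simulated with an arbitrary seed and is not AAPL data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# Simulated closing prices for 30 days (hypothetical, for illustration)
close = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 30))))

# Continuously compounded daily returns; the first entry is NaN
cont_ret = np.log(close / close.shift(1))

# 5-day rolling standard deviation, annualized with sqrt(252)
vols = cont_ret.rolling(window=5).std() * np.sqrt(252.0)

print(vols.head(8))
```

The first five entries of `vols` are missing: one because the first return is undefined, and four more because the window needs five returns before it produces a value.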
In [63]:
aapl.head(10)
Out[63]:
In [64]:
aapl[['Close','Vols']].plot(subplots=True,color='blue',figsize=(8,6))
Out[64]:
DIVERSIFICATION OF A PORTFOLIO
It is useful to examine the power of using vector algebra with an application. Here we use vector and summation math to understand how diversification in stock portfolios works. Diversification occurs when we increase the number of non-perfectly correlated stocks in a portfolio, thereby reducing portfolio variance. In order to compute the variance of the portfolio we need to use the portfolio weights ${\bf w}$ and the covariance matrix of stock returns ${\bf R}$, denoted ${\bf \Sigma}$. We first write down the formula for a portfolio's return variance:
\begin{equation} Var(\boldsymbol{w}'\boldsymbol{R}) = \boldsymbol{w}'\boldsymbol{\Sigma}\boldsymbol{w} = \sum_{i=1}^n w_i^2 \sigma_i^2 + \sum_{i=1}^n \sum_{j=1, j \neq i}^n w_i w_j \sigma_{ij} \end{equation}
Readers are strongly encouraged to implement this by hand for $n=2$ to convince themselves that the vector form of the expression for variance $\boldsymbol{w}'\boldsymbol{\Sigma}\boldsymbol{w}$ is the same thing as the long form on the right-hand side of the equation above. If returns are independent, then the formula collapses to:
\begin{equation} Var(\boldsymbol{w}'\boldsymbol{R}) = \boldsymbol{w}'\boldsymbol{\Sigma}\boldsymbol{w} = \sum_{i=1}^n w_i^2 \sigma_i^2 \end{equation}
If returns are dependent, and equal amounts are invested in each asset ($w_i=1/n,\;\;\forall i$):
\begin{eqnarray*} Var(\boldsymbol{w}'\boldsymbol{R}) &=& \frac{1}{n}\sum_{i=1}^n \frac{\sigma_i^2}{n} + \frac{n-1}{n}\sum_{i=1}^n \sum_{j=1, j \neq i}^n \frac{\sigma_{ij}}{n(n-1)}\\ &=& \frac{1}{n} \bar{\sigma_i}^2 + \frac{n-1}{n} \bar{\sigma_{ij}}\\ &=& \frac{1}{n} \bar{\sigma_i}^2 + \left(1 - \frac{1}{n} \right) \bar{\sigma_{ij}} \end{eqnarray*}
The first term is the average variance, denoted $\bar{\sigma_i}^2$, divided by $n$, and the second is the average covariance, denoted $\bar{\sigma_{ij}}$, multiplied by the factor $(n-1)/n$. As $n \rightarrow \infty$,
\begin{equation} Var(\boldsymbol{w}'\boldsymbol{R}) \rightarrow \bar{\sigma_{ij}} \end{equation}
This produces the remarkable result that in a well-diversified portfolio, the variances of the individual stocks' returns do not matter at all for portfolio risk! Further, the risk of the portfolio, i.e., its variance, is nothing but the average of the off-diagonal terms in the covariance matrix.
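The $n=2$ check of the variance formula can also be done numerically. The sketch below uses hypothetical weights and a hypothetical covariance matrix, chosen only for illustration, and compares the vector form $\boldsymbol{w}'\boldsymbol{\Sigma}\boldsymbol{w}$ with the long form written out as sums.

```python
import numpy as np

# Hypothetical two-asset inputs, chosen only for illustration
w = np.array([0.6, 0.4])                 # portfolio weights
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])         # covariance matrix of returns

# Vector form: w' Sigma w
var_vector = w @ sigma @ w

# Long form: own-variance terms plus the cross terms with i != j
var_long = sum(w[i]**2 * sigma[i, i] for i in range(2))
var_long += sum(w[i] * w[j] * sigma[i, j]
                for i in range(2) for j in range(2) if i != j)

print(var_vector, var_long)
```

By hand, the own-variance terms are 0.36(0.04) + 0.16(0.09) = 0.0288 and the two cross terms add 2(0.6)(0.4)(0.01) = 0.0048, for a total of 0.0336.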
In [ ]:
sd=0.20; cv=0.01; m=100
n = range(1,m+1)
sd_p = zeros(m)
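for j in n:
    cv_mat = matrix(ones((j,j))*cv)   #ALL ENTRIES SET TO THE COVARIANCE
    fill_diagonal(cv_mat,sd**2)       #VARIANCES ON THE DIAGONAL
    w = matrix(ones(j)*(1.0/j)).T     #EQUAL WEIGHTS AS A COLUMN VECTOR
    sd_p[j-1] = sqrt((w.T).dot(cv_mat).dot(w)[0,0])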
In [ ]:
plot(n,sd_p)
xlabel('#stocks')
ylabel('stddev of portfolio')
grid()
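With equal weights, a common variance, and a common covariance, the matrix computation should match the closed form $\frac{1}{n}\sigma^2 + \left(1-\frac{1}{n}\right)\sigma_{ij}$ derived earlier. The following sketch compares the two, using the same `sd=0.20` and `cv=0.01` as the cell above; the function names are hypothetical and introduced only for this example.

```python
import numpy as np

sd, cv = 0.20, 0.01   # same inputs as the notebook cell

def portfolio_sd_matrix(n):
    """Portfolio standard deviation from w' Sigma w with equal weights."""
    cov = np.full((n, n), cv)
    np.fill_diagonal(cov, sd**2)
    w = np.full(n, 1.0 / n)
    return np.sqrt(w @ cov @ w)

def portfolio_sd_closed_form(n):
    """Portfolio standard deviation from the equal-weight closed form."""
    return np.sqrt(sd**2 / n + (1.0 - 1.0 / n) * cv)

for n in (1, 2, 10, 100):
    print(n, portfolio_sd_matrix(n), portfolio_sd_closed_form(n))
```

As the number of stocks grows, both values approach the square root of the covariance, which is 0.10 for these inputs.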
The geometric mean is a good indicator of past performance, especially when we are interested in holding period returns. But if we are interested in expected future returns, the arithmetic mean is the relevant statistic.
Suppose a stock will rise by 30% or fall by 20% in any year, with equal probability. If it rises in one year and falls in the next, then the geometric mean return over the two years is:
In [ ]:
g_ret = ((1+0.30)*(1-0.20))**0.5-1
print("Geometric mean return = ", g_ret)
In [ ]:
a_ret = 0.5*(0.30+(-0.20))
print("Arithmetic mean return per year = ",a_ret)
Which one is more realistic in predicting future expected returns over the next two years? Note that there are 4 cases to consider for outcomes, all with equal probability $1/4$.
In [ ]:
ret = zeros(4)
ret[0] = (1+0.3)*(1+0.3)
ret[1] = (1+0.3)*(1-0.2)
ret[2] = (1-0.2)*(1+0.3)
ret[3] = (1-0.2)*(1-0.2)
two_year_return = 0.25*sum(ret)
print("Expected two year gross return = ", two_year_return)
print("Expected two year return (annualized) = ", two_year_return**0.5 - 1)
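The same enumeration extends to any horizon. The sketch below, which assumes returns are independent across years and that every up/down path is equally likely, averages the compounded gross return over all paths and compares it with compounding the arithmetic mean. The function name `expected_gross_return` is hypothetical, introduced only for this example.

```python
from itertools import product

up, down = 0.30, -0.20   # the two equally likely annual returns

def expected_gross_return(years):
    """Average compounded gross return over all equally likely paths."""
    paths = list(product([up, down], repeat=years))
    total = 0.0
    for path in paths:
        gross = 1.0
        for r in path:
            gross *= 1.0 + r
        total += gross
    return total / len(paths)

arithmetic_mean = 0.5 * (up + down)
for years in (1, 2, 5):
    print(years, expected_gross_return(years), (1.0 + arithmetic_mean) ** years)
```

The two columns agree at every horizon, which is why the arithmetic mean is the relevant statistic for expected future returns.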