Numpy introduction


In [1]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib

Numpy is automatically imported to the top level namespace when you use %pylab

Numpy's main workhorse are is the "array" data structure. You can think of arrays as vectors or matrices.


In [2]:
test = array(range(10))
print(test)


[0 1 2 3 4 5 6 7 8 9]

Don't be confused by the fact that they output as python lists! It's more like Matlab. The main differences we care about are:

  • Arrays do mathematical operations in element-wise fashion (by default)
  • They're much faster (for computation)!

In [4]:
# example of elementwise operations

print(test + 1)
print(test * 2)
print(sqrt(test))


[ 1  2  3  4  5  6  7  8  9 10]
[ 0  2  4  6  8 10 12 14 16 18]
[ 0.          1.          1.41421356  1.73205081  2.          2.23606798
  2.44948974  2.64575131  2.82842712  3.        ]

That's most of what we want to cover! The main thing to keep in mind is underneath the covers, Numpy is doing most of the hard work.

Matrix calculations example: Calculating an "optimum" stock portfolio

But let's do a somewhat common example to showcase the power of numpy (and scipy)


In [5]:
"""Load all the data!"""
import pandas as pd
import pandas.io.data as web
from datetime import datetime

tickers = ["AAPL", "GOOG", "MSFT"]

stocks = pd.DataFrame({'AAPL': web.DataReader('AAPL',  "yahoo", datetime(1970,1,1),
                                              datetime(2013,10,1))['Adj Close'].resample('M', how='last')})
print(stocks)

# load everything into one nice dataframe
for stock in tickers:
    stocks[stock] = np.NaN
    temp = web.DataReader(stock,  "yahoo", datetime(1970,1,1),
                          datetime(2013,10,1))['Adj Close'].resample('M', how='last')
    stocks[stock] = temp

# put things into percent return
for stock in tickers:
    stocks[stock] = stocks[stock]/stocks[stock].shift(1) - 1.
    
# visualize!
stocks.cumsum().plot()
show()


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 350 entries, 1984-09-30 00:00:00 to 2013-10-31 00:00:00
Freq: M
Data columns (total 1 columns):
AAPL    350  non-null values
dtypes: float64(1)

In [6]:
# load things into numpy data types
V = matrix(stocks.cov().values)  # covariance matrix
R = matrix(stocks.mean().values).T  # expected returns

ones_col = np.ones((3,1))
ones_row = ones_col.T

print(V)
print()
print(R)


[[ 0.01798821  0.00597144  0.0060241 ]
 [ 0.00597144  0.01112027  0.00299114]
 [ 0.0060241   0.00299114  0.01150264]]
()
[[ 0.02412728]
 [ 0.02498674]
 [ 0.02419786]]

In [8]:
# And now let's calculte the optimum portfolio!
# The optimum portfolio is just the inversion of the covariance matrix times the expected returns
# and then "normalized" so that the weights sum to 1

optimum_weights = multiply(((ones_col.T * V.I * R).I), (V.I * R) )

print(optimum_weights)
print(tickers)


[[ 0.07735049]
 [ 0.48577473]
 [ 0.43687478]]
['AAPL', 'GOOG', 'MSFT']

Scipy: another foundational package

Another foundational package is scipy. If you want to work with simulations and probabilities, scipy is a great place to start. Let's show you how to pull some random variables


In [12]:
from scipy.stats import norm  # import a normal distribution
from scipy.stats import chi2  # import a chi square distribution

normal_rvs = norm(0, 1).rvs(1000)
chi_rvs = chi2(10).rvs(1000)

hist(chi_rvs, bins=20)
show()


Out[12]:
(array([  17.,   48.,   85.,  144.,  145.,  138.,  106.,   82.,   72.,
         56.,   40.,   30.,   15.,    5.,    6.,    1.,    5.,    3.,
          1.,    1.]),
 array([  1.36129869,   2.77380021,   4.18630174,   5.59880326,
         7.01130478,   8.4238063 ,   9.83630782,  11.24880934,
        12.66131086,  14.07381238,  15.4863139 ,  16.89881542,
        18.31131694,  19.72381846,  21.13631999,  22.54882151,
        23.96132303,  25.37382455,  26.78632607,  28.19882759,  29.61132911]),
 <a list of 20 Patch objects>)