Pandas


In [ ]:
#Install the pandas data-access libraries
!pip install pandas-datareader
#There is a bug in the latest version of html5lib, so pin an earlier version
!pip install --upgrade html5lib==1.0b8

#Restart the kernel after installing html5lib

Imports


In [2]:
import pandas as pd #pandas library
from pandas_datareader import data #data readers (google, html, etc.)
#The following line ensures that graphs are rendered in the notebook
%matplotlib inline 
import numpy as np
import matplotlib.pyplot as plt #Plotting library
import datetime as dt #datetime for timeseries support

The structure of a dataframe


In [3]:
pd.DataFrame([[1,2,3],[1,2,3]],columns=['A','B','C'])


Out[3]:
   A  B  C
0  1  2  3
1  1  2  3

Accessing columns and rows


In [40]:
df = pd.DataFrame([['r1','00','01','02'],['r2','10','11','12'],['r3','20','21','22']],columns=['row_label','A','B','C'])
print(id(df))
df.set_index('row_label',inplace=True) #inplace=True modifies df itself: same id before and after
print(id(df))
df


4562807608
4562807608
Out[40]:
           A   B   C
row_label            
r1         00  01  02
r2         10  11  12
r3         20  21  22

Building a dataframe from a dict of columns


In [39]:
data = {'nationality': ['UK', 'China', 'US', 'UK', 'Japan', 'China', 'UK', 'UK', 'Japan', 'US'],
        'age': [25, 30, 15, np.nan, 25, 22, np.nan,45 ,18, 33],
        'type': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'diabetes': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df=pd.DataFrame(data=data,index=labels)
#print(df[df['age'].between(20,30)])
#print(df.groupby('nationality').mean()['age'])
#print(df.sort_values(by=['age','type'],ascending=[False,True]))
df['nationality'] = df['nationality'].replace('US','United States')
print(df)


    age diabetes    nationality  type
a  25.0      yes             UK     1
b  30.0      yes          China     3
c  15.0       no  United States     2
d   NaN      yes             UK     3
e  25.0       no          Japan     2
f  22.0       no          China     3
g   NaN       no             UK     1
h  45.0      yes             UK     1
i  18.0       no          Japan     2
j  33.0       no  United States     1

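As an aside, the missing ages could be filled in, for example with the column mean. A small sketch using fillna (illustrative only; not run above):


In [ ]:
#Fill the missing ages with the mean of the observed ages
df['age'].fillna(df['age'].mean())
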
Back to the row_label dataframe


In [41]:
df.iloc[1] #.ix is deprecated; use .iloc for positional access (or .loc for labels)


Out[41]:
A    10
B    11
C    12
Name: r2, dtype: object

Getting column data


In [42]:
df['B']


Out[42]:
row_label
r1    01
r2    11
r3    21
Name: B, dtype: object

Getting row data


In [43]:
df.loc['r1']


Out[43]:
A    00
B    01
C    02
Name: r1, dtype: object

Getting a row by row number


In [44]:
df.iloc[0]


Out[44]:
A    00
B    01
C    02
Name: r1, dtype: object

Getting multiple columns


In [45]:
df[['B','A']] #Note that the column identifiers are in a list


Out[45]:
           B   A
row_label        
r1         01  00
r2         11  10
r3         21  20

Getting a specific cell


In [46]:
df.loc['r2','B']


Out[46]:
'11'

In [47]:
df.loc['r2']['A']


Out[47]:
'10'
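
For single-cell lookups, the .at accessor is a fast scalar alternative to .loc (a small aside):


In [ ]:
#.at is optimized for getting or setting a single scalar value
df.at['r2','B']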

Slicing


In [49]:
print(df)
print(df.loc['r1':'r2'])


            A   B   C
row_label            
r1         00  01  02
r2         10  11  12
r3         20  21  22
            A   B   C
row_label            
r1         00  01  02
r2         10  11  12

In [50]:
df.loc['r1':'r2','B':'C']


Out[50]:
           B   C
row_label        
r1         01  02
r2         11  12

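Note that label slices with .loc include both endpoints, unlike Python list slices; positional slices with .iloc exclude the stop position. A quick check:


In [ ]:
#Label slicing includes both endpoints...
print(df.loc['r1':'r2'])
#...while positional slicing excludes the stop index, so this selects the same two rows
print(df.iloc[0:2])
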
Pandas datareader

  • Access data from HTML tables on any web page
  • Get data from Google Finance
  • Get data from the Federal Reserve

    HTML Tables

  • pandas can read the tables in an HTML page into dataframes
  • The read_html function returns a list of dataframes, one for each HTML table on the page
  • Example: read the tables on a currency exchange-rate page

    
    
    In [60]:
    #df_list = pd.read_html('http://www.bloomberg.com/markets/currencies/major')
    df_list = pd.read_html('http://www.waihuipaijia.cn/', encoding='utf-8') #a Chinese FX-rate page
    print(len(df_list))
    
    
    
    
    2
    

    read_html found two tables on this page, so the list has two elements; let's look at the first one

    
    
    In [61]:
    df = df_list[0]
    print(df)
    
    
    
    
              0                                         1
    0  中行外汇牌价查询  中国银行外汇牌价更新:2017-08-30 06:37:40 外汇交易平台排行榜
    

    read_html can detect header rows automatically, but none was found here, so the columns are simply numbered 0 and 1; this first table is just the page banner ("Bank of China FX rate query; rates updated 2017-08-30 06:37:40")

    The next few cells were written for the Bloomberg currencies table (commented out above), whose columns include Currency, Value, and Change. If an index is necessary, we need to specify it explicitly

    
    
    In [ ]:
    df.set_index('Currency',inplace=True)
    print(df)
    

    Now we can use .loc to extract specific currency rates

    
    
    In [ ]:
    df.loc['EUR-CHF','Value']
    

    Working with views and copies

    Chained indexing (two successive [] lookups) may return a copy, so changes made through it are typically not reflected in the original dataframe; pandas flags such assignments with a SettingWithCopyWarning

    
    
    In [ ]:
    eur_usd = df.loc['EUR-USD']['Change'] #This is chained indexing: the scalar comes from a temporary object
    df.loc['EUR-USD']['Change'] = 1.0 #This assigns into a temporary copy, not the dataframe (pandas warns)
    print(eur_usd)
    print(df.loc['EUR-USD']['Change']) #Neither eur_usd nor the dataframe has changed
    
    
    
    In [ ]:
    eur_usd = df.loc['EUR-USD','Change'] #A single .loc lookup returns the scalar value (a copy, not a view)
    df.loc['EUR-USD','Change'] = 1.0 #A single .loc assignment updates the dataframe itself
    print(eur_usd) #eur_usd still holds the old value: the scalar was copied, not linked
    print(df.loc['EUR-USD','Change']) #The dataframe has been correctly updated
    
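    If an independent copy is actually wanted, ask for one explicitly with .copy(); changes to the copy never touch the original. A small sketch, continuing the assumed currencies frame:


    In [ ]:
    #An explicit copy is independent of the original dataframe
    change_col = df['Change'].copy()
    change_col['EUR-USD'] = 2.0 #modifies only the copy, not df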

    Getting historical stock prices from Google Finance

    Usage: DataReader(ticker, source, start_date, end_date)
    Unfortunately, the Yahoo Finance datareader has stopped working because of a change to Yahoo's website

    
    
    In [62]:
    from pandas_datareader import data
    import datetime as dt
    start=dt.datetime(2017, 1, 1)
    end=dt.datetime.today()
    
    
    print(start,end)
    
    
    df = data.DataReader('IBM', 'google', start, end)
    
    
    
    
    2017-01-01 00:00:00 2017-08-30 14:58:14.819387
    
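
    Note: the 'google' source has itself been deprecated in newer pandas-datareader releases, so the call above may fail too. A hedged sketch using the Stooq endpoint instead (assuming your pandas-datareader version supports it):


    In [ ]:
    #A sketch: fetch IBM daily prices from Stooq instead of Google Finance
    df = data.DataReader('IBM', 'stooq', start, end)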
    
    
    In [63]:
    df
    
    
    
    
    Out[63]:
    Open High Low Close Volume
    Date
    2017-01-03 167.00 167.87 166.01 167.19 2934299
    2017-01-04 167.77 169.87 167.36 169.26 3381432
    2017-01-05 169.25 169.39 167.26 168.70 2682301
    2017-01-06 168.69 169.92 167.52 169.53 2945536
    2017-01-09 169.47 169.80 167.62 167.65 3189891
    2017-01-10 167.98 168.09 165.34 165.52 4118694
    2017-01-11 166.05 167.76 165.60 167.75 3599464
    2017-01-12 167.77 168.01 165.56 167.95 2927973
    2017-01-13 167.97 168.48 166.88 167.34 2875433
    2017-01-17 166.69 168.18 166.12 167.89 3315655
    2017-01-18 167.45 168.59 166.69 166.80 4007779
    2017-01-19 166.96 167.45 165.80 166.81 6963386
    2017-01-20 167.81 170.64 166.00 170.55 12690029
    2017-01-23 170.08 171.25 170.01 171.03 5478224
    2017-01-24 171.36 176.00 171.16 175.90 6952369
    2017-01-25 176.26 179.25 176.13 178.29 5922462
    2017-01-26 178.06 178.88 177.51 178.66 3063924
    2017-01-27 178.47 179.20 177.30 177.30 3482311
    2017-01-30 176.98 177.07 174.58 175.80 4094755
    2017-01-31 175.05 175.58 173.61 174.52 4138962
    2017-02-01 175.00 175.70 172.89 174.29 2795745
    2017-02-02 174.23 174.97 173.55 174.58 2620526
    2017-02-03 175.00 176.34 174.89 175.82 3309288
    2017-02-06 175.31 175.98 174.38 175.86 2650785
    2017-02-07 176.00 178.62 175.90 178.46 3873455
    2017-02-08 177.50 177.50 175.88 176.17 2913126
    2017-02-09 176.17 177.80 175.91 177.21 3090710
    2017-02-10 177.37 178.87 176.76 178.68 2930710
    2017-02-13 179.24 179.90 178.84 179.36 3093372
    2017-02-14 178.57 180.13 178.35 180.13 2776528
    ... ... ... ... ... ...
    2017-07-19 150.02 150.25 146.71 147.53 14293610
    2017-07-20 147.53 148.83 147.03 147.66 7212233
    2017-07-21 147.59 147.87 146.51 147.08 6509459
    2017-07-24 147.00 147.04 145.80 145.99 3877593
    2017-07-25 146.57 147.49 146.03 146.19 4208895
    2017-07-26 146.27 146.46 144.47 145.36 3932361
    2017-07-27 145.00 145.40 143.64 145.07 6430841
    2017-07-28 144.81 145.04 143.84 144.29 3054834
    2017-07-31 NaN 144.93 NaN 144.67 4355718
    2017-08-01 145.00 145.67 144.72 145.30 3219477
    2017-08-02 145.12 145.29 144.22 144.45 3540949
    2017-08-03 144.43 145.34 144.43 144.94 3035393
    2017-08-04 145.00 145.39 144.40 145.16 2549400
    2017-08-07 145.00 145.09 142.75 143.47 4618284
    2017-08-08 142.00 142.76 142.00 142.11 3298718
    2017-08-09 141.75 142.03 141.19 141.77 2925770
    2017-08-10 141.64 142.42 141.23 141.84 4525378
    2017-08-11 142.45 142.59 141.52 141.84 3004360
    2017-08-14 142.00 142.76 141.94 142.32 2248607
    2017-08-15 142.55 143.34 140.58 142.07 4681545
    2017-08-16 141.79 142.91 141.61 142.50 3287821
    2017-08-17 142.52 142.89 140.70 140.70 4422087
    2017-08-18 140.87 141.15 139.58 139.70 4152714
    2017-08-21 139.59 140.42 139.13 140.33 3190691
    2017-08-22 140.64 141.15 140.38 141.01 2544446
    2017-08-23 140.70 143.05 140.68 142.14 4048317
    2017-08-24 142.50 143.68 142.25 142.94 3592183
    2017-08-25 143.30 144.19 143.25 143.74 2872893
    2017-08-28 144.11 144.56 142.39 142.51 5134463
    2017-08-29 142.10 143.42 141.90 143.14 3009520

    166 rows × 5 columns

    Datareader documentation

    http://pandas-datareader.readthedocs.io/en/latest/

    Working with a timeseries data frame

  • The data is organized by time, with the index serving as the timeline (see the sketch below)
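
    Because the index is a DatetimeIndex, rows can also be selected with partial date strings (a small aside, using the IBM frame from above):


    In [ ]:
    #Partial string indexing: select all trading days in February 2017
    df.loc['2017-02']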

    Creating new columns

  • Add a column to the dataframe
  • Base the elements of the new column on some combination of the existing columns
  • Example: the number of days that the stock closed higher than it opened
  • We'll create a new column with the header "UP"
  • And use np.where to decide what to put in each row
    
    In [64]:
    df['UP']=np.where(df['Close']>df['Open'],1,0)
    df
    
    
    
    
    Out[64]:
    Open High Low Close Volume UP
    Date
    2017-01-03 167.00 167.87 166.01 167.19 2934299 1
    2017-01-04 167.77 169.87 167.36 169.26 3381432 1
    2017-01-05 169.25 169.39 167.26 168.70 2682301 0
    2017-01-06 168.69 169.92 167.52 169.53 2945536 1
    2017-01-09 169.47 169.80 167.62 167.65 3189891 0
    2017-01-10 167.98 168.09 165.34 165.52 4118694 0
    2017-01-11 166.05 167.76 165.60 167.75 3599464 1
    2017-01-12 167.77 168.01 165.56 167.95 2927973 1
    2017-01-13 167.97 168.48 166.88 167.34 2875433 0
    2017-01-17 166.69 168.18 166.12 167.89 3315655 1
    2017-01-18 167.45 168.59 166.69 166.80 4007779 0
    2017-01-19 166.96 167.45 165.80 166.81 6963386 0
    2017-01-20 167.81 170.64 166.00 170.55 12690029 1
    2017-01-23 170.08 171.25 170.01 171.03 5478224 1
    2017-01-24 171.36 176.00 171.16 175.90 6952369 1
    2017-01-25 176.26 179.25 176.13 178.29 5922462 1
    2017-01-26 178.06 178.88 177.51 178.66 3063924 1
    2017-01-27 178.47 179.20 177.30 177.30 3482311 0
    2017-01-30 176.98 177.07 174.58 175.80 4094755 0
    2017-01-31 175.05 175.58 173.61 174.52 4138962 0
    2017-02-01 175.00 175.70 172.89 174.29 2795745 0
    2017-02-02 174.23 174.97 173.55 174.58 2620526 1
    2017-02-03 175.00 176.34 174.89 175.82 3309288 1
    2017-02-06 175.31 175.98 174.38 175.86 2650785 1
    2017-02-07 176.00 178.62 175.90 178.46 3873455 1
    2017-02-08 177.50 177.50 175.88 176.17 2913126 0
    2017-02-09 176.17 177.80 175.91 177.21 3090710 1
    2017-02-10 177.37 178.87 176.76 178.68 2930710 1
    2017-02-13 179.24 179.90 178.84 179.36 3093372 1
    2017-02-14 178.57 180.13 178.35 180.13 2776528 1
    ... ... ... ... ... ... ...
    2017-07-19 150.02 150.25 146.71 147.53 14293610 0
    2017-07-20 147.53 148.83 147.03 147.66 7212233 1
    2017-07-21 147.59 147.87 146.51 147.08 6509459 0
    2017-07-24 147.00 147.04 145.80 145.99 3877593 0
    2017-07-25 146.57 147.49 146.03 146.19 4208895 0
    2017-07-26 146.27 146.46 144.47 145.36 3932361 0
    2017-07-27 145.00 145.40 143.64 145.07 6430841 1
    2017-07-28 144.81 145.04 143.84 144.29 3054834 0
    2017-07-31 NaN 144.93 NaN 144.67 4355718 0
    2017-08-01 145.00 145.67 144.72 145.30 3219477 1
    2017-08-02 145.12 145.29 144.22 144.45 3540949 0
    2017-08-03 144.43 145.34 144.43 144.94 3035393 1
    2017-08-04 145.00 145.39 144.40 145.16 2549400 1
    2017-08-07 145.00 145.09 142.75 143.47 4618284 0
    2017-08-08 142.00 142.76 142.00 142.11 3298718 1
    2017-08-09 141.75 142.03 141.19 141.77 2925770 1
    2017-08-10 141.64 142.42 141.23 141.84 4525378 1
    2017-08-11 142.45 142.59 141.52 141.84 3004360 0
    2017-08-14 142.00 142.76 141.94 142.32 2248607 1
    2017-08-15 142.55 143.34 140.58 142.07 4681545 0
    2017-08-16 141.79 142.91 141.61 142.50 3287821 1
    2017-08-17 142.52 142.89 140.70 140.70 4422087 0
    2017-08-18 140.87 141.15 139.58 139.70 4152714 0
    2017-08-21 139.59 140.42 139.13 140.33 3190691 1
    2017-08-22 140.64 141.15 140.38 141.01 2544446 1
    2017-08-23 140.70 143.05 140.68 142.14 4048317 1
    2017-08-24 142.50 143.68 142.25 142.94 3592183 1
    2017-08-25 143.30 144.19 143.25 143.74 2872893 1
    2017-08-28 144.11 144.56 142.39 142.51 5134463 0
    2017-08-29 142.10 143.42 141.90 143.14 3009520 1

    166 rows × 6 columns

    Get summary statistics

  • The describe function returns a dataframe containing summary stats for all numerical columns
  • Columns containing non-numerical data are ignored
    
    In [65]:
    df.describe()
    
    
    
    
    Out[65]:
    Open High Low Close Volume UP
    count 165.000000 166.000000 165.000000 166.000000 1.660000e+02 166.000000
    mean 161.618667 162.346084 160.736848 161.486566 4.112198e+06 0.487952
    std 12.905701 13.028233 12.835382 12.995748 2.090827e+06 0.501367
    min 139.590000 140.420000 139.130000 139.700000 1.825048e+06 0.000000
    25% 152.030000 152.875000 151.600000 152.102500 3.057106e+06 0.000000
    50% 159.440000 159.470000 158.640000 158.945000 3.547950e+06 0.000000
    75% 174.300000 175.390000 173.460000 174.507500 4.406930e+06 1.000000
    max 182.000000 182.790000 180.920000 181.950000 1.928428e+07 1.000000

    Calculate the percentage of days that the stock has closed higher than its open

    
    
    In [66]:
    df['UP'].sum()/df['UP'].count()
    
    
    
    
    Out[66]:
    0.48795180722891568
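
    Since UP contains only 0s and 1s, the same fraction can be computed directly with mean (an equivalent one-liner):


    In [ ]:
    #The mean of a 0/1 indicator column is the fraction of 1s
    df['UP'].mean()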

    Calculate percent changes

  • The pct_change function computes the percent change between successive rows (time periods in timeseries data)
  • It defaults to a single time delta
  • With an argument, the time delta can be changed
    
    In [67]:
    df['Close'].pct_change() #One timeperiod percent change
    
    
    
    
    Out[67]:
    Date
    2017-01-03         NaN
    2017-01-04    0.012381
    2017-01-05   -0.003309
    2017-01-06    0.004920
    2017-01-09   -0.011089
    2017-01-10   -0.012705
    2017-01-11    0.013473
    2017-01-12    0.001192
    2017-01-13   -0.003632
    2017-01-17    0.003287
    2017-01-18   -0.006492
    2017-01-19    0.000060
    2017-01-20    0.022421
    2017-01-23    0.002814
    2017-01-24    0.028475
    2017-01-25    0.013587
    2017-01-26    0.002075
    2017-01-27   -0.007612
    2017-01-30   -0.008460
    2017-01-31   -0.007281
    2017-02-01   -0.001318
    2017-02-02    0.001664
    2017-02-03    0.007103
    2017-02-06    0.000228
    2017-02-07    0.014784
    2017-02-08   -0.012832
    2017-02-09    0.005903
    2017-02-10    0.008295
    2017-02-13    0.003806
    2017-02-14    0.004293
                    ...   
    2017-07-19   -0.042013
    2017-07-20    0.000881
    2017-07-21   -0.003928
    2017-07-24   -0.007411
    2017-07-25    0.001370
    2017-07-26   -0.005678
    2017-07-27   -0.001995
    2017-07-28   -0.005377
    2017-07-31    0.002634
    2017-08-01    0.004355
    2017-08-02   -0.005850
    2017-08-03    0.003392
    2017-08-04    0.001518
    2017-08-07   -0.011642
    2017-08-08   -0.009479
    2017-08-09   -0.002393
    2017-08-10    0.000494
    2017-08-11    0.000000
    2017-08-14    0.003384
    2017-08-15   -0.001757
    2017-08-16    0.003027
    2017-08-17   -0.012632
    2017-08-18   -0.007107
    2017-08-21    0.004510
    2017-08-22    0.004846
    2017-08-23    0.008014
    2017-08-24    0.005628
    2017-08-25    0.005597
    2017-08-28   -0.008557
    2017-08-29    0.004421
    Name: Close, dtype: float64
    
    
    In [71]:
    n=2
    df['Close'].pct_change(n) #n timeperiods percent change
    
    
    
    
    Out[71]:
    Date
    2017-01-03         NaN
    2017-01-04         NaN
    2017-01-05    0.009032
    2017-01-06    0.001595
    2017-01-09   -0.006224
    2017-01-10   -0.023654
    2017-01-11    0.000596
    2017-01-12    0.014681
    2017-01-13   -0.002444
    2017-01-17   -0.000357
    2017-01-18   -0.003227
    2017-01-19   -0.006433
    2017-01-20    0.022482
    2017-01-23    0.025298
    2017-01-24    0.031369
    2017-01-25    0.042449
    2017-01-26    0.015691
    2017-01-27   -0.005553
    2017-01-30   -0.016008
    2017-01-31   -0.015680
    2017-02-01   -0.008589
    2017-02-02    0.000344
    2017-02-03    0.008778
    2017-02-06    0.007332
    2017-02-07    0.015015
    2017-02-08    0.001763
    2017-02-09   -0.007004
    2017-02-10    0.014248
    2017-02-13    0.012132
    2017-02-14    0.008115
                    ...   
    2017-07-19   -0.035815
    2017-07-20   -0.041169
    2017-07-21   -0.003050
    2017-07-24   -0.011310
    2017-07-25   -0.006051
    2017-07-26   -0.004315
    2017-07-27   -0.007661
    2017-07-28   -0.007361
    2017-07-31   -0.002757
    2017-08-01    0.007000
    2017-08-02   -0.001521
    2017-08-03   -0.002478
    2017-08-04    0.004915
    2017-08-07   -0.010142
    2017-08-08   -0.021011
    2017-08-09   -0.011849
    2017-08-10   -0.001900
    2017-08-11    0.000494
    2017-08-14    0.003384
    2017-08-15    0.001622
    2017-08-16    0.001265
    2017-08-17   -0.009643
    2017-08-18   -0.019649
    2017-08-21   -0.002630
    2017-08-22    0.009377
    2017-08-23    0.012898
    2017-08-24    0.013687
    2017-08-25    0.011257
    2017-08-28   -0.003008
    2017-08-29   -0.004174
    Name: Close, dtype: float64

    NaN support

    Pandas reduction functions (mean, sum, etc.) skip NaNs by default

    
    
    In [72]:
    n=13
    df['Close'].pct_change(n).mean()
    
    
    
    
    Out[72]:
    -0.013587425292889614
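
    Most pandas reductions also take a skipna parameter (default True); setting it to False propagates NaNs instead. A quick illustration:


    In [ ]:
    #With skipna=False the NaNs produced by pct_change poison the result
    print(df['Close'].pct_change(n).mean())             #NaNs skipped (default)
    print(df['Close'].pct_change(n).mean(skipna=False)) #nan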

    Rolling windows

  • The rolling function extracts rolling windows
  • For example, the 21-period rolling window of the 13-period percent change
    
    In [73]:
    df['Close'].pct_change(n).rolling(21)
    
    
    
    
    Out[73]:
    Rolling [window=21,center=False,axis=0]

    Calculate something on the rolling windows

    Example: mean (the 21 day moving average of the 13 day percent change)

    
    
    In [74]:
    n=13
    df['Close'].pct_change(n).rolling(21).mean()
    
    
    
    
    Out[74]:
    Date
    2017-01-03         NaN
    2017-01-04         NaN
    2017-01-05         NaN
    2017-01-06         NaN
    2017-01-09         NaN
    2017-01-10         NaN
    2017-01-11         NaN
    2017-01-12         NaN
    2017-01-13         NaN
    2017-01-17         NaN
    2017-01-18         NaN
    2017-01-19         NaN
    2017-01-20         NaN
    2017-01-23         NaN
    2017-01-24         NaN
    2017-01-25         NaN
    2017-01-26         NaN
    2017-01-27         NaN
    2017-01-30         NaN
    2017-01-31         NaN
    2017-02-01         NaN
    2017-02-02         NaN
    2017-02-03         NaN
    2017-02-06         NaN
    2017-02-07         NaN
    2017-02-08         NaN
    2017-02-09         NaN
    2017-02-10         NaN
    2017-02-13         NaN
    2017-02-14         NaN
                    ...   
    2017-07-19   -0.000918
    2017-07-20   -0.003539
    2017-07-21   -0.006686
    2017-07-24   -0.009687
    2017-07-25   -0.012159
    2017-07-26   -0.015860
    2017-07-27   -0.019281
    2017-07-28   -0.022425
    2017-07-31   -0.024900
    2017-08-01   -0.027353
    2017-08-02   -0.030923
    2017-08-03   -0.033265
    2017-08-04   -0.035073
    2017-08-07   -0.035799
    2017-08-08   -0.037118
    2017-08-09   -0.038652
    2017-08-10   -0.039790
    2017-08-11   -0.041058
    2017-08-14   -0.041750
    2017-08-15   -0.042200
    2017-08-16   -0.042386
    2017-08-17   -0.041653
    2017-08-18   -0.041579
    2017-08-21   -0.040335
    2017-08-22   -0.039247
    2017-08-23   -0.038309
    2017-08-24   -0.036125
    2017-08-25   -0.032987
    2017-08-28   -0.029972
    2017-08-29   -0.026738
    Name: Close, dtype: float64

    Calculate several moving averages and graph them

    
    
    In [75]:
    ma_8 = df['Close'].pct_change(n).rolling(window=8).mean()
    ma_13= df['Close'].pct_change(n).rolling(window=13).mean()
    ma_21= df['Close'].pct_change(n).rolling(window=21).mean()
    ma_34= df['Close'].pct_change(n).rolling(window=34).mean()
    ma_55= df['Close'].pct_change(n).rolling(window=55).mean()
    
    
    
    In [76]:
    ma_8.plot()
    ma_34.plot()
    
    
    
    
    Out[76]:
    <matplotlib.axes._subplots.AxesSubplot at 0x113819898>

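    With several lines on one plot, labels and a legend make the chart readable. A small sketch:


    In [ ]:
    #Label each moving average so the legend identifies it
    ma_8.plot(label='8-period MA')
    ma_13.plot(label='13-period MA')
    ma_34.plot(label='34-period MA')
    plt.legend()
    plt.show()
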
    Linear regression with pandas

    Example: TAN is the ticker of a solar ETF. FSLR, RGSE, and SCTY are tickers of companies that build or lease solar panels; each has a different business model. We'll use pandas to study the risk-reward tradeoff among the four investments and see how correlated they are

    
    
    In [77]:
    import datetime
    import pandas_datareader as data
    start = datetime.datetime(2015,7,1)
    end = datetime.datetime(2016,6,1)
    solar_df = data.DataReader(['FSLR', 'TAN','RGSE','SCTY'],'google', start=start,end=end)['Close']
    
    
    
    In [78]:
    solar_df
    
    
    
    
    Out[78]:
    FSLR RGSE SCTY TAN
    Date
    2015-07-01 46.04 1128.00 52.40 38.84
    2015-07-02 45.17 1200.00 52.27 38.55
    2015-07-06 44.19 1008.00 51.75 36.37
    2015-07-07 45.12 984.00 53.21 36.10
    2015-07-08 43.27 852.00 51.48 33.67
    2015-07-09 43.65 876.00 51.98 35.23
    2015-07-10 44.03 942.00 53.00 36.49
    2015-07-13 46.01 978.00 53.39 37.37
    2015-07-14 45.81 906.00 54.26 37.86
    2015-07-15 44.49 888.00 52.54 37.06
    2015-07-16 44.76 894.00 52.80 37.51
    2015-07-17 43.95 900.00 52.15 37.48
    2015-07-20 43.28 852.00 55.77 37.66
    2015-07-21 43.48 894.00 56.20 37.06
    2015-07-22 42.50 924.00 55.36 36.62
    2015-07-23 42.74 900.00 53.67 35.88
    2015-07-24 42.97 942.00 54.62 35.47
    2015-07-27 42.27 966.00 54.69 34.58
    2015-07-28 43.38 990.00 56.03 35.27
    2015-07-29 45.54 990.00 58.05 36.24
    2015-07-30 45.31 996.00 59.58 36.05
    2015-07-31 44.30 1038.00 58.00 35.32
    2015-08-03 43.69 1116.00 57.92 34.22
    2015-08-04 44.50 1272.00 59.00 33.93
    2015-08-05 51.92 1218.00 60.17 35.59
    2015-08-06 51.20 1254.00 55.34 33.91
    2015-08-07 52.08 1188.00 52.25 33.29
    2015-08-10 53.48 1068.00 51.17 34.16
    2015-08-11 51.44 846.00 48.79 32.78
    2015-08-12 51.88 839.04 50.86 33.10
    ... ... ... ... ...
    2016-04-20 61.72 324.00 32.65 24.16
    2016-04-21 61.94 300.00 33.87 24.21
    2016-04-22 61.54 312.00 33.34 24.33
    2016-04-25 60.39 306.60 31.18 23.94
    2016-04-26 61.24 317.40 33.00 24.33
    2016-04-27 62.03 306.00 33.31 24.79
    2016-04-28 57.02 306.06 32.24 24.18
    2016-04-29 55.84 303.00 30.32 23.82
    2016-05-02 54.85 300.06 29.55 23.53
    2016-05-03 52.85 271.20 26.45 22.74
    2016-05-04 51.78 276.00 23.65 22.47
    2016-05-05 51.96 279.00 21.67 22.09
    2016-05-06 51.80 279.00 21.84 22.13
    2016-05-09 50.60 277.08 22.51 21.64
    2016-05-10 51.27 276.00 17.82 21.40
    2016-05-11 50.74 294.00 18.69 21.62
    2016-05-12 49.09 270.00 18.92 21.17
    2016-05-13 48.56 190.20 19.60 20.98
    2016-05-16 49.07 210.00 20.06 21.28
    2016-05-17 48.51 216.00 20.50 21.11
    2016-05-18 48.71 210.00 20.00 21.20
    2016-05-19 47.09 216.00 20.37 20.93
    2016-05-20 48.02 192.00 21.94 21.49
    2016-05-23 49.04 198.00 24.05 21.82
    2016-05-24 50.46 204.00 23.74 22.06
    2016-05-25 50.56 202.80 24.07 22.09
    2016-05-26 49.73 186.00 23.73 21.97
    2016-05-27 49.90 198.00 22.26 21.96
    2016-05-31 49.65 180.00 22.39 22.01
    2016-06-01 50.07 177.00 21.41 22.10

    232 rows × 4 columns

    Let's calculate returns (the 1-day percent change)

    
    
    In [79]:
    rets = solar_df.pct_change()
    print(rets)
    
    
    
    
                    FSLR      RGSE      SCTY       TAN
    Date                                              
    2015-07-01       NaN       NaN       NaN       NaN
    2015-07-02 -0.018897  0.063830 -0.002481 -0.007467
    2015-07-06 -0.021696 -0.160000 -0.009948 -0.056550
    2015-07-07  0.021045 -0.023810  0.028213 -0.007424
    2015-07-08 -0.041002 -0.134146 -0.032513 -0.067313
    2015-07-09  0.008782  0.028169  0.009713  0.046332
    2015-07-10  0.008706  0.075342  0.019623  0.035765
    2015-07-13  0.044969  0.038217  0.007358  0.024116
    2015-07-14 -0.004347 -0.073620  0.016295  0.013112
    2015-07-15 -0.028815 -0.019868 -0.031699 -0.021130
    2015-07-16  0.006069  0.006757  0.004949  0.012142
    2015-07-17 -0.018097  0.006711 -0.012311 -0.000800
    2015-07-20 -0.015245 -0.053333  0.069415  0.004803
    2015-07-21  0.004621  0.049296  0.007710 -0.015932
    2015-07-22 -0.022539  0.033557 -0.014947 -0.011873
    2015-07-23  0.005647 -0.025974 -0.030527 -0.020208
    2015-07-24  0.005381  0.046667  0.017701 -0.011427
    2015-07-27 -0.016290  0.025478  0.001282 -0.025092
    2015-07-28  0.026260  0.024845  0.024502  0.019954
    2015-07-29  0.049793  0.000000  0.036052  0.027502
    2015-07-30 -0.005051  0.006061  0.026357 -0.005243
    2015-07-31 -0.022291  0.042169 -0.026519 -0.020250
    2015-08-03 -0.013770  0.075145 -0.001379 -0.031144
    2015-08-04  0.018540  0.139785  0.018646 -0.008475
    2015-08-05  0.166742 -0.042453  0.019831  0.048924
    2015-08-06 -0.013867  0.029557 -0.080273 -0.047204
    2015-08-07  0.017187 -0.052632 -0.055837 -0.018284
    2015-08-10  0.026882 -0.101010 -0.020670  0.026134
    2015-08-11 -0.038145 -0.207865 -0.046512 -0.040398
    2015-08-12  0.008554 -0.008227  0.042427  0.009762
    ...              ...       ...       ...       ...
    2016-04-20  0.019828  0.038262  0.078269  0.018979
    2016-04-21  0.003564 -0.074074  0.037366  0.002070
    2016-04-22 -0.006458  0.040000 -0.015648  0.004957
    2016-04-25 -0.018687 -0.017308 -0.064787 -0.016030
    2016-04-26  0.014075  0.035225  0.058371  0.016291
    2016-04-27  0.012900 -0.035917  0.009394  0.018907
    2016-04-28 -0.080767  0.000196 -0.032122 -0.024607
    2016-04-29 -0.020694 -0.009998 -0.059553 -0.014888
    2016-05-02 -0.017729 -0.009703 -0.025396 -0.012175
    2016-05-03 -0.036463 -0.096181 -0.104907 -0.033574
    2016-05-04 -0.020246  0.017699 -0.105860 -0.011873
    2016-05-05  0.003476  0.010870 -0.083721 -0.016911
    2016-05-06 -0.003079  0.000000  0.007845  0.001811
    2016-05-09 -0.023166 -0.006882  0.030678 -0.022142
    2016-05-10  0.013241 -0.003898 -0.208352 -0.011091
    2016-05-11 -0.010337  0.065217  0.048822  0.010280
    2016-05-12 -0.032519 -0.081633  0.012306 -0.020814
    2016-05-13 -0.010796 -0.295556  0.035941 -0.008975
    2016-05-16  0.010502  0.104101  0.023469  0.014299
    2016-05-17 -0.011412  0.028571  0.021934 -0.007989
    2016-05-18  0.004123 -0.027778 -0.024390  0.004263
    2016-05-19 -0.033258  0.028571  0.018500 -0.012736
    2016-05-20  0.019749 -0.111111  0.077074  0.026756
    2016-05-23  0.021241  0.031250  0.096171  0.015356
    2016-05-24  0.028956  0.030303 -0.012890  0.010999
    2016-05-25  0.001982 -0.005882  0.013901  0.001360
    2016-05-26 -0.016416 -0.082840 -0.014125 -0.005432
    2016-05-27  0.003418  0.064516 -0.061947 -0.000455
    2016-05-31 -0.005010 -0.090909  0.005840  0.002277
    2016-06-01  0.008459 -0.016667 -0.043770  0.004089
    
    [232 rows x 4 columns]
    

    Let's visualize the relationship between each stock and the ETF

    
    
    In [80]:
    import matplotlib.pyplot as plt
    plt.scatter(rets.FSLR,rets.TAN)
    
    
    
    
    Out[80]:
    <matplotlib.collections.PathCollection at 0x113864240>
    
    
    In [81]:
    plt.scatter(rets.RGSE,rets.TAN)
    
    
    
    
    Out[81]:
    <matplotlib.collections.PathCollection at 0x112f29eb8>
    
    
    In [82]:
    plt.scatter(rets.SCTY,rets.TAN)
    
    
    
    
    Out[82]:
    <matplotlib.collections.PathCollection at 0x11317e518>

    The correlation matrix

    
    
    In [83]:
    solar_corr = rets.corr()
    print(solar_corr)
    
    
    
    
              FSLR      RGSE      SCTY       TAN
    FSLR  1.000000  0.249923  0.272612  0.670114
    RGSE  0.249923  1.000000  0.236604  0.389566
    SCTY  0.272612  0.236604  1.000000  0.559854
    TAN   0.670114  0.389566  0.559854  1.000000
    
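    A heatmap is an easy way to eyeball the correlation matrix. A minimal sketch with matplotlib:


    In [ ]:
    #Visualize the correlation matrix; brighter cells mean higher correlation
    plt.matshow(solar_corr)
    plt.xticks(range(len(solar_corr.columns)), solar_corr.columns)
    plt.yticks(range(len(solar_corr.columns)), solar_corr.columns)
    plt.colorbar()
    plt.show()
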

    Basic risk analysis

    We'll plot the mean and standard deviation of returns for each ticker to get a sense of the risk-return profile

    
    
    In [84]:
    plt.scatter(rets.mean(), rets.std())
    plt.xlabel('Expected returns')
    plt.ylabel('Standard deviations')
    for label, x, y in zip(rets.columns, rets.mean(), rets.std()):
        plt.annotate(
            label, 
            xy = (x, y), xytext = (20, -20),
            textcoords = 'offset points', ha = 'right', va = 'bottom',
            bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
            arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))
    plt.show()
    
    
    
    

    Steps for regression

  • Construct y (the dependent variable series)
  • Construct a matrix (dataframe) X of independent variable series
  • Add an intercept (statsmodels' OLS does not add one automatically)
  • Fit the regression model
  • Get the results
  • The statsmodels library contains various regression packages; we'll use the OLS (Ordinary Least Squares) model

    
    
    In [85]:
    import numpy as np
    import statsmodels.api as sm
    X=solar_df[['FSLR','RGSE','SCTY']]
    X = sm.add_constant(X)
    y=solar_df['TAN']
    model = sm.OLS(y,X,missing='drop')
    result = model.fit()
    print(result.summary())
    
    
    
    
                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:                    TAN   R-squared:                       0.851
    Model:                            OLS   Adj. R-squared:                  0.849
    Method:                 Least Squares   F-statistic:                     435.1
    Date:                Thu, 31 Aug 2017   Prob (F-statistic):           4.89e-94
    Time:                        17:59:54   Log-Likelihood:                -464.33
    No. Observations:                 232   AIC:                             936.7
    Df Residuals:                     228   BIC:                             950.4
    Df Model:                           3                                         
    Covariance Type:            nonrobust                                         
    ==============================================================================
                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
    ------------------------------------------------------------------------------
    const         15.2915      1.180     12.956      0.000        12.966    17.617
    FSLR           0.0087      0.017      0.521      0.603        -0.024     0.041
    RGSE           0.0073      0.001      9.684      0.000         0.006     0.009
    SCTY           0.2156      0.015     14.177      0.000         0.186     0.246
    ==============================================================================
    Omnibus:                        4.153   Durbin-Watson:                   0.109
    Prob(Omnibus):                  0.125   Jarque-Bera (JB):                3.754
    Skew:                           0.239   Prob(JB):                        0.153
    Kurtosis:                       2.600   Cond. No.                     5.88e+03
    ==============================================================================
    
    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    [2] The condition number is large, 5.88e+03. This might indicate that there are
    strong multicollinearity or other numerical problems.
    

    Finally, plot the fitted values against the actual y values

    
    
    In [86]:
    fig, ax = plt.subplots(figsize=(8,6))
    ax.plot(y)
    ax.plot(result.fittedvalues)
    
    
    
    
    Out[86]:
    [<matplotlib.lines.Line2D at 0x115f92978>]
    
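    A quick way to inspect the fit is to plot the residuals over time (a hedged sketch; result.resid holds actual minus fitted values):


    In [ ]:
    #Plot the regression residuals over time
    result.resid.plot()
    plt.show()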
    