By Christopher van Hoecke, Maxwell Margenot
https://www.quantopian.com/lectures/introduction-to-pandas
These exercises correspond to the Introduction to pandas lecture, which is part of the Quantopian Lecture Series. They rely heavily on the code presented in that lecture, so copy and adapt code from it when starting each problem; working from scratch will likely be too difficult.
In [1]:
# Useful libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In [2]:
l = np.random.randint(1,100, size=1000)
s = pd.Series(l)
new_index = pd.date_range("2016-01-01", periods=len(s), freq="D")
s.index = new_index
print(s)
In [3]:
# Print every other element of the first 50 elements
s.iloc[:50:2];
# Values associated with the index 2017-02-20
s.loc['2017-02-20']
Out[3]:
In [4]:
# Print s between 1 and 3
s.loc[(s>1) & (s<3)]
Out[4]:
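Because `s` holds random integers, the output of the indexing cells above changes on every run. As a deterministic sketch of the same three access patterns, the following uses a made-up six-element series (`demo` and its values are assumptions for illustration, not lecture data):

```python
import pandas as pd

# Hypothetical series with a daily DatetimeIndex (values invented for illustration)
demo = pd.Series([5, 2, 9, 2, 7, 1],
                 index=pd.date_range("2016-01-01", periods=6, freq="D"))

every_other = demo.iloc[:4:2]                 # positional slice: positions 0 and 2
by_label = demo.loc["2016-01-03"]             # label lookup on the date index
between = demo.loc[(demo > 1) & (demo < 3)]   # strictly between 1 and 3, so only 2s

print(every_other.tolist())  # [5, 9]
print(by_label)              # 9
print(between.tolist())      # [2, 2]
```

Note that each comparison sits in its own parentheses: `&` binds more tightly than `>` and `<`, so the mask would not evaluate as intended without them.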
In [5]:
# First 5 elements
s.head(5)
# Last 5 elements
s.tail(5)
Out[5]:
In [6]:
symbol = "CMG"
start = "2012-01-01"
end = "2016-01-01"
prices = get_pricing(symbol, start_date=start, end_date=end, fields="price")
# Resample daily prices to get monthly prices using median.
monthly_prices = prices.resample('M').median()
monthly_prices.head(24)
Out[6]:
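`get_pricing` is only available on the Quantopian platform, so the cell above cannot be run elsewhere. As a rough cross-check of what a monthly median means, the sketch below builds a made-up daily series (`fake_prices` is an assumption, not market data) and groups it by calendar month with `to_period("M")`. This should produce the same per-month medians as `resample('M').median()`, although the result is labelled by period rather than by month-end timestamp.

```python
import pandas as pd

# Hypothetical daily "prices" 1, 2, ..., 60 covering January and February 2016
idx = pd.date_range("2016-01-01", "2016-02-29", freq="D")
fake_prices = pd.Series(range(1, len(idx) + 1), index=idx, dtype=float)

# Group by calendar month and take the median of each group
monthly_median = fake_prices.groupby(fake_prices.index.to_period("M")).median()

print(monthly_median.tolist())  # [16.0, 46.0]
```

January holds the values 1 through 31 (median 16) and February holds 32 through 60 (median 46).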
In [7]:
# Data for every day, (including weekends and holidays)
calendar_dates = pd.date_range(start=start, end=end, freq='D', tz='UTC')
calendar_prices = prices.reindex(calendar_dates, method='ffill')
calendar_prices.head(15)
Out[7]:
In [8]:
# Fill missing data using Backwards fill method
bfilled_prices = calendar_prices.fillna(method='bfill')
bfilled_prices.head(10)
Out[8]:
In [9]:
# Drop instances of nan in the data
dropped_prices = calendar_prices.dropna()
dropped_prices.head(10)
Out[9]:
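The three cells above differ only in how they treat dates with no price. A small sketch on made-up data (the series `fake_prices` and its dates are assumptions for illustration) makes the difference visible: forward fill cannot fill a gap at the very start, backward fill can, and `dropna` simply discards it.

```python
import numpy as np
import pandas as pd

# Hypothetical prices on business days only: Mon 2016-01-04 through Mon 2016-01-11
bdays = pd.date_range("2016-01-04", "2016-01-11", freq="B")
fake_prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], index=bdays)

# Calendar range starts on Sunday 2016-01-03, one day before the first price
calendar = pd.date_range("2016-01-03", "2016-01-11", freq="D")

ffilled = fake_prices.reindex(calendar, method="ffill")  # weekend takes Friday's price
bfilled = ffilled.bfill()                                # leading NaN takes Monday's price
dropped = ffilled.dropna()                               # leading NaN row is removed

print(np.isnan(ffilled.iloc[0]))   # True: nothing earlier to carry forward
print(ffilled.loc["2016-01-09"])   # 14.0: Saturday filled from Friday
print(bfilled.iloc[0])             # 10.0
print(len(dropped))                # 8
```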
In [10]:
print("Summary Statistics")
print(prices.describe())
In [11]:
data = get_pricing('GE', fields='open_price', start_date='2016-01-01', end_date='2017-01-01')
mult_returns = data.pct_change()[1:]  # Multiplicative returns
add_returns = data.diff()[1:]  # Additive returns
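To make the two return definitions in the cell above concrete, here is a minimal sketch on three made-up prices (`toy` is an assumption, not GE data). `pct_change` computes p_t / p_(t-1) - 1 and `diff` computes p_t - p_(t-1); the `[1:]` slice drops the first entry, which is NaN because it has no predecessor.

```python
import pandas as pd

# Hypothetical price path: up 10, then down 11
toy = pd.Series([100.0, 110.0, 99.0])

toy_mult = toy.pct_change()[1:]  # 110/100 - 1 and 99/110 - 1
toy_add = toy.diff()[1:]         # 110 - 100 and 99 - 110

print(toy_mult.round(4).tolist())  # [0.1, -0.1]
print(toy_add.tolist())            # [10.0, -11.0]
```

The price changes differ in size (10 versus 11) while the percentage changes are both 10% in magnitude, which is why the two measures are not interchangeable.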
In [12]:
# Rolling mean
rolling_mean = data.rolling(window=60).mean()
rolling_mean.name = "60-day rolling mean"
In [13]:
# Rolling Standard Deviation
rolling_std = data.rolling(window=60).std()
rolling_std.name = "60-day rolling volatility"
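A 60-day window is hard to verify by eye, so the following sketch applies the same `rolling(...).mean()` and `rolling(...).std()` calls to a made-up five-element series with a window of 3 (both the data and the window size are assumptions chosen for readability).

```python
import pandas as pd

toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data

roll_mean = toy.rolling(window=3).mean()
roll_std = toy.rolling(window=3).std()   # sample standard deviation (ddof=1) by default

# The first window - 1 entries are NaN because the window is not yet full
print(roll_mean.isnull().sum())              # 2
print(roll_mean.dropna().round(6).tolist())  # [2.0, 3.0, 4.0]
print(roll_std.dropna().round(6).tolist())   # [1.0, 1.0, 1.0]
```

By the same logic, the 60-day series in the cells above begin with 59 NaN values.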
In [14]:
l = ['First','Second', 'Third', 'Fourth', 'Fifth']
dict_data = {'a' : [1, 2, 3, 4, 5],
'b' : ['L', 'K', 'J', 'M', 'Z'],
'c' : np.random.normal(0, 1, 5)
}
# Adding l as an index to dict_data
frame_data = pd.DataFrame(dict_data, index=l)
print(frame_data)
In [15]:
s1 = pd.Series([2, 3, 5, 7, 11, 13], name='prime')
s2 = pd.Series([1, 4, 6, 8, 9, 10], name='other')
numbers = pd.concat([s1, s2], axis=1) # Concatenate the two series
numbers.columns = ['Useful Numbers', 'Not Useful Numbers'] # Rename the two columns
numbers.index = pd.date_range("2016-01-01", periods=len(numbers)) # Index change
print(numbers)
In [16]:
symbol = ["XOM", "BP", "COP", "TOT"]
start = "2012-01-01"
end = "2016-01-01"
prices = get_pricing(symbol, start_date=start, end_date=end, fields="price")
if isinstance(symbol, list):
prices.columns = [x.symbol for x in prices.columns]
else:
prices.name = symbol
# Check Type of Data for these two.
prices.XOM.head()
prices.loc[:, 'XOM'].head()
Out[16]:
In [17]:
# Print data type
print(type(prices.XOM))
print(type(prices.loc[:, 'XOM']))
In [18]:
# Print values associated with time range
prices.loc['2013-01-01':'2013-01-10']
Out[18]:
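Filter the prices DataFrame to only print values where: BP > 30; XOM < 100; both conditions hold (BP > 30 and XOM < 100); the union of that composite condition with TOT having non-NaN values ((BP > 30 and XOM < 100) or TOT is non-NaN).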
In [19]:
# Filter data
# BP > 30
print(prices.loc[prices.BP > 30].head())
# XOM < 100
print(prices.loc[prices.XOM < 100].head())
# BP > 30 AND XOM < 100
print(prices.loc[(prices.BP > 30) & (prices.XOM < 100)].head())
# The union of (BP > 30 AND XOM < 100) with TOT being non-NaN
print(prices.loc[((prices.BP > 30) & (prices.XOM < 100)) | (~prices.TOT.isnull())].head())
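The intersection and union filters above can be checked on a tiny made-up frame where the expected rows are obvious. The column names are reused from the exercise, but every value in `toy` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: only row 1 satisfies BP > 30 and XOM < 100,
# and only row 3 has a non-NaN TOT value
toy = pd.DataFrame({
    "BP":  [25.0, 35.0, 40.0, 20.0],
    "XOM": [90.0, 95.0, 105.0, 110.0],
    "TOT": [np.nan, np.nan, np.nan, 50.0],
})

both = toy.loc[(toy.BP > 30) & (toy.XOM < 100)]
either = toy.loc[((toy.BP > 30) & (toy.XOM < 100)) | toy.TOT.notnull()]

print(both.index.tolist())    # [1]
print(either.index.tolist())  # [1, 3]
```

`toy.TOT.notnull()` is equivalent to the `~toy.TOT.isnull()` form used in the exercise.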
In [20]:
# Adding TSLA
s_1 = get_pricing('TSLA', start_date=start, end_date=end, fields='price')
prices.loc[:, 'TSLA'] = s_1
# Dropping XOM
prices = prices.drop('XOM', axis=1)
prices.head(5)
Out[20]:
In [21]:
df_1 = get_pricing(['SPY', 'VXX'], start_date=start, end_date=end, fields='price')
df_2 = get_pricing(['MSFT', 'AAPL', 'GOOG'], start_date=start, end_date=end, fields='price')
# Concatenate the dataframes
df_3 = pd.concat([df_1, df_2], axis=1)
df_3.head()
Out[21]:
In [22]:
# Fill missing data (such as the gaps in GOOG) with 0
filled0_df_3 = df_3.fillna(0)
filled0_df_3.head(5)
Out[22]:
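`pd.concat(..., axis=1)` aligns the frames on their index, so any date present in one frame but not the other shows up as NaN; that is where the missing values filled above come from. A minimal sketch with made-up frames (`left`, `right`, and their values are assumptions, with ticker names borrowed from the exercise):

```python
import pandas as pd

# Hypothetical frames whose date ranges only partly overlap
left = pd.DataFrame({"SPY": [1.0, 2.0, 3.0]},
                    index=pd.date_range("2016-01-01", periods=3))
right = pd.DataFrame({"GOOG": [7.0, 8.0]},
                     index=pd.date_range("2016-01-02", periods=2))

combined = pd.concat([left, right], axis=1)  # union of both indexes
filled = combined.fillna(0)                  # replace the alignment NaN with 0

print(combined.shape)                  # (3, 2)
print(combined["GOOG"].isnull().sum()) # 1
print(filled["GOOG"].tolist())         # [0.0, 7.0, 8.0]
```

Filling prices with 0 is convenient for display, but a zero price would distort any returns computed afterwards, so a forward fill or a drop may be the safer choice depending on the use case.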
The remaining exercises use the prices DataFrame from above.
In [23]:
# Summary
prices.describe()
Out[23]:
In [24]:
# Natural log of the prices; show the first 10 values
np.log(prices).head(10)
Out[24]:
In [25]:
# Multiplicative returns
mult_returns = prices.pct_change()[1:]
mult_returns.head()
Out[25]:
In [26]:
# Normalizing the returns and plotting one year of data
norm_returns = (mult_returns - mult_returns.mean(axis=0))/mult_returns.std(axis=0)
norm_returns.loc['2014-01-01':'2015-01-01'].plot();
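The normalization above is a per-column z-score: subtract each column's mean and divide by its standard deviation. As a quick sanity check on made-up returns (`toy` is an assumption, not the exercise data), each normalized column should end up with mean 0 and standard deviation 1.

```python
import pandas as pd

# Hypothetical returns for two assets
toy = pd.DataFrame({"A": [0.01, -0.02, 0.03, 0.00],
                    "B": [0.10, 0.20, -0.10, 0.00]})

norm = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(norm.mean().abs().max() < 1e-12)        # True: column means are ~0
print((norm.std() - 1).abs().max() < 1e-12)   # True: column stds are ~1
```

Putting every column on the same scale is what makes the series comparable on a single plot.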
In [27]:
# Rolling mean
rolling_mean = prices.rolling(window=60).mean()
rolling_mean.columns = prices.columns
# Rolling standard deviation
rolling_std = prices.rolling(window=60).std()
rolling_std.columns = prices.columns
# Plotting
mean = rolling_mean.plot();
plt.title("Rolling Mean of Prices")
plt.xlabel("Date")
plt.ylabel("Price")
plt.legend();
std = rolling_std.plot();
plt.title("Rolling standard deviation of Prices")
plt.xlabel("Date")
plt.ylabel("Price")
plt.legend();
Congratulations on completing the Introduction to pandas exercises!
This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.