Mentorship Program

Description of the Quantopian API

  • Quantopian provides a platform for you to build, test, and execute trading algorithms.

In [1]:
import datetime
import numpy as np
import pandas as pd
import zipline

%matplotlib inline

STOCKS = ['AMD', 'CERN', 'COST', 'DELL', 'GPS', 'INTC', 'MMM']

Example 1: Buy Apple Stock


In [2]:
class BUY_APPLE(zipline.TradingAlgorithm):
    """
    Copy the sample trading algorithm from Quantopian and see if we can
    run it in zipline (and what needs to change to convert between their
    platform and zipline).
    """
    def initialize(self):
        pass
    
    def handle_data(self, data):
        self.order(zipline.api.symbol('AAPL'), 10)
        self.record(AAPL=data[zipline.api.symbol('AAPL')].price)

In [3]:
start = datetime.datetime(2001, 8, 1)
end = datetime.datetime(2013, 2, 1)
data = zipline.utils.factory.load_from_yahoo(stocks=['AAPL', 'AMD'], indexes={}, start=start, end=end)


AAPL
AMD

In [4]:
def run_buy_apple():
    buy_apple = BUY_APPLE()
    results = buy_apple.run(data)
    return results.portfolio_value

In [5]:
results_buy_apple = run_buy_apple()


[2015-07-09 21:13:06.809000] INFO: Performance: Simulated 2893 trading days out of 2893.
[2015-07-09 21:13:06.810000] INFO: Performance: first open: 2001-08-01 13:31:00+00:00
[2015-07-09 21:13:06.811000] INFO: Performance: last close: 2013-02-01 21:00:00+00:00

In [6]:
results_buy_apple.tail()


Out[6]:
2013-01-28 21:00:00    1246567.217167
2013-01-29 21:00:00    1279565.834323
2013-01-30 21:00:00    1273933.509309
2013-01-31 21:00:00    1268690.367442
2013-02-01 21:00:00    1261371.061151
Name: portfolio_value, dtype: float64

In [7]:
results_buy_apple.head()


Out[7]:
2001-08-01 20:00:00    100000.000000
2001-08-02 20:00:00     99999.699867
2001-08-03 20:00:00     99999.185096
2001-08-06 20:00:00     99998.388448
2001-08-07 20:00:00     99998.329819
Name: portfolio_value, dtype: float64

In [8]:
results_buy_apple.plot()


Out[8]:
<matplotlib.axes.AxesSubplot at 0x119afdd8>
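
To see what the portfolio_value series is tracking, here is a minimal bookkeeping sketch in plain Python 3. It is not how zipline computes the value: the function name buy_every_bar and the prices are made up, and it assumes every order fills immediately at the bar's price with no commission or slippage. zipline applies its own fill and cost models, which is presumably why the values in Out[7] drift slightly below the starting 100000.

```python
def buy_every_bar(prices, shares_per_bar=10, starting_cash=100000.0):
    """Return the portfolio value after each bar when buying a fixed lot every bar.

    Simplifying assumptions: orders fill at the bar's own price and there is
    no commission or slippage.
    """
    cash = starting_cash
    shares = 0
    values = []
    for price in prices:
        cash -= shares_per_bar * price        # pay for this bar's purchase
        shares += shares_per_bar
        values.append(cash + shares * price)  # mark the holdings to market
    return values


prices = [10.0, 11.0, 9.0]  # made-up prices, not AAPL data
print(buy_every_bar(prices))  # [100000.0, 100010.0, 99970.0]
```

Buying at the current price never changes the value on the bar itself; the value only moves when the price of shares already held moves.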

In [9]:
from collections import deque as moving_window


class DualMovingAverage(zipline.TradingAlgorithm):
    """Implements the dual moving average crossover strategy."""
    
    def initialize(self, short_window=100, long_window=300):
        self.short_window = moving_window(maxlen=short_window)
        self.long_window = moving_window(maxlen=long_window)
        
        
    def handle_data(self, data):
        self.short_window.append(data[zipline.api.symbol('AAPL')].price)
        self.long_window.append(data[zipline.api.symbol('AAPL')].price)
        
        short_mavg = np.mean(self.short_window)
        long_mavg = np.mean(self.long_window)
        
        # Trading logic: hold 100 shares while the short average is above
        # the long average, otherwise hold none
        if short_mavg > long_mavg:
            self.order_target(zipline.api.symbol('AAPL'), 100)
        elif short_mavg < long_mavg:
            self.order_target(zipline.api.symbol('AAPL'), 0)
        
        self.record(AAPL=data[zipline.api.symbol('AAPL')].price,
                    short_mavg=short_mavg,
                    long_mavg=long_mavg)

In [10]:
def run_dual_moving_ave():
    moving_ave = DualMovingAverage()
    results = moving_ave.run(data)
    return results.portfolio_value

In [11]:
results_DMA = run_dual_moving_ave()


[2015-07-09 21:17:29.186000] INFO: Performance: Simulated 2893 trading days out of 2893.
[2015-07-09 21:17:29.187000] INFO: Performance: first open: 2001-08-01 13:31:00+00:00
[2015-07-09 21:17:29.187000] INFO: Performance: last close: 2013-02-01 21:00:00+00:00

In [14]:
results_DMA.plot()


Out[14]:
<matplotlib.axes.AxesSubplot at 0xfd19b38>
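
The crossover rule in handle_data can be tried out without zipline. The sketch below (plain Python 3) reuses the same deque-based windows; the function name crossover_targets and the toy prices are ours, and it only tracks the target share count, not fills or cash.

```python
from collections import deque


def crossover_targets(prices, short_window=100, long_window=300, shares=100):
    """Return the target share count after each price, following the rule above."""
    short = deque(maxlen=short_window)
    long_ = deque(maxlen=long_window)
    target = 0          # start flat, as the algorithm does
    targets = []
    for price in prices:
        short.append(price)
        long_.append(price)
        short_mavg = sum(short) / len(short)
        long_mavg = sum(long_) / len(long_)
        if short_mavg > long_mavg:
            target = shares
        elif short_mavg < long_mavg:
            target = 0
        # When the averages are equal the previous target is kept.
        targets.append(target)
    return targets


toy_prices = [1, 2, 3, 4, 3, 2, 1, 1]  # made-up prices
print(crossover_targets(toy_prices, short_window=2, long_window=4))
# [0, 0, 100, 100, 100, 0, 0, 0]
```

Until more prices than short_window have arrived, both deques hold exactly the same values, so the averages are equal and no position is taken.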

Machine Learning

scikit-learn's home page divides up the space of machine learning well, but the Mahout algorithms list is more comprehensive. From both:

  • Collaborative filtering
    'because you bought these, we recommend this'
  • Classification
    'people with these characteristics, if sent a mailer, will buy something 30% of the time'
  • Clustering
    'our customers naturally fall into these groups: urban singles, guys with dogs, women 25-35 who like rap'
  • Dimension reduction
    'a preprocessing step before regression that can also identify the most significant contributors to variation'
  • Topics
    'the posts in this user group are related to either local politics, music, or sports'

The S&P 500 dataset is great for us to quickly explore regression, clustering, and principal component analysis.

Example: K-means Clustering

The goal is to cluster Chicago-area Fortune 500 stocks by similar day-to-day returns in 2012. Steps:

  1. Get and transform the data (one row per company, one column per day in the year)
  2. Iteratively try different 'K' values for k-means and pick one
  3. See what the clusters say about which stocks are similar (expect similarity within industrial group)
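
Step 1 can be sketched in plain Python 3 on made-up numbers. The daily dict and the helper whiten_columns are hypothetical stand-ins, not our database module; the helper assumes the same scaling that scipy's whiten applies further down, namely dividing each column by its population standard deviation.

```python
from statistics import pstdev

# Hypothetical daily returns keyed by date, then by ticker (made-up numbers).
daily = {
    "2012-01-03": {"BA": 0.01, "MCD": -0.02, "EXC": 0.03},
    "2012-01-04": {"BA": 0.02, "MCD": 0.00, "EXC": -0.02},
}

dates = sorted(daily)
tickers = sorted(daily[dates[0]])

# Transform: one row per company, one column per day.
rows = {t: [daily[d][t] for d in dates] for t in tickers}


def whiten_columns(matrix):
    """Divide every column by its population standard deviation."""
    scales = [pstdev(column) for column in zip(*matrix)]
    return [[x / s for x, s in zip(row, scales)] for row in matrix]


whitened = whiten_columns([rows[t] for t in tickers])
for ticker, row in zip(tickers, whitened):
    print(ticker, [round(x, 3) for x in row])
```

After scaling, every day (column) has unit variance across companies, so no single volatile day dominates the distance calculations.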

In [13]:
# This is a module we wrote using pg8000 to access our Postgres database on Heroku
from database import Database
db = Database()

In [14]:
# List of Chicago's Fortune 500 companies' ticker symbols
chicago_companies_lookup = dict(
    ABT = "Abbott",
    ADM = "Archer-Daniels Midland",
    ALL = "Allstate",
    BA = "Boeing",
    CF = "CF Industries (Fertilizer)",
    DFS = "Discover",
    DOV = "Dover Corporation (industrial products)",
    EXC = "Exelon",
    GWW = "Grainger",
    ITW = "Illinois Tool Works",
    MCD = "McDonalds",
    MDLZ = "Mondelez",
    MSI = "Motorola",
    NI = "Nicor",
    TEG = "Integrys (energy)")

chicago_companies = chicago_companies_lookup.keys()

returns = db.select( ('SELECT dt, "{}" FROM return '
                      'WHERE dt BETWEEN \'2012-01-01\' AND \'2012-12-31\' '
                      'ORDER BY dt;').format(
                             '", "'.join((c.lower() for c in chicago_companies))),
                            columns=["Date"] + chicago_companies)

sp_dates = [row.pop("Date") for row in returns]
returns = pd.DataFrame(returns, index=sp_dates)

In [15]:
# Cluster to determine whether sectors move similarly in the marketplace
from scipy.cluster.vq import whiten
from sklearn.cluster import KMeans

import matplotlib.pyplot as plt
%matplotlib inline

In [16]:
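# Keep the result as a DataFrame so the ticker index survives whitening
# (whiten can return a plain ndarray, which has no .index).
by_company = returns.transpose().dropna()
normalize = pd.DataFrame(whiten(by_company.values), index=by_company.index)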
steps = range(2,10)
inertias = [KMeans(i).fit(normalize).inertia_ for i in steps]

plt.plot(steps, inertias, 'go-')
plt.title("Pick 5 clusters (but the dropoff looks linear)")


Out[16]:
<matplotlib.text.Text at 0x7f447e85ecd0>
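
The inertia_ plotted above is the sum of squared distances from each sample to its closest cluster center. To make the elbow idea concrete, here is a small sketch of Lloyd's algorithm in plain Python 3 on made-up 2-D points. It is not scikit-learn's implementation: the farthest-first seeding is our own choice to keep the run deterministic, and the function names are hypothetical.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then keep adding the point
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans_inertia(points, k, iterations=20):
    """Run Lloyd's algorithm and return (labels, inertia)."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]       # assignment step
        for c in range(k):                                      # update step
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    labels = [nearest(p, centroids) for p in points]            # final assignment
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


# Three tight, well-separated groups of made-up points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

for k in range(1, 5):
    _, inertia = kmeans_inertia(points, k)
    print(k, round(inertia, 3))
```

On this toy data the inertia falls sharply up to k=3 and barely improves at k=4, which is the elbow we were hoping to see in the plot above.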

In [17]:
nclust = 5
km = KMeans(n_clusters = nclust)
km.fit(normalize)

clustered_companies = [set() for i in range(nclust)]
for i in range(len(normalize.index)):
    company = normalize.index[i]
    cluster_id = km.labels_[i]
    clustered_companies[cluster_id].add(company)

print "Here are the clusters...."
for c in clustered_companies:
    print len(c), "  companies:\n    ", ", ".join(chicago_companies_lookup[co] for co in c)


1 companies: Grainger
8 companies: Nicor	Allstate	Exelon	McDonalds	Integrys (energy)	Mondelez	Abbott	Archer-Daniels Midland
1 companies: Discover
4 companies: Illinois Tool Works	Dover Corporation (industrial products)	Boeing	Motorola
1 companies: CF Industries (Fertilizer)

In [18]:
import scipy.spatial.distance as dist
import scipy.cluster.hierarchy as hclust

chicago_dist = dist.pdist(normalize, 'euclidean')
links = hclust.linkage(chicago_dist)

plt.figure(figsize=(3,4))
den = hclust.dendrogram(
    links,
    labels=[chicago_companies_lookup[co] for co in normalize.index],
    orientation="left")


plt.ylabel('Samples', fontsize=9)
plt.xlabel('Distance')
plt.suptitle('Stocks clustered by similarity', fontweight='bold', fontsize=14);
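
hclust.linkage uses single linkage by default, so the distance between two clusters is the distance between their closest members. A naive plain Python 3 sketch of that merge order follows; the function name single_linkage and the points are made up, and SciPy's implementation is far more efficient than this.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Naive agglomerative clustering.

    Returns the merges in order as (members_a, members_b, distance) tuples,
    where members are indices into points.
    """
    def gap(pair):
        # Single linkage: distance between the closest pair of members.
        a, b = pair
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((sorted(a), sorted(b), gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


toy_points = [(0, 0), (0, 1), (5, 0), (5, 3)]  # made-up points
for members_a, members_b, distance in single_linkage(toy_points):
    print(members_a, members_b, round(distance, 3))
```

Each merge corresponds to one junction in the dendrogram, drawn at the merge distance.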


Flask Blog Built on Top of Heroku

MongoDB

We copied the Flask tutorial instructions but replaced the database with a MongoDB database, using the Flask-PyMongo extension.

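# It's really this easy!
import os
import urlparse  # Python 2 module; on Python 3 use urllib.parse instead

from flask import Flask, render_template
from flask_pymongo import PyMongo  # the old flask.ext.pymongo path was removed from Flask

app = Flask(__name__)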

app.config['MONGO_URI'] = os.environ['MONGO_URI']
app.config['PASSWORD'] = urlparse.urlparse(app.config['MONGO_URI']).password
app.config['USERNAME'] = urlparse.urlparse(app.config['MONGO_URI']).username

mongo = PyMongo(app)

@app.route("/")
def show_entries():
    """Show all of the blog entries."""
    entries = mongo.db.entries.find(sort=[('$natural', -1)])
    return render_template('show_entries.html', entries=entries)

This is about 20 fewer lines of code (and 4 fewer steps) than using SQLite.
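
The USERNAME and PASSWORD settings above are simply pulled out of the connection URI. Here is a small Python 3 sketch of that parsing, using urllib.parse (the Python 3 home of urlparse); the URI is made up for illustration, and a real one would come from the MONGO_URI environment variable.

```python
from urllib.parse import urlparse

# Made-up URI for illustration only.
mongo_uri = "mongodb://blog_user:s3cret@db.example.com:27017/blog"

parts = urlparse(mongo_uri)
print(parts.username)          # blog_user
print(parts.password)          # s3cret
print(parts.hostname)          # db.example.com
print(parts.port)              # 27017
print(parts.path.lstrip("/"))  # blog
```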

<img src="data/flask_tutorial.png" style="width:500px">

Plot.ly Graphs

- Get code from my GitHub page

