Welcome to Quantopian. In this tutorial, we introduce Quantopian, the problems it aims to solve, and the tools it provides to help you solve those problems. At the end of this lesson, you should have a high level understanding of what you can do with Quantopian.
The focus of this tutorial is to get you started, not to make you an expert Quantopian user. If you already feel comfortable with the basics of Quantopian, other resources are available to help you learn more about Quantopian's tools.
All you need to get started on this tutorial is some basic Python programming skills.
Note: You are currently viewing this tutorial lesson in the Quantopian Research environment. Research is a hosted Jupyter notebook environment that allows you to interactively run Python code. Research comes with a mix of proprietary and open-source Python libraries pre-installed. To learn more about Research, see the documentation. You can follow along with the code in this notebook by cloning it. Each cell of code (grey boxes) can be run by pressing Shift + Enter. This tutorial notebook is read-only. If you want to make changes to the notebook, create a new notebook and copy the code from this tutorial.
Quantopian is a cloud-based software platform that allows you to research cross-sectional factors in developed and emerging equity markets around the world using Python. Quantopian makes it easy to iterate on ideas by supplying a fast, uniform API on top of all sorts of financial data. Additionally, Quantopian provides tools to help you upload your own financial datasets, analyze the efficacy of your factors, and download your work into a local environment so that you can integrate it with other systems.
Typically, researching cross-sectional equity factors involves the following steps:
1. Define a universe of equities.
2. Define a factor over that universe.
3. Test the predictiveness of the factor.
4. Download the factor data so it can be used in another system.
On Quantopian, steps 1 and 2 are achieved using the Pipeline API, step 3 is done using a tool called Alphalens, and step 4 is done using a tool called Aqueduct. The rest of this tutorial will give a brief walkthrough of an end-to-end factor research workflow on Quantopian.
The code in this tutorial can be run in Quantopian's Research environment (this notebook is currently running in Research). Press Shift + Enter to run each cell of code (grey boxes).
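For example, the cell below is a trivial illustration (not part of the factor research workflow): it imports two of the pre-installed open-source libraries and prints their versions, which is a quick way to confirm that the environment is working.
In [ ]:
# numpy and pandas are among the open-source libraries pre-installed in Research.
import numpy as np
import pandas as pd

print(np.__version__, pd.__version__)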
The first step to researching a cross-sectional equity factor is to select a “universe” of equities over which our factor will be defined. In this context, a universe represents the set of equities we want to consider when performing computations later. On Quantopian, defining a universe is done using the Pipeline API. Later on, we will use the same API to compute factors over the equities in this universe.
The Pipeline API provides a uniform interface to several built-in datasets, as well as any custom datasets that we upload to our account. Pipeline makes it easy to define computations or expressions using built-in and custom data. For example, the following code snippet imports two built-in datasets, FactSet Fundamentals and FactSet Equity Metadata, and uses them to define an equity universe.
In [1]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.factset import Fundamentals, EquityMetadata

# Limit to primary-issue common shares.
is_share = EquityMetadata.security_type.latest.eq('SHARE')
is_primary = EquityMetadata.is_primary.latest
primary_shares = (is_share & is_primary)

# Rank primary shares by the latest market cap and keep the top 1000.
market_cap = Fundamentals.mkt_val.latest
universe = market_cap.top(1000, mask=primary_shares)
The above example defines a universe to be the top 1000 primary-issue common stocks ranked by market cap. Universes can be defined using any of the data available on Quantopian. Additionally, you can upload your own data, such as index constituents or another custom universe, to the platform using the Self-Serve Data tool. To learn more about uploading a custom dataset, see the Self-Serve Data documentation. For now, we will stick with the universe definition above.
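To give a sense of that flexibility, the sketch below defines a hypothetical alternative universe based on trading activity rather than market cap, using the built-in AverageDollarVolume factor. It is only an illustration and is not used elsewhere in this tutorial.
In [ ]:
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.factors import AverageDollarVolume

# Hypothetical alternative universe: the 500 most actively traded primary
# shares, ranked by 3-month (63 trading day) average dollar volume.
avg_dollar_volume = AverageDollarVolume(
    inputs=[EquityPricing.close, EquityPricing.volume],
    window_length=63,
)
liquidity_universe = avg_dollar_volume.top(500, mask=primary_shares)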
After defining a universe, the next step is to define a factor for testing. On Quantopian, a factor is a computation that produces numerical values at a regular frequency for all assets in a universe. As in step 1, we will use the Pipeline API to define factors. In addition to providing a fast, uniform API on top of pre-integrated and custom datasets, Pipeline also provides a set of built-in classes and methods that can be used to quickly define factors. For example, the following code snippet defines a momentum factor using fast and slow moving average computations.
In [2]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.factors import SimpleMovingAverage
# 1-month (21 trading day) moving average factor.
fast_ma = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=21)
# 6-month (126 trading day) moving average factor.
slow_ma = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=126)
# Divide fast_ma by slow_ma to get momentum factor and z-score.
momentum = fast_ma / slow_ma
momentum_factor = momentum.zscore()
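Pipeline factors can be combined and transformed in the same way to express many other ideas. As a purely illustrative sketch (not used in the rest of this tutorial), the cell below defines a hypothetical short-term reversal factor from trailing 5-day returns.
In [ ]:
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.factors import Returns

# Hypothetical example: a short-term reversal factor. Stocks that fell over
# the last 5 trading days score high; stocks that rose score low.
recent_returns = Returns(inputs=[EquityPricing.close], window_length=5)
reversal_factor = -1 * recent_returns.zscore()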
Now that we have defined a universe and a factor, we can choose a market and time period and simulate the factor. One of the defining features of the Pipeline API is that it allows us to define universes and factors in high level terms, without having to worry about common data engineering problems like adjustments, point-in-time data, symbol mapping, delistings, and data alignment. Pipeline does all of that work behind the scenes, letting us focus our time on building and testing factors.
The code below creates a Pipeline instance that adds our momentum factor as a column and screens down to the top half of our universe by momentum. The pipeline is then run over the US equities market from 2016 to 2019.
In [3]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.data.factset import Fundamentals, EquityMetadata
from quantopian.pipeline.domain import US_EQUITIES, ES_EQUITIES
from quantopian.pipeline.factors import SimpleMovingAverage
# Universe: top 1000 primary-issue common shares by market cap (as in step 1).
is_share = EquityMetadata.security_type.latest.eq('SHARE')
is_primary = EquityMetadata.is_primary.latest
primary_shares = (is_share & is_primary)
market_cap = Fundamentals.mkt_val.latest
universe = market_cap.top(1000, mask=primary_shares)
# 1-month moving average factor.
fast_ma = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=21)
# 6-month moving average factor.
slow_ma = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=126)
# Divide fast_ma by slow_ma to get momentum factor and z-score.
momentum = fast_ma / slow_ma
momentum_factor = momentum.zscore()
# Create a US equities pipeline with our momentum factor as a column,
# screening down to the top half of our universe by momentum.
pipe = Pipeline(
    columns={
        'momentum_factor': momentum_factor,
    },
    screen=momentum_factor.percentile_between(50, 100, mask=universe),
    domain=US_EQUITIES,
)
# Run the pipeline from 2016 to 2019 and display the first few rows of output.
from quantopian.research import run_pipeline
factor_data = run_pipeline(pipe, '2016-01-01', '2019-01-01')
print("Result contains {} rows of output.".format(len(factor_data)))
factor_data.head()
Out[3]:
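Since run_pipeline returns an ordinary pandas DataFrame indexed by date and asset, you can inspect the result with standard pandas operations. For example, the optional snippet below counts how many equities passed our screen on each day; it is just an illustration and is not required by the workflow.
In [ ]:
# factor_data is indexed by (date, asset). Count the number of equities
# that passed the screen on each day of the simulation.
equities_per_day = factor_data.groupby(level=0).size()
equities_per_day.describe()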
The next step is to test the predictiveness of the factor we defined in step 2. To determine whether our factor is predictive, we calculate the 1-day forward returns for the factor's assets over the factor's dates, and then pass the factor and the forward returns into Alphalens. The following code cell shows how to get this returns data and format it for Alphalens.
In [4]:
from quantopian.research import get_forward_returns
import alphalens as al

# Get the 1-day forward returns for the assets and dates in the factor.
returns_df = get_forward_returns(
    factor_data['momentum_factor'],
    [1],
    US_EQUITIES,
)

# Format the factor and returns data so that we can run it through Alphalens.
al_data = al.utils.get_clean_factor(
    factor_data['momentum_factor'],
    returns_df,
    quantiles=5,
    bins=None,
)
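As an optional sanity check, you can peek at the DataFrame returned by get_clean_factor. It is indexed by date and asset and contains the factor values, factor quantiles, and forward returns that Alphalens expects.
In [ ]:
# Peek at the merged factor/returns data that Alphalens will consume.
al_data.head()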
Then, we can create a factor tearsheet to analyze our momentum factor.
In [5]:
from alphalens.tears import create_full_tear_sheet
create_full_tear_sheet(al_data)
The Alphalens tearsheet offers insight into the predictive ability of a factor.
To learn more about Alphalens, check out the documentation.
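If you do not need the full report, Alphalens also provides narrower tear sheets and individual performance and plotting functions. For example, the optional cell below generates only the returns-focused portion of the analysis.
In [ ]:
from alphalens.tears import create_returns_tear_sheet

# Generate only the returns analysis for our momentum factor.
create_returns_tear_sheet(al_data)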
When we have a factor that we like, the next step is often to download the factor data so we can integrate it with another system. On Quantopian, downloading pipeline results to a local environment is done using Aqueduct. Aqueduct is an HTTP API that enables remote execution of pipelines, and makes it possible to download results to a local environment.
Quantopian accounts do not have access to Aqueduct by default. It is an additional feature to which you will need to request access. If you would like to learn more about adding Aqueduct to your Quantopian account, please contact us at feedback@quantopian.com.
In this tutorial, we introduced Quantopian and walked through an example factor research workflow using Pipeline, Alphalens, and Aqueduct. Quantopian has a rich set of documentation and tutorials on these tools and others. We recommend starting with the tutorials or the User Guide section of the documentation if you would like to grow your understanding of Quantopian.
If you would like to learn more about Quantopian's enterprise offering, please contact us at enterprise@quantopian.com.