In [1]:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume
A classifier is a function from an asset and a moment in time to a categorical output such as a string or integer label:
F(asset, timestamp) -> category
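To make the signature above concrete, here is a minimal plain-Python sketch of a classifier as a lookup from (asset, timestamp) to a label. It does not use the Pipeline API; the tickers, dates, labels, and the name exchange_classifier are all hypothetical and exist only for illustration.

```python
from datetime import date

# Hypothetical lookup table: (asset, date) -> exchange label.
# The tickers, dates, and labels are made up for illustration.
EXCHANGE_BY_ASSET_AND_DATE = {
    ("AAA", date(2015, 5, 4)): "NYS",
    ("AAA", date(2015, 5, 5)): "NYS",
    ("BBB", date(2015, 5, 4)): "NAS",
    ("BBB", date(2015, 5, 5)): "NAS",
}


def exchange_classifier(asset, timestamp):
    """Return the categorical label for one asset at one moment in time."""
    # None stands in for a missing label.
    return EXCHANGE_BY_ASSET_AND_DATE.get((asset, timestamp))


label = exchange_classifier("AAA", date(2015, 5, 5))
```

The point of the sketch is only that the output is a category rather than a number: two labels can be compared for equality, but arithmetic on them is not meaningful.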
An example of a classifier producing a string output is the exchange ID of a security. To create this classifier, we'll import the morningstar dataset and use the latest attribute of morningstar.share_class_reference.exchange_id to instantiate our classifier:
In [2]:
from quantopian.pipeline.data import morningstar
# Since the underlying data of morningstar.share_class_reference.exchange_id
# is of type string, .latest returns a Classifier
exchange = morningstar.share_class_reference.exchange_id.latest
Previously, we saw that the latest attribute produced an instance of a Factor. In this case, since the underlying data is of type string, latest produces a Classifier.
Similarly, a computation producing the latest Morningstar sector code of a security is a Classifier. In this case, the underlying type is an int, but the integer doesn't represent a numerical value (it's a category), so it produces a classifier. To get the latest sector code, we can use the built-in Sector classifier.
In [3]:
from quantopian.pipeline.classifiers.morningstar import Sector
morningstar_sector = Sector()
Using Sector is equivalent to morningstar.asset_classification.morningstar_sector_code.latest.
Classifiers can also be used to produce filters with methods like isnull, eq, and startswith. The full list of Classifier methods producing Filters can be found here.
As an example, if we wanted a filter to select for securities trading on the New York Stock Exchange, we could use the eq method of our exchange classifier.
In [4]:
nyse_filter = exchange.eq('NYS')
This filter will return True for securities having 'NYS' as their most recent exchange_id.
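As a rough illustration of how classifier methods such as eq, isnull, and startswith turn labels into boolean filters, the sketch below applies the same ideas to a plain dictionary of labels. It is a stand-in for the concept, not the Pipeline implementation; the tickers, the labels, and the three helper functions are hypothetical.

```python
# Hypothetical classifier output for one day: asset -> exchange label.
# None stands in for a missing label; tickers and labels are illustrative.
exchange_labels = {"AAA": "NYS", "BBB": "NAS", "CCC": None, "DDD": "NYS"}


def eq(labels, value):
    """True where the label equals `value`."""
    return {asset: label == value for asset, label in labels.items()}


def isnull(labels):
    """True where the label is missing."""
    return {asset: label is None for asset, label in labels.items()}


def startswith(labels, prefix):
    """True where a non-missing label begins with `prefix`."""
    return {
        asset: label is not None and label.startswith(prefix)
        for asset, label in labels.items()
    }


nyse_mask = eq(exchange_labels, "NYS")
missing_mask = isnull(exchange_labels)
na_prefix_mask = startswith(exchange_labels, "NA")
```

Each helper returns one boolean per asset, which is the shape of result a filter produces for a single day.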
Classifiers can also be produced from various Factor methods. The most general of these is the quantiles method, which accepts a bin count as an argument. The quantiles method assigns a label from 0 to (bins - 1) to every non-NaN data point in the factor output and returns a Classifier with these labels. NaNs are labeled with -1. Aliases are available for quartiles (quantiles(4)), quintiles (quantiles(5)), and deciles (quantiles(10)). As an example, this is what a filter for the top decile of a factor might look like:
In [5]:
dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
top_decile = (dollar_volume_decile.eq(9))
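The labeling rule described above (labels 0 to bins - 1 for non-NaN values, -1 for NaN) can be sketched in plain Python. This is a simplified rank-based version under the assumption of equal-sized bins; the actual quantiles method may compute bin edges differently, particularly for tied values. The function name quantile_labels and the sample values are hypothetical.

```python
import math


def quantile_labels(values, bins):
    """Label non-NaN values 0..bins-1 by rank; label NaNs -1."""
    # Pair each valid value with its original position, then sort by value.
    ranked = sorted((v, i) for i, v in enumerate(values) if not math.isnan(v))
    labels = [-1] * len(values)  # NaNs keep the -1 label
    for rank, (_, position) in enumerate(ranked):
        # Map ranks 0..n-1 onto bins 0..bins-1 in equal-sized groups.
        labels[position] = rank * bins // len(ranked)
    return labels


# Made-up factor output for five assets, one of them missing.
dollar_volumes = [10.0, float("nan"), 30.0, 20.0, 40.0]
halves = quantile_labels(dollar_volumes, 2)

# Same pattern as the top-decile filter: compare against the highest label.
top_half = [label == 1 for label in halves]
```

With two bins, the two smallest valid values receive label 0, the two largest receive label 1, and the NaN entry receives -1, so it can never pass a filter that checks for the top bin.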
Let's put each of our classifiers into a pipeline and run it to see what they look like.
In [6]:
def make_pipeline():
    exchange = morningstar.share_class_reference.exchange_id.latest
    nyse_filter = exchange.eq('NYS')

    morningstar_sector = Sector()

    dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
    top_decile = (dollar_volume_decile.eq(9))

    return Pipeline(
        columns={
            'exchange': exchange,
            'sector_code': morningstar_sector,
            'dollar_volume_decile': dollar_volume_decile
        },
        screen=(nyse_filter & top_decile)
    )
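The screen in make_pipeline combines two filters with &, so only securities for which both nyse_filter and top_decile are True appear in the output. The plain-Python sketch below mimics that behavior on made-up data; the tickers, the column values, and the helper apply_screen are hypothetical and are not part of the Pipeline API.

```python
# Hypothetical per-asset filter outputs for a single day.
nyse_filter = {"AAA": True, "BBB": False, "CCC": True}
top_decile = {"AAA": True, "BBB": True, "CCC": False}

# Hypothetical column values for the same assets.
rows = {
    "AAA": {"exchange": "NYS", "dollar_volume_decile": 9},
    "BBB": {"exchange": "NAS", "dollar_volume_decile": 9},
    "CCC": {"exchange": "NYS", "dollar_volume_decile": 4},
}


def apply_screen(rows, *masks):
    """Keep only the rows for which every mask is True."""
    return {
        asset: row
        for asset, row in rows.items()
        if all(mask[asset] for mask in masks)
    }


screened = apply_screen(rows, nyse_filter, top_decile)
```

Only "AAA" passes both masks here, which mirrors how a screen restricts the rows of the pipeline output without changing its columns.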
In [7]:
result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')
print('Number of securities that passed the filter: %d' % len(result))
result.head(5)
Out[7]:
Classifiers are also useful for describing grouping keys for complex transformations on Factor outputs. Grouping operations such as demean and groupby are outside the scope of this tutorial. A future tutorial will cover more advanced uses for classifiers.
In the next lesson, we'll look at the different datasets that we can use in pipeline.