In [1]:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume
A classifier is a function from an asset and a moment in time to a categorical output such as a string or integer label:
F(asset, timestamp) -> category
An example of a classifier producing a string output is the exchange ID of a security. To create this classifier, we'll have to import morningstar.share_class_reference.exchange_id and use the latest attribute to instantiate our classifier:
In [2]:
from quantopian.pipeline.data import morningstar
# Since the underlying data of morningstar.share_class_reference.exchange_id
# is of type string, .latest returns a Classifier
exchange = morningstar.share_class_reference.exchange_id.latest
Previously, we saw that the latest attribute produced an instance of a Factor. In this case, since the underlying data is of type string, latest produces a Classifier.
Similarly, a computation producing the latest Morningstar sector code of a security is a Classifier. In this case, the underlying type is an int, but the integer doesn't represent a numerical value (it's a category) so it produces a classifier. To get the latest sector code, we can use the built-in Sector classifier.
In [3]:
from quantopian.pipeline.classifiers.morningstar import Sector
morningstar_sector = Sector()
Using Sector is equivalent to morningstar.asset_classification.morningstar_sector_code.latest.
Classifiers can also be used to produce filters with methods like isnull, eq, and startswith. The full list of Classifier methods producing Filters can be found here.
As an example, if we wanted a filter to select for securities trading on the New York Stock Exchange, we can use the eq method of our exchange classifier.
In [4]:
nyse_filter = exchange.eq('NYS')
This filter will return True for securities having 'NYS' as their most recent exchange_id.
Classifiers can also be produced from various Factor methods. The most general of these is the quantiles method which accepts a bin count as an argument. The quantiles method assigns a label from 0 to (bins - 1) to every non-NaN data point in the factor output and returns a Classifier with these labels. NaNs are labeled with -1. Aliases are available for quartiles (quantiles(4)), quintiles (quantiles(5)), and deciles (quantiles(10)). As an example, this is what a filter for the top decile of a factor might look like:
In [5]:
dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
top_decile = (dollar_volume_decile.eq(9))
Let's put each of our classifiers into a pipeline and run it to see what they look like.
In [6]:
def make_pipeline():
exchange = morningstar.share_class_reference.exchange_id.latest
nyse_filter = exchange.eq('NYS')
morningstar_sector = Sector()
dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
top_decile = (dollar_volume_decile.eq(9))
return Pipeline(
columns={
'exchange': exchange,
'sector_code': morningstar_sector,
'dollar_volume_decile': dollar_volume_decile
},
screen=(nyse_filter & top_decile)
)
In [7]:
result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')
print 'Number of securities that passed the filter: %d' % len(result)
result.head(5)
Out[7]:
Classifiers are also useful for describing grouping keys for complex transformations on Factor outputs. Grouping operations such as demean and groupby are outside the scope of this tutorial. A future tutorial will cover more advanced uses for classifiers.
In the next lesson, we'll look at the different datasets that we can use in pipeline.