In [1]:
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import AverageDollarVolume, SimpleMovingAverage
from quantopian.research import run_pipeline

Classifiers

A classifier is a function from an asset and a moment in time to a categorical output such as a string or integer label:

F(asset, timestamp) -> category

An example of a classifier producing a string output is the currency in which a security is traded. To create this classifier, we'll have to import Fundamentals.curn_doc_af and use the latest attribute to instantiate our classifier:


In [2]:
from quantopian.pipeline.data.factset import Fundamentals

# Since the underlying data of Fundamentals.curn_doc_af is of type
# string, .latest returns a Classifier.
currency = Fundamentals.curn_doc_af.latest

Previously, we saw that the latest attribute produced an instance of a Factor. In this case, since the underlying data is of type string, latest produces a Classifier.

Similarly, the underlying type of a Classifier could in theory be an int, where the integer doesn't actually represent a numerical value, but rather a categorical value.
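
To make the point about integer labels concrete, here is a minimal plain-Python sketch that does not use the Pipeline API. The sector codes, the SECTOR_NAMES mapping, and the asset-to-code assignments are illustrative values made up for this example. It shows that an integer label is useful for grouping and counting, while arithmetic on it carries no meaning.

```python
from collections import Counter

# Illustrative integer codes: the numbers are arbitrary labels, not quantities.
SECTOR_NAMES = {101: 'Basic Materials', 206: 'Healthcare', 311: 'Technology'}

# Hypothetical classifier output for one day: asset -> integer sector code.
sector_codes = {'AAPL': 311, 'ABT': 206, 'ADSK': 311, 'ABX': 101}

# Meaningful: treat each code as a category and count its members.
members_per_sector = Counter(sector_codes.values())

# Not meaningful: the "average sector" is just a number with no interpretation.
average_code = sum(sector_codes.values()) / float(len(sector_codes))

for code, count in sorted(members_per_sector.items()):
    print('%s: %d' % (SECTOR_NAMES[code], count))
```

The average of the codes does not correspond to any sector, which is why such a column is treated as a Classifier rather than a Factor.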

Building Filters from Classifiers

Classifiers can also be used to produce filters with methods like isnull, eq, and startswith. The full list of Classifier methods that produce Filters can be found in the Quantopian API reference.

As an example, if we want a filter that selects securities trading in United States dollars, we can use the eq method of our Fundamentals.curn_doc_af classifier.


In [3]:
us_currency_filter = currency.eq('USD')

This filter will return True for securities having 'USD' as their primary trading currency.
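
As a rough illustration of how a classifier turns into a filter, the sketch below mimics eq, isnull, and startswith with plain Python functions over a dictionary of labels. It is not how Pipeline implements these methods: the functions only borrow the method names, the tickers XYZ and QRS are hypothetical, and None stands in for a missing label.

```python
# Hypothetical classifier output for one day: asset -> currency label.
# None stands in for a missing value.
currency_labels = {'AAPL': 'USD', 'ABX': 'USD', 'XYZ': 'CAD', 'QRS': None}

def eq(labels, value):
    """True where the label equals the given value."""
    return {asset: label == value for asset, label in labels.items()}

def isnull(labels):
    """True where the label is missing."""
    return {asset: label is None for asset, label in labels.items()}

def startswith(labels, prefix):
    """True where the label is present and begins with the prefix."""
    return {asset: label is not None and label.startswith(prefix)
            for asset, label in labels.items()}

us_currency = eq(currency_labels, 'USD')
missing = isnull(currency_labels)
```

Each function maps a categorical label to a boolean per asset, which is the shape of output a Filter produces.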

Quantiles

Classifiers can also be produced from various Factor methods. The most general of these is the quantiles method, which accepts a bin count as an argument. It assigns a label from 0 to (bins - 1) to every non-NaN data point in the factor output and returns a Classifier with these labels; NaNs are labeled -1. Aliases are available for quartiles (quantiles(4)), quintiles (quantiles(5)), and deciles (quantiles(10)). As an example, this is what a filter for the top decile of a factor might look like:


In [4]:
dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
top_decile = dollar_volume_decile.eq(9)
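
The labeling rule behind quantiles can be sketched in plain Python. This is a simplified rank-based, equal-count binning written for illustration, not the library's implementation, so ties and uneven bin sizes may be handled differently there. The function quantile_labels and the sample values are hypothetical.

```python
import math

def quantile_labels(values, bins):
    """Label each non-NaN value from 0 to bins - 1 by rank; NaN gets -1."""
    # Positions of the valid (non-NaN) entries, ordered by their value.
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    order = sorted(valid, key=lambda i: values[i])

    labels = [-1] * len(values)  # NaN positions keep the -1 label
    n = len(order)
    for rank, i in enumerate(order):
        # Split the sorted ranks into `bins` groups of (nearly) equal size.
        labels[i] = rank * bins // n
    return labels

# Hypothetical factor output for nine assets on one day, one value missing.
volumes = [5.0, 1.0, float('nan'), 3.0, 9.0, 7.0, 2.0, 8.0, 4.0]
quartile = quantile_labels(volumes, 4)

# Equivalent of .eq(3): a filter selecting the top quartile.
top_quartile = [label == 3 for label in quartile]
```

With these inputs the two largest values (9.0 and 8.0) receive label 3, and the NaN entry receives -1, so it can never pass a filter that tests for a valid bin.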

Let's put each of our classifiers into a pipeline and run it to see what they look like.


In [5]:
def make_pipeline():
    currency = Fundamentals.curn_doc_af.latest
    us_currency_filter = currency.eq('USD')

    dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
    top_decile = dollar_volume_decile.eq(9)
    
    return Pipeline(
        columns={
            'currency': currency,
            'dollar_volume_decile': dollar_volume_decile,
        },
        screen=(us_currency_filter & top_decile),
    )

In [6]:
result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')
print('Number of securities that passed the filter: %d' % len(result))
result.head(5)


Number of securities that passed the filter: 647
Out[6]:
                                              currency  dollar_volume_decile
2015-05-05 00:00:00+00:00  Equity(2 [ARNC])   USD       9
                           Equity(24 [AAPL])  USD       9
                           Equity(62 [ABT])   USD       9
                           Equity(64 [ABX])   USD       9
                           Equity(67 [ADSK])  USD       9

Classifiers are also useful for describing grouping keys for complex transformations on Factor outputs. Grouping operations such as demean and groupby are outside the scope of this tutorial. A future tutorial will cover more advanced uses for classifiers.

In the next lesson, we'll look at the different datasets that we can use in a pipeline.