In [1]:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume

Custom Factors

When we first looked at factors, we explored the set of built-in factors. Frequently, though, the computation we want isn't available as a built-in. One of the most powerful features of the Pipeline API is that it lets us fill that gap by defining our own custom factors.

Conceptually, a custom factor is identical to a built-in factor. It accepts inputs, window_length, and mask as constructor arguments, and the resulting Factor object produces one value per security each day.

Let's take an example of a computation that doesn't exist as a built-in: standard deviation. To create a factor that computes the standard deviation over a trailing window, we can subclass quantopian.pipeline.CustomFactor and implement a compute method whose signature is:

def compute(self, today, asset_ids, out, *inputs):
    ...
  • *inputs are M x N numpy arrays, where M is the window_length and N is the number of securities (around 8000 unless a mask is provided). *inputs are trailing data windows. Note that there will be one M x N array for each BoundColumn provided in the factor's inputs list. The data type of each array will be the dtype of the corresponding BoundColumn.
  • out is an empty array of length N. out will be the output of our custom factor each day. The job of compute is to write output values into out.
  • asset_ids will be an integer array of length N containing security ids corresponding to the columns in our *inputs arrays.
  • today will be a pandas Timestamp representing the day for which compute is being called.
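To make the calling convention concrete, here is a minimal sketch that mimics, outside of Quantopian, how a compute method receives its arguments. The harness `run_compute_once`, the integer asset ids, and the random data are hypothetical stand-ins: the real Pipeline engine loads the windows from its datasets, passes a pandas Timestamp for today, and calls compute as a method (with self). Here compute is a plain function for brevity.

```python
import numpy as np


def run_compute_once(compute, window_length, n_assets, n_inputs, seed=0):
    """Hypothetical harness: build fake trailing windows and call compute once."""
    rng = np.random.default_rng(seed)
    # One M x N window per input column (M = window_length, N = n_assets).
    inputs = [rng.random((window_length, n_assets)) for _ in range(n_inputs)]
    # Integer ids, one per column of every input window.
    asset_ids = np.arange(n_assets, dtype=np.int64)
    # Stand-in for the pandas Timestamp the real engine passes.
    today = np.datetime64("2015-05-05")
    # Pre-allocated output buffer of length N; compute must fill it in place.
    out = np.empty(n_assets, dtype=np.float64)
    compute(today, asset_ids, out, *inputs)
    return out


def column_means(today, asset_ids, out, values):
    # Example compute body: write one value per asset (column) into out.
    out[:] = values.mean(axis=0)


result = run_compute_once(column_means, window_length=5, n_assets=4, n_inputs=1)
print(result.shape)  # (4,)
```

The key design point the sketch shows is that compute returns nothing: its only job is to write into the out buffer it is handed.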

Of these, *inputs and out are most commonly used.

An instance of CustomFactor that’s been added to a pipeline will have its compute method called every day. For example, let's define a custom factor that computes the standard deviation of the close price over the last 5 days. To start, let's add CustomFactor and numpy to our import statements.


In [2]:
from quantopian.pipeline import CustomFactor
import numpy

Next, let's define our custom factor to calculate the standard deviation over a trailing window using numpy.nanstd:


In [3]:
class StdDev(CustomFactor):
    def compute(self, today, asset_ids, out, values):
        # Calculates the column-wise standard deviation, ignoring NaNs
        out[:] = numpy.nanstd(values, axis=0)
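Because compute only sees numpy arrays, its body can be checked on a small hand-made window. The sketch below copies the StdDev body into a hypothetical plain class (`FakeStdDev`, no Pipeline machinery) and runs it on a 5-day window for three made-up securities, one of which has a missing price.

```python
import numpy


class FakeStdDev:
    """Hypothetical stand-in for the StdDev CustomFactor, without Pipeline."""

    def compute(self, today, asset_ids, out, values):
        # Column-wise standard deviation, ignoring NaNs (same body as StdDev).
        out[:] = numpy.nanstd(values, axis=0)


# A 5-day window (rows) for 3 hypothetical securities (columns).
values = numpy.array([
    [10.0, 1.0, 5.0],
    [11.0, 1.0, numpy.nan],
    [12.0, 1.0, 5.0],
    [13.0, 1.0, 7.0],
    [14.0, 1.0, 7.0],
])
out = numpy.empty(3)
FakeStdDev().compute(None, numpy.arange(3), out, values)
print(out)  # approximately [1.4142, 0.0, 1.0]
```

Note that numpy.nanstd uses ddof=0 by default, so this is the population standard deviation, and the NaN in the third column is simply skipped (its value is computed from the remaining four days).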

Finally, let's instantiate our factor in make_pipeline():


In [4]:
def make_pipeline():
    std_dev = StdDev(inputs=[USEquityPricing.close], window_length=5)

    return Pipeline(
        columns={
            'std_dev': std_dev
        }
    )

When this pipeline is run, StdDev.compute() will be called every day with data as follows:

  • values: An M x N numpy array, where M is 5 (window_length), and N is ~8000 (the number of securities in our database on the day in question).
  • out: An empty array of length N (~8000). In this example, the job of compute is to populate out with the 5-day standard deviation of each security's close price.

In [5]:
result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')
result


/usr/local/lib/python2.7/dist-packages/numpy/lib/nanfunctions.py:1057: RuntimeWarning: Degrees of freedom <= 0 for slice.
  warnings.warn("Degrees of freedom <= 0 for slice.", RuntimeWarning)
Out[5]:
std_dev
2015-05-05 00:00:00+00:00 Equity(2 [AA]) 0.293428
Equity(21 [AAME]) 0.004714
Equity(24 [AAPL]) 1.737677
Equity(25 [AA_PR]) 0.275000
Equity(31 [ABAX]) 4.402971
Equity(39 [DDC]) 0.138939
Equity(41 [ARCB]) 0.826109
Equity(52 [ABM]) 0.093680
Equity(53 [ABMD]) 1.293058
Equity(62 [ABT]) 0.406546
Equity(64 [ABX]) 0.178034
Equity(66 [AB]) 0.510427
Equity(67 [ADSK]) 1.405754
Equity(69 [ACAT]) 0.561413
Equity(70 [VBF]) 0.054626
Equity(76 [TAP]) 0.411757
Equity(84 [ACET]) 0.320624
Equity(86 [ACG]) 0.012806
Equity(88 [ACI]) 0.026447
Equity(100 [IEP]) 0.444189
Equity(106 [ACU]) 0.060531
Equity(110 [ACXM]) 0.485444
Equity(112 [ACY]) 0.207107
Equity(114 [ADBE]) 0.280385
Equity(117 [AEY]) 0.022471
Equity(122 [ADI]) 0.549778
Equity(128 [ADM]) 0.605495
Equity(134 [SXCL]) NaN
Equity(149 [ADX]) 0.072153
Equity(153 [AE]) 3.676240
... ...
Equity(48961 [NYMT_O]) NaN
Equity(48962 [CSAL]) 0.285755
Equity(48963 [PAK]) 0.034871
Equity(48969 [NSA]) 0.144305
Equity(48971 [BSM]) 0.245000
Equity(48972 [EVA]) 0.207175
Equity(48981 [APIC]) 0.364560
Equity(48989 [UK]) 0.148399
Equity(48990 [ACWF]) 0.000000
Equity(48991 [ISCF]) 0.035000
Equity(48992 [INTF]) 0.000000
Equity(48993 [JETS]) 0.294937
Equity(48994 [ACTX]) 0.091365
Equity(48995 [LRGF]) 0.172047
Equity(48996 [SMLF]) 0.245130
Equity(48997 [VKTX]) 0.065000
Equity(48998 [OPGN]) NaN
Equity(48999 [AAPC]) 0.000000
Equity(49000 [BPMC]) 0.000000
Equity(49001 [CLCD]) NaN
Equity(49004 [TNP_PRD]) 0.000000
Equity(49005 [ARWA_U]) NaN
Equity(49006 [BVXV]) NaN
Equity(49007 [BVXV_W]) NaN
Equity(49008 [OPGN_W]) NaN
Equity(49009 [PRKU]) NaN
Equity(49010 [TBRA]) NaN
Equity(49131 [OESX]) NaN
Equity(49259 [ITUS]) NaN
Equity(49523 [TLGT]) NaN

8236 rows × 1 columns
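The RuntimeWarning printed above comes from numpy.nanstd. It is emitted when a column of the window has no usable values, which is most likely why some securities show NaN (for example, ones with no close prices in the five-day window). The sketch below reproduces this on a hypothetical two-security window.

```python
import warnings

import numpy

# Hypothetical 5-day window: the second security has no close prices at all.
values = numpy.array([
    [10.0, numpy.nan],
    [11.0, numpy.nan],
    [12.0, numpy.nan],
    [13.0, numpy.nan],
    [14.0, numpy.nan],
])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    std = numpy.nanstd(values, axis=0)

# First entry is about 1.414; the all-NaN column yields nan plus a warning.
print(std)
print([w.category.__name__ for w in caught])
```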

Default Inputs

When writing a custom factor, we can set default inputs and window_length in our CustomFactor subclass. For example, let's define the TenDayMeanDifference custom factor to compute the mean difference between two data columns over a trailing window using numpy.nanmean. Let's set the default inputs to [USEquityPricing.close, USEquityPricing.open] and the default window_length to 10:


In [6]:
class TenDayMeanDifference(CustomFactor):
    # Default inputs.
    inputs = [USEquityPricing.close, USEquityPricing.open]
    window_length = 10
    def compute(self, today, asset_ids, out, close, open):
        # Calculates the column-wise mean difference, ignoring NaNs
        out[:] = numpy.nanmean(close - open, axis=0)

Remember that in this case `close` and `open` are each 10 x ~8000 2D [numpy arrays](http://docs.scipy.org/doc/numpy-1.10.1/reference/arrays.ndarray.html).
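The TenDayMeanDifference body can be checked the same way. This sketch uses hypothetical 3-day windows for two securities instead of 10 x ~8000, and names the second array `open_` only to avoid shadowing Python's built-in `open` outside the class.

```python
import numpy

# Hypothetical 3-day windows (rows) for 2 securities (columns).
close = numpy.array([
    [10.5, 20.0],
    [11.0, numpy.nan],
    [10.0, 21.0],
])
open_ = numpy.array([
    [10.0, 19.0],
    [10.0, 20.0],
    [10.5, 20.0],
])

# Same body as TenDayMeanDifference.compute: element-wise difference, then a
# column-wise mean that skips days where either price is missing.
out = numpy.empty(2)
out[:] = numpy.nanmean(close - open_, axis=0)
print(out)  # approximately [0.3333, 1.0]
```

Subtracting first and averaging second means a NaN in either input drops that day for that security only, rather than invalidating the whole column.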

If we call TenDayMeanDifference without providing any arguments, it will use the defaults.


In [7]:
# Computes the 10-day mean difference between the daily open and close prices.
close_open_diff = TenDayMeanDifference()

The defaults can be manually overridden by specifying arguments in the constructor call.


In [8]:
# Computes the 10-day mean difference between the daily high and low prices.
high_low_diff = TenDayMeanDifference(inputs=[USEquityPricing.high, USEquityPricing.low])
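The default-then-override behavior follows a common Python pattern: class attributes act as defaults, and constructor arguments replace them on the instance. The sketch below illustrates that pattern with a hypothetical `SketchFactor` class. It is not Quantopian's code (the actual CustomFactor constructor is more involved), and the string inputs stand in for BoundColumns.

```python
class SketchFactor:
    """Hypothetical illustration of class-level defaults; not Quantopian's code."""

    # Class-level defaults, analogous to `inputs` and `window_length` above.
    inputs = ("close", "open")
    window_length = 10

    def __init__(self, inputs=None, window_length=None):
        # Only replace a default when the caller supplies a value.
        if inputs is not None:
            self.inputs = tuple(inputs)
        if window_length is not None:
            self.window_length = window_length


default = SketchFactor()
override = SketchFactor(inputs=["high", "low"])
print(default.inputs, default.window_length)    # ('close', 'open') 10
print(override.inputs, override.window_length)  # ('high', 'low') 10
```

As with high_low_diff above, overriding inputs leaves the default window_length of 10 in place.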

Further Example

Let's take another example where we build a momentum custom factor and use it to create a filter. We will then use that filter as a screen for our pipeline.

Let's start by defining a Momentum factor as the ratio of the most recent close price to the oldest close price in the trailing window, that is, the close from window_length - 1 trading days earlier.


In [9]:
class Momentum(CustomFactor):
    # Default inputs
    inputs = [USEquityPricing.close]

    # Compute momentum
    def compute(self, today, assets, out, close):
        out[:] = close[-1] / close[0]
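Here is the Momentum body applied to a hypothetical 4-day close window (window_length=4) for three made-up securities: one rising, one falling, and one that ends where it started.

```python
import numpy

# Hypothetical 4-day close window (rows = days, oldest first; columns = securities).
close = numpy.array([
    [10.0, 50.0, 8.0],
    [10.5, 49.0, 8.0],
    [11.0, 48.0, 8.5],
    [12.0, 45.0, 8.0],
])

out = numpy.empty(3)
# Same body as Momentum.compute: newest row divided by oldest row.
out[:] = close[-1] / close[0]
print(out)  # approximately [1.2, 0.9, 1.0]
```

Unlike StdDev, this body has no NaN handling: if either endpoint of a security's window is missing, its momentum comes out as NaN.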

Now, let's instantiate our Momentum factor twice to create a 10-day momentum factor and a 20-day momentum factor. Let's also create a positive_momentum filter that returns True for securities whose 10-day and 20-day momentum are both greater than 1, meaning the close price rose over both windows.


In [10]:
ten_day_momentum = Momentum(window_length=10)
twenty_day_momentum = Momentum(window_length=20)

positive_momentum = ((ten_day_momentum > 1) & (twenty_day_momentum > 1))
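Combining filters with & works element-wise, one result per security. The numpy sketch below mirrors that on hypothetical momentum values for four securities; it illustrates the logic, not Pipeline's internal evaluation.

```python
import numpy

# Hypothetical momentum values for 4 securities.
ten_day = numpy.array([1.05, 0.98, 1.10, numpy.nan])
twenty_day = numpy.array([1.02, 1.03, 0.95, 1.20])

# Element-wise analog of (ten_day_momentum > 1) & (twenty_day_momentum > 1).
positive = (ten_day > 1) & (twenty_day > 1)
print(positive)  # [ True False False False]
```

The parentheses around each comparison matter: in Python, & binds more tightly than >, so `ten_day > 1 & twenty_day > 1` would be parsed as the chained comparison `ten_day > (1 & twenty_day) > 1`. In this numpy sketch, a NaN momentum compares as False, so that security is excluded.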

Next, let's add our momentum factors and our positive_momentum filter to make_pipeline. Let's also pass positive_momentum as a screen to our pipeline.


In [11]:
def make_pipeline():

    ten_day_momentum = Momentum(window_length=10)
    twenty_day_momentum = Momentum(window_length=20)

    positive_momentum = ((ten_day_momentum > 1) & (twenty_day_momentum > 1))

    std_dev = StdDev(inputs=[USEquityPricing.close], window_length=5)

    return Pipeline(
        columns={
            'std_dev': std_dev,
            'ten_day_momentum': ten_day_momentum,
            'twenty_day_momentum': twenty_day_momentum
        },
        screen=positive_momentum
    )

Running this pipeline outputs the standard deviation and each of our momentum computations for securities with positive 10-day and 20-day momentum.


In [12]:
result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')
result


Out[12]:
std_dev ten_day_momentum twenty_day_momentum
2015-05-05 00:00:00+00:00 Equity(2 [AA]) 0.293428 1.036612 1.042783
Equity(24 [AAPL]) 1.737677 1.014256 1.021380
Equity(39 [DDC]) 0.138939 1.062261 1.167319
Equity(52 [ABM]) 0.093680 1.009212 1.015075
Equity(64 [ABX]) 0.178034 1.025721 1.065587
Equity(66 [AB]) 0.510427 1.036137 1.067545
Equity(100 [IEP]) 0.444189 1.008820 1.011385
Equity(114 [ADBE]) 0.280385 1.016618 1.002909
Equity(117 [AEY]) 0.022471 1.004167 1.025532
Equity(128 [ADM]) 0.605495 1.049625 1.044832
Equity(149 [ADX]) 0.072153 1.004607 1.016129
Equity(154 [AEM]) 0.634920 1.032690 1.065071
Equity(161 [AEP]) 0.458938 1.024926 1.017563
Equity(166 [AES]) 0.164973 1.031037 1.045946
Equity(168 [AET]) 1.166938 1.007566 1.022472
Equity(192 [ATAX]) 0.024819 1.009025 1.018215
Equity(197 [AGCO]) 0.646594 1.066522 1.098572
Equity(239 [AIG]) 0.710307 1.027189 1.058588
Equity(253 [AIR]) 0.156844 1.007474 1.003818
Equity(266 [AJG]) 0.397769 1.000839 1.018799
Equity(312 [ALOT]) 0.182893 1.031780 1.021352
Equity(328 [ALTR]) 2.286573 1.041397 1.088996
Equity(353 [AME]) 0.362513 1.023622 1.004902
Equity(357 [TWX]) 0.502816 1.022013 1.006976
Equity(366 [AVD]) 0.842249 1.114111 1.093162
Equity(438 [AON]) 0.881295 1.020732 1.018739
Equity(448 [APA]) 0.678899 1.002193 1.051258
Equity(451 [APB]) 0.081240 1.026542 1.105042
Equity(455 [APC]) 0.152394 1.012312 1.097284
Equity(474 [APOG]) 0.610410 1.030843 1.206232
... ... ... ...
Equity(48504 [ERUS]) 0.052688 1.030893 1.052812
Equity(48531 [VSTO]) 0.513443 1.029164 1.028110
Equity(48532 [ENTL]) 0.163756 1.043708 1.152246
Equity(48535 [ANH_PRC]) 0.072388 1.010656 1.010656
Equity(48543 [SHAK]) 2.705316 1.262727 1.498020
Equity(48591 [SPYB]) 0.221848 1.001279 1.005801
Equity(48602 [ITEK]) 0.177042 1.213693 1.133721
Equity(48623 [TCCB]) 0.056148 1.003641 1.006349
Equity(48641 [GDJJ]) 0.530298 1.041176 1.111809
Equity(48644 [GDXX]) 0.401079 1.042319 1.120948
Equity(48680 [RODM]) 0.080455 1.005037 1.018853
Equity(48688 [QVM]) 0.152245 1.009996 1.021845
Equity(48701 [AMT_PRB]) 0.546691 1.010356 1.023537
Equity(48706 [GBSN_U]) 0.442285 1.214035 1.272059
Equity(48730 [AGN_PRA]) 9.614542 1.000948 1.001694
Equity(48746 [SUM]) 0.457585 1.024112 1.131837
Equity(48747 [AFTY]) 0.193080 1.032030 1.146784
Equity(48754 [IBDJ]) 0.048949 1.000161 1.000561
Equity(48768 [SDEM]) 0.102439 1.068141 1.103535
Equity(48783 [CHEK_W]) 0.222528 1.466667 1.157895
Equity(48785 [NCOM]) 0.166885 1.018349 1.020221
Equity(48792 [AFSI_PRD]) 0.062426 1.001572 1.008307
Equity(48804 [TANH]) 0.620471 1.179510 1.381538
Equity(48809 [AIC]) 0.027276 1.000399 1.008857
Equity(48821 [CJES]) 0.851751 1.220506 1.335895
Equity(48822 [CLLS]) 0.230596 1.014299 1.023526
Equity(48823 [SEDG]) 1.228733 1.207086 1.234685
Equity(48853 [SGDJ]) 0.381209 1.026782 1.060795
Equity(48863 [GDDY]) 0.453669 1.046755 1.029738
Equity(48875 [HBHC_L]) 0.025687 1.001746 1.005010

2773 rows × 3 columns

Custom factors allow us to define custom computations in a pipeline. They are frequently the best way to perform computations on partner datasets or on multiple data columns. The full documentation for CustomFactor is available in the Quantopian API reference.

In the next lesson, we'll use everything we've learned so far to create a pipeline for an algorithm.