A factor is a function from an asset and a moment in time to a number.
F(asset, timestamp) -> float
In Pipeline, Factors are the most commonly-used term, representing the result of any computation producing a numerical result. Factors require a column of data and a window length as input.
The simplest factors in Pipeline are built-in Factors. Built-in Factors are pre-built to perform common computations. As a first example, let's make a factor to compute the average close price over the last 10 days. We can use the SimpleMovingAverage
built-in factor which computes the average value of the input data (close price) over the specified window length (10 days). To do this, we need to import our built-in SimpleMovingAverage
factor and the USEquityPricing dataset.
In [1]:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
# New from the last lesson, import the USEquityPricing dataset.
from quantopian.pipeline.data.builtin import USEquityPricing
# New from the last lesson, import the built-in SimpleMovingAverage factor.
from quantopian.pipeline.factors import SimpleMovingAverage
Let's go back to our make_pipeline
function from the previous lesson and instantiate a SimpleMovingAverage
factor. To create a SimpleMovingAverage
factor, we can call the SimpleMovingAverage
constructor with two arguments: inputs, which must be a list of BoundColumn
objects, and window_length, which must be an integer indicating how many days worth of data our moving average calculation should receive. (We'll discuss BoundColumn
in more depth later; for now we just need to know that a BoundColumn
is an object indicating what kind of data should be passed to our Factor.).
The following line creates a Factor
for computing the 10-day mean close price of securities.
In [2]:
mean_close_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10)
It's important to note that creating the factor does not actually perform a computation. Creating a factor is like defining the function. To perform a computation, we need to add the factor to our pipeline and run it.
Let's update our original empty pipeline to make it compute our new moving average factor. To start, let's move our factor instantatiation into make_pipeline
. Next, we can tell our pipeline to compute our factor by passing it a columns
argument, which should be a dictionary mapping column names to factors, filters, or classifiers. Our updated make_pipeline
function should look something like this:
In [3]:
def make_pipeline():
mean_close_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10)
return Pipeline(
columns={
'10_day_mean_close': mean_close_10
}
)
To see what this looks like, let's make our pipeline, run it, and display the result.
In [4]:
result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')
result
Out[4]:
Now we have a column in our pipeline output with the 10-day average close price for all 8000+ securities (display truncated). Note that each row corresponds to the result of our computation for a given security on a given date stored. The DataFrame
has a MultiIndex where the first level is a datetime representing the date of the computation and the second level is an Equity object corresponding to the security. For example, the first row (2015-05-05 00:00:00+00:00
, Equity(2 [AA])
) will contain the result of our mean_close_10
factor for AA on May 5th, 2015.
If we run our pipeline over more than one day, the output looks like this.
In [5]:
result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-07')
result
Out[5]:
Note: factors can also be added to an existing Pipeline
instance using the Pipeline.add
method. Using add
looks something like this:
>>> my_pipe = Pipeline()
>>> f1 = SomeFactor(...)
>>> my_pipe.add(f1)
The most commonly used built-in Factor
is Latest
. The Latest
factor gets the most recent value of a given data column. This factor is common enough that it is instantiated differently from other factors. The best way to get the latest value of a data column is by getting its .latest
attribute. As an example, let's update make_pipeline
to create a latest close price factor and add it to our pipeline:
In [6]:
def make_pipeline():
mean_close_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10)
latest_close = USEquityPricing.close.latest
return Pipeline(
columns={
'10_day_mean_close': mean_close_10,
'latest_close_price': latest_close
}
)
And now, when we make and run our pipeline again, there are two columns in our output dataframe. One column has the 10-day mean close price of each security, and the other has the latest close price.
In [7]:
result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')
result.head(5)
Out[7]:
.latest
can sometimes return things other than Factors
. We'll see examples of other possible return types in later lessons.
Some factors have default inputs that should never be changed. For example the VWAP built-in factor is always calculated from USEquityPricing.close
and USEquityPricing.volume
. When a factor is always calculated from the same BoundColumns
, we can call the constructor without specifying inputs
.
In [8]:
from quantopian.pipeline.factors import VWAP
vwap = VWAP(window_length=10)
In the next lesson, we will look at combining factors.