Accern: Alpha One

In this notebook, we'll take a look at Accern's Alpha One dataset, available on Quantopian. This dataset spans August 26, 2012 through the current day. It contains comprehensive sentiment analysis for equities, drawn from more than 20 million sources.

Notebook Contents

There are two ways to access the data and you'll find both of them listed below. Just click on the section you'd like to read through.

  • Interactive overview: This is only available on Research and uses Blaze to give you access to large amounts of data. Recommended for exploration and plotting.
  • Pipeline overview: Data is made available through pipeline, which is available in both the Research and Backtesting environments. Recommended for custom factor development and for moving back and forth between research and backtesting.

Free samples and limits

One key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.
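
In practice, this means you should filter or aggregate server side and only materialize a bounded slice locally, exactly as the cells below do. A minimal sketch, assuming a Blaze expression such as the dataset object imported in the next section:

# Filter on the server with Blaze, then materialize a bounded slice locally.
from odo import odo
import pandas as pd

reduced = dataset[dataset.sid == 24]     # evaluated server side by Blaze
df = odo(reduced[:5000], pd.DataFrame)   # stays well under the 10,000-row cap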

There is a free version of this dataset as well as a paid one. The free sample includes all data up until two months prior to the current date.

To access the most up-to-date values for this dataset when trading a live algorithm (as with other partner datasets), you need to purchase access to the full set.

With the preamble in place, let's get started:

Interactive Overview

Accessing the data with Blaze and Interactive on Research

Partner datasets are available on Quantopian Research through an API service known as Blaze. Blaze provides the Quantopian user with a convenient interface to access very large datasets, in an interactive, generic manner.

Blaze provides an important function for accessing these datasets. Some of these sets comprise many millions of records. Bringing that data directly into Quantopian Research is simply not viable, so Blaze provides a simple querying interface and shifts the burden over to the server side.

It is common to use Blaze to reduce your dataset in size, convert it to a pandas DataFrame, and then use pandas for further computation, manipulation, and visualization.


Once you've limited the size of your Blaze object, you can convert it to a pandas DataFrame using:

from odo import odo
import pandas
odo(expr, pandas.DataFrame)

To see how this data can be used in your algorithm, head straight to the Pipeline Overview section of this notebook.


In [15]:
# import the dataset
from quantopian.interactive.data.accern import alphaone_free as dataset

# or if you want to import the paid dataset, use:
# from quantopian.interactive.data.accern import alphaone

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
# Let's use Blaze's dshape() to inspect the structure of the data
dataset.dshape


Out[3]:
dshape("""var * {
  symbol: string,
  name: ?string,
  article_sentiment: float64,
  impact_score: float64,
  sid: int64,
  asof_date: datetime,
  timestamp: datetime
  }""")

In [4]:
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
dataset.count()


Out[4]:
591697

In [5]:
# Let's see what the data looks like. We'll grab the first three rows.
dataset[:3]


Out[5]:
symbol name article_sentiment impact_score sid asof_date timestamp
0 AAPL Apple Inc. -0.194 70.495 24 2012-08-26 2012-08-26
1 GRPN Groupon, Inc. -0.167 79.000 42118 2012-08-26 2012-08-26
2 TWTR Twitter, Inc. -0.473 29.362 19171 2012-08-26 2012-08-26

Let's go over the columns:

  • symbol: the ticker symbol of the company.
  • name: the name of the company.
  • asof_date: Alpha One's timestamp of delivery of article_sentiment.
  • article_sentiment: a score in [-1,1] reflecting the sentiment of articles written about the company in the last day. The higher the score, the more positive the outlook.
  • timestamp: this is our timestamp on when we registered the data.
  • impact_score: on [0,100], this is the probability that the stock price will change by more than 1% (computed as (close - open) / open) on the next trading day; see the short worked example after this list.
  • sid: the equity's unique identifier. Use this instead of ticker or name.
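
To make the 1% threshold concrete, here is a tiny worked example with made-up prices (the numbers are purely illustrative, not from the dataset):

# Hypothetical open/close prices, just to illustrate the threshold.
open_px, close_px = 100.0, 101.5
pct_change = (close_px - open_px) / open_px    # 0.015, i.e. a 1.5% move
moved_more_than_1pct = abs(pct_change) > 0.01  # True: this is the event impact_score estimates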

We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store Datasets and equity databases, so the datasets are easy to combine.
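
For example, on Research you can confirm a sid with the symbols() helper (a quick sketch; Equity 24 is AAPL, as in the rows above):

# Resolve a ticker to its Equity object and check its sid.
aapl_equity = symbols('AAPL')
print aapl_equity.sid   # 24, matching the sid column in this dataset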

We can select columns and rows with ease. Below, we'll fetch all entries for sid 24, which identifies AAPL. We're really only interested in the as-of date, the article sentiment, and the impact score, so we'll display only those columns.


In [7]:
aapl = dataset[dataset.sid==24][['asof_date','article_sentiment', 'impact_score']].sort('asof_date')
# Whenever you print a Blaze data object, the displayed output is truncated.
aapl


Out[7]:
asof_date article_sentiment impact_score
0 2012-08-26 -0.194 70.495
1 2012-08-27 -0.126 70.086
2 2012-08-28 -0.367 70.100
3 2012-08-29 -0.219 70.436
4 2012-08-30 -0.071 70.738
5 2012-08-31 -0.167 70.417
6 2012-09-01 -0.392 71.379
7 2012-09-02 -0.270 70.664
8 2012-09-03 -0.320 70.570
9 2012-09-04 -0.361 70.746
10 2012-09-05 -0.295 70.631

You can see that Accern collects data for every day, not just for every business day.

Let's convert aapl to a DataFrame.


In [10]:
aapl_sentiment = odo(aapl, pd.DataFrame)
# let's say we only want one year of data.
aapl_sentiment = aapl_sentiment[aapl_sentiment['asof_date'] <= '2013-08-26']
# suppose we want the rows to be indexed by the as-of date.
aapl_sentiment.index = list(aapl_sentiment['asof_date'])
aapl_sentiment.drop('asof_date', axis=1, inplace=True)
# display the first three rows. DataFrames, when printed, display 60 rows at a time.
aapl_sentiment[:3]


Out[10]:
article_sentiment impact_score
2012-08-26 -0.194 70.495
2012-08-27 -0.126 70.086
2012-08-28 -0.367 70.100
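
Since Accern reports weekends as well as business days, you may want to restrict the series to weekdays before lining it up with market data. A minimal sketch, assuming the aapl_sentiment DataFrame above:

# Keep only Monday-Friday rows (dayofweek: Mon=0 ... Sun=6).
weekday_mask = pd.DatetimeIndex(aapl_sentiment.index).dayofweek < 5
aapl_weekdays = aapl_sentiment[weekday_mask]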

Let's take a look at the sentiment in a chart.


In [11]:
aapl_sentiment.plot(y='article_sentiment')


Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f4a905f0650>

[Plot: AAPL daily article_sentiment over the sample year]
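
Daily sentiment is fairly noisy, so a smoothed view is often easier to read. A sketch, assuming a pandas version with the .rolling() API (older versions expose pd.rolling_mean instead):

# 7-day rolling mean of the daily sentiment score.
smoothed = aapl_sentiment['article_sentiment'].rolling(window=7).mean()
smoothed.plot()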

Pipeline Overview

Accessing the data in your algorithms & research

The only method for accessing partner data within algorithms running on Quantopian is via the pipeline API. Different datasets work differently, but in the case of this Accern data, you can add it to your pipeline as follows:

First, import the dataset:

from quantopian.pipeline.data.accern import alphaone

Then, in initialize(), you could do something simple like adding the raw value of one of the fields to your pipeline:

pipe.add(alphaone.impact_score.latest, 'impact_score')


In [1]:
# Import necessary Pipeline modules
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.factors import AverageDollarVolume

In [2]:
# For use in your algorithms
# Using the full paid dataset in your pipeline algo
# from quantopian.pipeline.data.accern import alphaone

# Using the free sample in your pipeline algo
from quantopian.pipeline.data.accern import alphaone_free

Now that we've imported the data, let's take a look at which fields are available for each dataset.

You'll find the dataset, the available fields, and the datatypes for each of those fields.


In [3]:
print "Here are the list of available fields per dataset:"
print "---------------------------------------------------\n"

def _print_fields(dataset):
    print "Dataset: %s\n" % dataset.__name__
    print "Fields:"
    for field in list(dataset.columns):
        print "%s - %s" % (field.name, field.dtype)
    print "\n"

for data in (alphaone_free,):
    _print_fields(data)


print "---------------------------------------------------\n"


Here is the list of available fields per dataset:
---------------------------------------------------

Dataset: alphaone_free

Fields:
article_sentiment - float64
impact_score - float64


---------------------------------------------------

Now that we know what fields we have access to, let's see what this data looks like when we run it through Pipeline.

This is constructed the same way as you would in the backtester. For more information on using Pipeline in Research view this thread: https://www.quantopian.com/posts/pipeline-in-research-build-test-and-visualize-your-factors-and-filters


In [4]:
# Let's see what this data looks like when we run it through Pipeline
# This is constructed the same way as you would in the backtester. For more information
# on using Pipeline in Research view this thread:
# https://www.quantopian.com/posts/pipeline-in-research-build-test-and-visualize-your-factors-and-filters
pipe = Pipeline()

pipe.add(alphaone_free.impact_score.latest, 'impact_score')
pipe.add(alphaone_free.article_sentiment.latest, 'article_sentiment')

In [5]:
# Setting some basic liquidity screens (just for good habit)
dollar_volume = AverageDollarVolume(window_length=20)
top_1000_most_liquid = dollar_volume.rank(ascending=False) < 1000

pipe.set_screen(top_1000_most_liquid & alphaone_free.article_sentiment.latest.notnan())

In [6]:
# The show_graph() method of pipeline objects produces a graph to show how it is being calculated.
pipe.show_graph(format='png')


Out[6]:
[Image: pipeline dependency graph rendered as a PNG]

In [7]:
# run_pipeline will show the output of your pipeline
pipe_output = run_pipeline(pipe, start_date='2013-11-01', end_date='2013-11-25')
pipe_output


Out[7]:
article_sentiment impact_score
2013-11-01 00:00:00+00:00 Equity(2 [AA]) 0.000 76.000
Equity(24 [AAPL]) 0.001 73.065
Equity(62 [ABT]) -0.392 77.271
Equity(64 [ABX]) 0.000 96.000
Equity(67 [ADSK]) 1.000 88.333
Equity(76 [TAP]) -1.000 80.000
Equity(88 [ACI]) -0.190 84.143
Equity(114 [ADBE]) -0.341 76.354
Equity(122 [ADI]) 0.000 88.000
Equity(128 [ADM]) -0.333 88.667
Equity(154 [AEM]) 0.000 100.000
Equity(161 [AEP]) 0.000 86.000
Equity(166 [AES]) 0.000 70.000
Equity(168 [AET]) 0.708 84.500
Equity(185 [AFL]) -0.405 86.000
Equity(197 [AGCO]) 1.000 84.000
Equity(216 [HES]) 0.500 87.200
Equity(239 [AIG]) 0.083 81.667
Equity(273 [ALU]) 0.000 92.000
Equity(300 [ALK]) 0.000 69.000
Equity(328 [ALTR]) 0.000 82.000
Equity(337 [AMAT]) 1.000 100.000
Equity(351 [AMD]) -0.061 78.727
Equity(353 [AME]) 0.000 89.875
Equity(357 [TWX]) 0.000 92.727
Equity(368 [AMGN]) 0.000 79.000
Equity(410 [AN]) 0.000 77.000
Equity(438 [AON]) 0.000 92.333
Equity(448 [APA]) 0.000 85.000
Equity(455 [APC]) 0.714 88.571
... ... ... ...
2013-11-25 00:00:00+00:00 Equity(42950 [FB]) 0.358 69.945
Equity(43127 [NOW]) 1.000 94.000
Equity(43399 [ADT]) -1.000 70.000
Equity(43405 [KRFT]) 0.000 63.000
Equity(43413 [TRLA]) 1.000 84.000
Equity(43414 [SRC]) 0.000 84.667
Equity(43500 [RLGY]) 0.000 100.000
Equity(43510 [WDAY]) 0.000 90.000
Equity(43512 [FANG]) -1.000 53.000
Equity(43647 [YY]) -1.000 88.000
Equity(43694 [ABBV]) 0.000 71.000
Equity(43721 [SCTY]) -0.500 100.000
Equity(43919 [LMCA]) -1.000 66.000
Equity(44060 [ZTS]) 0.000 84.000
Equity(44102 [XONE]) 0.000 91.000
Equity(44645 [VOYA]) 1.000 70.000
Equity(44747 [DATA]) 0.000 55.000
Equity(44931 [NWSA]) 0.000 37.000
Equity(44965 [GOGO]) -1.000 88.000
Equity(45656 [GLPI]) 0.000 63.000
Equity(45689 [VJET]) 0.000 100.000
Equity(45755 [BRX]) 0.000 100.000
Equity(45769 [WUBA]) 0.000 83.000
Equity(45780 [TCS]) 1.000 66.000
Equity(45781 [QUNR]) -1.000 83.000
Equity(45815 [TWTR]) -0.043 30.354
Equity(45848 [STAY]) 0.000 90.000
Equity(45866 [ZU]) -1.000 28.000
Equity(45902 [WBAI]) 0.000 81.000
Equity(45906 [VNCE]) 0.167 66.667

14482 rows × 2 columns

Taking what we've seen from above, let's see how we'd move that into the backtester.


In [11]:
# This section is only importable in the backtester
from quantopian.algorithm import attach_pipeline, pipeline_output

# General pipeline imports
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import AverageDollarVolume

# Import the datasets available
# For use in your algorithms
# Using the full paid dataset in your pipeline algo
# from quantopian.pipeline.data.accern import alphaone

# Using the free sample in your pipeline algo
from quantopian.pipeline.data.accern import alphaone_free

def make_pipeline():
    # Create our pipeline
    pipe = Pipeline()
    
    # Screen out penny stocks and low liquidity securities.
    dollar_volume = AverageDollarVolume(window_length=20)
    is_liquid = dollar_volume.rank(ascending=False) < 1000
    
    # Create the base universe screen that we will apply to our pipeline.
    base_universe = is_liquid

    # Add pipeline factors
    pipe.add(alphaone_free.impact_score.latest, 'impact_score')
    pipe.add(alphaone_free.article_sentiment.latest, 'article_sentiment')

    # Set our pipeline screen
    pipe.set_screen(base_universe)
    return pipe

def initialize(context):
    attach_pipeline(make_pipeline(), "pipeline")
    
def before_trading_start(context, data):
    # Store the pipeline output on context so it can be used later in the day.
    context.results = pipeline_output('pipeline')

Now you can take that and begin to use it as a building block for your algorithms. For more examples of how to do that, you can visit our data pipeline factor library.
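
As one illustration (a sketch, not a recommended strategy: the 0.5 and 80 cutoffs and the context.longs attribute are made up here), you could filter the pipeline output each morning into a candidate list:

def before_trading_start(context, data):
    results = pipeline_output('pipeline')
    # Illustrative filter: strongly positive sentiment plus a high estimated
    # chance of a >1% move the next trading day.
    context.longs = results[
        (results['article_sentiment'] > 0.5) &
        (results['impact_score'] > 80)
    ].index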