Portfolio and Risk Analytics in Python with pyfolio

Dr. Thomas Wiecki

Lead Data Scientist

About me

Lead Data Scientist at Quantopian Inc.
PhD from Brown university: Bayesian models of brain dysfunction
Contributor to PyMC3: Probabilistic Programming in Python
Twitter: @twiecki

Why use Python for quant finance?

Python is a general purpose language -> No hodge-podge of perl, bash, matlab, fortran.
Very easy to learn.

The Quant Finance PyData Stack

Source: [Jake VanderPlas: State of the Tools](https://www.youtube.com/watch?v=5GlNDD7qbP4)

Python in Quantitative Finance

When Quantopian started in 2011, we needed a backtester:

-> Open-sourced Zipline in 2012

When we started to build a crowd-source hedge fund, we needed a better way to evaluate algorithms:

-> Open-sourced pyfolio in 2015

Announcing pyfolio

State-of-the-art portfolio and risk analytics
http://quantopian.github.io/pyfolio/
Open source and free: Apache v2 license
Can be used:
- stand alone
- with Zipline
- on Quantopian
- with PyThalesians

Using pyfolio stand-alone

Installation

Use Anaconda to get a Python system with the full PyData ecosystem.
pip install pyfolio



In [3]:

    
import pyfolio as pf
%matplotlib inline

Fetch the daily returns for a stock



In [5]:

    
stock_rets = pf.utils.get_symbol_rets('FB')
stock_rets.head()









    Out[5]:





Date
2012-05-21 00:00:00+00:00   -0.109861
2012-05-22 00:00:00+00:00   -0.089039
2012-05-23 00:00:00+00:00    0.032258
2012-05-24 00:00:00+00:00    0.032187
2012-05-25 00:00:00+00:00   -0.033909
Name: FB, dtype: float64

Tear sheets

Collection of tables and plots.

Various tear sheets based on:

returns
positions
transactions
periods of market stress
Bayesian analyses

To get an idea, here is a returns based tear sheet



In [6]:

    
pf.create_returns_tear_sheet(stock_rets)









    



Entire data start date: 2012-05-21
Entire data end date: 2015-10-28


Backtest Months: 41
                   Backtest
annual_return          0.38
annual_volatility      0.44
sharpe_ratio           0.88
calmar_ratio           0.80
stability              0.88
max_drawdown          -0.48
omega_ratio            1.18
sortino_ratio          1.42
skewness               1.74
kurtosis              19.32
alpha                  0.22
beta                   1.01

Worst Drawdown Periods
   net drawdown in %  peak date valley date recovery date duration
1              47.90 2012-05-21  2012-09-04    2013-07-25      309
2              22.06 2014-03-10  2014-04-28    2014-07-24       99
3              17.34 2013-10-18  2013-11-25    2013-12-17       43
0              16.57 2015-07-21  2015-08-24    2015-10-19       65
4               9.20 2015-03-24  2015-05-12    2015-06-23       66


2-sigma returns daily    -0.053
2-sigma returns weekly   -0.108
dtype: float64






    



/home/wiecki/miniconda3/lib/python3.4/site-packages/matplotlib/cbook.py:137: MatplotlibDeprecationWarning: The "loc" positional argument to legend is deprecated. Please use the "loc" keyword instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)

Zipline + pyfolio

Open-source backtester by Quantopian Inc.
Powers Quantopian.com
Various models for transaction costs and slippage.



In [7]:

    
import numpy as np
import pandas as pd

import sys
import logbook
import numpy as np
from datetime import datetime
import pytz

# Import Zipline, the open source backtester
from zipline import TradingAlgorithm
from zipline.data.loader import load_bars_from_yahoo
from zipline.api import order_target, symbol, history, add_history, schedule_function, date_rules, time_rules
from zipline.algorithm import TradingAlgorithm
from zipline.utils.factory import load_from_yahoo
from zipline.finance import commission



In [8]:

    
# Zipline trading algorithm
# Taken from zipline.examples.olmar
zipline_logging = logbook.NestedSetup([
    logbook.NullHandler(level=logbook.DEBUG),
    logbook.StreamHandler(sys.stdout, level=logbook.INFO),
    logbook.StreamHandler(sys.stderr, level=logbook.ERROR),
])
zipline_logging.push_application()

STOCKS = ['AMD', 'CERN', 'COST', 'DELL', 'GPS', 'INTC', 'MMM']


# On-Line Portfolio Moving Average Reversion

# More info can be found in the corresponding paper:
# http://icml.cc/2012/papers/168.pdf
def initialize(algo, eps=1, window_length=5):
    algo.stocks = STOCKS
    algo.sids = [algo.symbol(symbol) for symbol in algo.stocks]
    algo.m = len(algo.stocks)
    algo.price = {}
    algo.b_t = np.ones(algo.m) / algo.m
    algo.last_desired_port = np.ones(algo.m) / algo.m
    algo.eps = eps
    algo.init = True
    algo.days = 0
    algo.window_length = window_length
    algo.add_transform('mavg', 5)

    algo.set_commission(commission.PerShare(cost=0))


def handle_data(algo, data):
    algo.days += 1
    if algo.days < algo.window_length:
        return

    if algo.init:
        rebalance_portfolio(algo, data, algo.b_t)
        algo.init = False
        return

    m = algo.m

    x_tilde = np.zeros(m)
    b = np.zeros(m)

    # find relative moving average price for each asset
    for i, sid in enumerate(algo.sids):
        price = data[sid].price
        # Relative mean deviation
        x_tilde[i] = data[sid].mavg(algo.window_length) / price

    ###########################
    # Inside of OLMAR (algo 2)
    x_bar = x_tilde.mean()

    # market relative deviation
    mark_rel_dev = x_tilde - x_bar

    # Expected return with current portfolio
    exp_return = np.dot(algo.b_t, x_tilde)
    weight = algo.eps - exp_return
    variability = (np.linalg.norm(mark_rel_dev)) ** 2

    # test for divide-by-zero case
    if variability == 0.0:
        step_size = 0
    else:
        step_size = max(0, weight / variability)

    b = algo.b_t + step_size * mark_rel_dev
    b_norm = simplex_projection(b)
    np.testing.assert_almost_equal(b_norm.sum(), 1)

    rebalance_portfolio(algo, data, b_norm)

    # update portfolio
    algo.b_t = b_norm


def rebalance_portfolio(algo, data, desired_port):
    # rebalance portfolio
    desired_amount = np.zeros_like(desired_port)
    current_amount = np.zeros_like(desired_port)
    prices = np.zeros_like(desired_port)

    if algo.init:
        positions_value = algo.portfolio.starting_cash
    else:
        positions_value = algo.portfolio.positions_value + \
            algo.portfolio.cash

    for i, sid in enumerate(algo.sids):
        current_amount[i] = algo.portfolio.positions[sid].amount
        prices[i] = data[sid].price

    desired_amount = np.round(desired_port * positions_value / prices)

    algo.last_desired_port = desired_port
    diff_amount = desired_amount - current_amount

    for i, sid in enumerate(algo.sids):
        algo.order(sid, diff_amount[i])


def simplex_projection(v, b=1):
    """Projection vectors to the simplex domain

    Implemented according to the paper: Efficient projections onto the
    l1-ball for learning in high dimensions, John Duchi, et al. ICML 2008.
    Implementation Time: 2011 June 17 by Bin@libin AT pmail.ntu.edu.sg
    Optimization Problem: min_{w}\| w - v \|_{2}^{2}
    s.t. sum_{i=1}^{m}=z, w_{i}\geq 0

    Input: A vector v \in R^{m}, and a scalar z > 0 (default=1)
    Output: Projection vector w

    :Example:
    >>> proj = simplex_projection([.4 ,.3, -.4, .5])
    >>> print(proj)
    array([ 0.33333333, 0.23333333, 0. , 0.43333333])
    >>> print(proj.sum())
    1.0

    Original matlab implementation: John Duchi (jduchi@cs.berkeley.edu)
    Python-port: Copyright 2013 by Thomas Wiecki (thomas.wiecki@gmail.com).
    """

    v = np.asarray(v)
    p = len(v)

    # Sort v into u in descending order
    v = (v > 0) * v
    u = np.sort(v)[::-1]
    sv = np.cumsum(u)

    rho = np.where(u > (sv - b) / np.arange(1, p + 1))[0][-1]
    theta = np.max([0, (sv[rho] - b) / (rho + 1)])
    w = (v - theta)
    w[w < 0] = 0
    return w

start = datetime(2004, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime(2010, 1, 1, 0, 0, 0, 0, pytz.utc)
data = load_from_yahoo(stocks=STOCKS, indexes={}, start=start, end=end)
data = data.dropna()
olmar = TradingAlgorithm(handle_data=handle_data,
                         initialize=initialize,
                         identifiers=STOCKS)
backtest = olmar.run(data)









    



AMD
CERN
COST
DELL
GPS
INTC
MMM
[2015-10-29 15:03:52.783713] INFO: Performance: Simulated 1511 trading days out of 1511.
[2015-10-29 15:03:52.784246] INFO: Performance: first open: 2004-01-02 14:31:00+00:00
[2015-10-29 15:03:52.784635] INFO: Performance: last close: 2009-12-31 21:00:00+00:00

Converting data from zipline to pyfolio



In [9]:

    
returns, positions, transactions = \
    pf.utils.extract_rets_pos_txn_from_zipline(backtest)



In [10]:

    
positions.columns = STOCKS + ['cash']

Data structures used by pyfolio



In [11]:

    
returns.tail()









    Out[11]:





2009-12-24 00:00:00+00:00    0.000989
2009-12-28 00:00:00+00:00    0.008174
2009-12-29 00:00:00+00:00    0.007428
2009-12-30 00:00:00+00:00    0.002472
2009-12-31 00:00:00+00:00   -0.018984
Name: returns, dtype: float64



In [12]:

    
positions.tail()









    Out[12]:






  
    
      
      AMD
      CERN
      COST
      DELL
      GPS
      INTC
      MMM
      cash
    
    
      index
      
      
      
      
      
      
      
      
    
  
  
    
      2009-12-24 00:00:00+00:00
      0.00
      0.00000
      28116.291654
      0.00000
      86022.143403
      40522.266140
      17288.084400
      261.648675
    
    
      2009-12-28 00:00:00+00:00
      0.00
      0.00000
      26444.895165
      0.00000
      91778.156400
      38402.193192
      17039.652084
      -46.731967
    
    
      2009-12-29 00:00:00+00:00
      0.00
      0.00000
      18547.461768
      0.00000
      108639.828380
      47874.339058
      0.000000
      -153.847015
    
    
      2009-12-30 00:00:00+00:00
      3178.44
      0.00000
      31364.795616
      63294.65464
      53188.785870
      26018.584921
      0.000000
      -1705.100847
    
    
      2009-12-31 00:00:00+00:00
      44363.44
      20816.10101
      63315.443592
      18114.05960
      26346.752400
      0.000000
      0.000000
      -944.218331



In [13]:

    
transactions.tail()









    Out[13]:






  
    
      
      txn_volume
      txn_shares
    
  
  
    
      2009-12-24 00:00:00+00:00
      29064.589481
      1381
    
    
      2009-12-28 00:00:00+00:00
      8794.514054
      405
    
    
      2009-12-29 00:00:00+00:00
      49996.978324
      1765
    
    
      2009-12-30 00:00:00+00:00
      157158.874704
      9233
    
    
      2009-12-31 00:00:00+00:00
      189609.809608
      11946

Create all tear-sheets pyfolio has to offer



In [14]:

    
sector_map = {'AMD': 'Technology',
              'CERN': 'Technology',
              'DELL': 'Technology',
              'INTC': 'Technology',
              'COST': 'Services',
              'GPS': 'Services',
              'MMM': 'Industrial Goods'}



In [24]:

    
oos_date = '2009-10-21'

pf.create_full_tear_sheet(returns,
                          positions=positions,
                          transactions=transactions,
                          live_start_date=oos_date,
                          slippage=0.1,
                          sector_mappings=sector_map)









    



Entire data start date: 2004-01-09
Entire data end date: 2009-12-31


Out-of-Sample Months: 2
Backtest Months: 69
                   Backtest  Out_of_Sample  All_History
annual_return          0.12           0.16         0.12
annual_volatility      0.26           0.22         0.25
sharpe_ratio           0.48           0.74         0.48
calmar_ratio           0.21           2.23         0.21
stability              0.00           0.04         0.01
max_drawdown          -0.60          -0.07        -0.60
omega_ratio            1.09           1.13         1.09
sortino_ratio          0.71           1.04         0.71
skewness               0.28          -0.29         0.27
kurtosis               4.07           0.36         4.03
alpha                  0.09          -0.06         0.09
beta                   0.81           1.20         0.81

Worst Drawdown Periods
   net drawdown in %  peak date valley date recovery date duration
0              59.52 2007-11-06  2008-11-20           NaT      NaN
1              22.34 2006-02-16  2006-08-31    2007-05-21      328
2              12.52 2005-07-28  2005-10-12    2006-01-11      120
3              11.29 2004-11-15  2005-04-28    2005-07-28      184
4               9.44 2007-07-16  2007-08-06    2007-09-04       37


2-sigma returns daily    -0.032
2-sigma returns weekly   -0.065
dtype: float64






    



/home/wiecki/miniconda3/lib/python3.4/site-packages/matplotlib/cbook.py:137: MatplotlibDeprecationWarning: The "loc" positional argument to legend is deprecated. Please use the "loc" keyword instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)






    












    



Stress Events
          mean    min    max
Lehmann -0.003 -0.044  0.044
Aug07    0.003 -0.030  0.030
Sept08  -0.006 -0.043  0.040
2009Q1  -0.004 -0.050  0.034
2009Q2   0.007 -0.038  0.062






    












    




Top 10 long positions of all time (and max%)
['COST' 'MMM' 'CERN' 'DELL' 'AMD' 'INTC' 'GPS']
[ 0.993  0.911  0.845  0.717  0.709  0.666  0.62 ]


Top 10 short positions of all time (and max%)
[]
[]


Top 10 positions of all time (and max%)
['COST' 'MMM' 'CERN' 'DELL' 'AMD' 'INTC' 'GPS']
[ 0.993  0.911  0.845  0.717  0.709  0.666  0.62 ]


All positions ever held
['COST' 'MMM' 'CERN' 'DELL' 'AMD' 'INTC' 'GPS']
[ 0.993  0.911  0.845  0.717  0.709  0.666  0.62 ]

Pyfolio can also be used as a library

Levels of API

Tear sheets call individual plotting functions in pyfolio.plotting
Plotting functions call individual statistical functions in pyfolio.timeseries



In [17]:

    
# Show overview of pyfolio.plotting submodule
[f for f in dir(pf.plotting) if 'plot_' in f]









    Out[17]:





['plot_annual_returns',
 'plot_daily_returns_similarity',
 'plot_daily_turnover_hist',
 'plot_daily_volume',
 'plot_drawdown_periods',
 'plot_drawdown_underwater',
 'plot_exposures',
 'plot_gross_leverage',
 'plot_holdings',
 'plot_monthly_returns_dist',
 'plot_monthly_returns_heatmap',
 'plot_return_quantiles',
 'plot_rolling_beta',
 'plot_rolling_fama_french',
 'plot_rolling_returns',
 'plot_rolling_sharpe',
 'plot_sector_allocations',
 'plot_slippage_sensitivity',
 'plot_slippage_sweep',
 'plot_turnover',
 'show_and_plot_top_positions']



In [18]:

    
pf.timeseries.sharpe_ratio(stock_rets)









    Out[18]:





1.0745672736176346

These functions have many more options and detailed descriptions



In [19]:

    
help(pf.plotting.plot_rolling_returns)









    



Help on function plot_rolling_returns in module pyfolio.plotting:

plot_rolling_returns(returns, factor_returns=None, live_start_date=None, cone_std=None, legend_loc='best', volatility_match=False, ax=None, **kwargs)
    Plots cumulative rolling returns versus some benchmarks'.
    
    Backtest returns are in green, and out-of-sample (live trading)
    returns are in red.
    
    Additionally, a linear cone plot may be added to the out-of-sample
    returns region.
    
    Parameters
    ----------
    returns : pd.Series
        Daily returns of the strategy, noncumulative.
         - See full explanation in tears.create_full_tear_sheet.
    factor_returns : pd.Series, optional
        Daily noncumulative returns of a risk factor.
         - This is in the same style as returns.
    live_start_date : datetime, optional
        The point in time when the strategy began live trading, after
        its backtest period.
    cone_std : float, or tuple, optional
        If float, The standard deviation to use for the cone plots.
        If tuple, Tuple of standard deviation values to use for the cone plots
         - The cone is a normal distribution with this standard deviation
             centered around a linear regression.
    legend_loc : matplotlib.loc, optional
        The location of the legend on the plot.
    volatility_match : bool, optional
        Whether to normalize the volatility of the returns to those of the
        benchmark returns. This helps compare strategies with different
        volatilities. Requires passing of benchmark_rets.
    ax : matplotlib.Axes, optional
        Axes upon which to plot.
    **kwargs, optional
        Passed to plotting function.
    
    Returns
    -------
    ax : matplotlib.Axes
        The axes that were plotted on.

Using pyfolio in Quantopian Research

Go to: https://www.quantopian.com/research/notebooks/Tutorial%20-%20pyfolio.ipynb

Bayesian analysis in pyfolio

Sneak-peek into ongoing research.
Focus is on comparing backtest (in-sample) and forward-test (out-of-sample; OOS).
Sophisticated statistical modeling taking uncertainty into account.
Uses T-distribution to model returns (instead of normal).
Relies on PyMC3.



In [21]:

    
oos_date = '2009-10-21'
pf.create_bayesian_tear_sheet(returns, live_start_date=oos_date)









    



Running T model
 [-----------------100%-----------------] 2000 of 2000 complete in 5.9 sec
Finished T model (required 32.18 seconds).

Running BEST model
 [-----------------100%-----------------] 2000 of 2000 complete in 6.8 sec
Finished BEST model (required 47.93 seconds).

Finished plotting Bayesian cone (required 0.13 seconds).

Finished plotting BEST results (required 0.84 seconds).

Finished computing Bayesian predictions (required 0.17 seconds).

Finished plotting Bayesian VaRs estimate (required 0.07 seconds).

Running alpha beta model
 [-----------------100%-----------------] 2000 of 2000 complete in 3.8 sec
Finished running alpha beta model (required 27.80 seconds).

Finished plotting alpha beta model (required 0.16 seconds).

Total runtime was 109.28 seconds.






    



/home/wiecki/miniconda3/lib/python3.4/site-packages/matplotlib/axes/_axes.py:475: UserWarning: No labelled objects found. Use label='...' kwarg on individual plots.
  warnings.warn("No labelled objects found. "

For more information:

Accompanying blog post: http://blog.quantopian.com/bayesian-cone/
Bayesian Methods for Hackers: http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
PyMC3: http://pymc-devs.github.io/pymc3

Summary

Pyfolio bundles various useful analyses and includes advanced statistical modeling.
Jess Stauth's webinar: https://www.youtube.com/watch?v=-VmZAlBWUko
Still young -- please contribute: https://github.com/quantopian/pyfolio/labels/help%20wanted
Bugs: https://github.com/quantopian/pyfolio/issues
Twitter: @twiecki

	AMD	CERN	COST	DELL	GPS	INTC	MMM	cash
index
2009-12-24 00:00:00+00:00	0.00	0.00000	28116.291654	0.00000	86022.143403	40522.266140	17288.084400	261.648675
2009-12-28 00:00:00+00:00	0.00	0.00000	26444.895165	0.00000	91778.156400	38402.193192	17039.652084	-46.731967
2009-12-29 00:00:00+00:00	0.00	0.00000	18547.461768	0.00000	108639.828380	47874.339058	0.000000	-153.847015
2009-12-30 00:00:00+00:00	3178.44	0.00000	31364.795616	63294.65464	53188.785870	26018.584921	0.000000	-1705.100847
2009-12-31 00:00:00+00:00	44363.44	20816.10101	63315.443592	18114.05960	26346.752400	0.000000	0.000000	-944.218331

	txn_volume	txn_shares
2009-12-24 00:00:00+00:00	29064.589481	1381
2009-12-28 00:00:00+00:00	8794.514054	405
2009-12-29 00:00:00+00:00	49996.978324	1765
2009-12-30 00:00:00+00:00	157158.874704	9233
2009-12-31 00:00:00+00:00	189609.809608	11946