by Samuel Ching, Maxwell Margenot, Gus Gordon, and Delaney Mackenzie

Part of the Quantopian Lecture Series.

In professional quant workflows, it is critical to demonstrate the efficacy of any portfolio through rigorous testing. This is fundamental to understanding the risk profile as well as the performance of the portfolio. As such, quants and developers often have to build in-house tools to measure these metrics. To this end, we have created a package called pyfolio. pyfolio is a Python library for performance and risk analysis of financial portfolios, available on GitHub. It allows us to easily generate tear sheets to analyze the risk and performance of trading algorithms as well as return streams in general.

It is often tempting to run many backtests while building an algorithm. A common pitfall is to use the success of backtests as a feedback metric to fine-tune an algorithm's parameters or features while still in the construction phase. This leads to the overfitting of the strategy to whichever time periods the user ran the backtests on. Ultimately, this results in poor performance when deployed out of sample in live trading.

As such, running backtests and generating tearsheets should only occur at the tail end of the algorithm creation lifecycle. We then get a picture of the algorithm's performance, aiding the user in deciding whether to move forward with the deployment of the algorithm or to switch to another strategy.

There are two main parts to a full pyfolio tearsheet. First, there are the performance statistics in table format. Useful metrics such as the annual return, market beta, and Sharpe ratio are all listed in this table. These metrics show not only how well the strategy has performed during the time period of the backtest (annual rate of return) but also the risk-adjusted return as measured by the different ratios. We will go into more detail about the meaning of these metrics below.

Next, there are plots which help to visualize a variety of the performance metrics. For instance, the user can use the drawdown plots to quickly pinpoint the time periods in which the strategy performed the worst. In addition, the plots help the user to see if the strategy is performing as it should - if a strategy is market neutral, but suffers significant drawdowns during crisis periods, then there are clearly issues with the strategy's design or implementation.

```python
import pyfolio as pf
import matplotlib.pyplot as plt
import empyrical
```

```python
# Get benchmark returns
benchmark_rets = pf.utils.get_symbol_rets('SPY')
```

```python
# Get the backtest (get_backtest is provided by the Quantopian research environment)
bt = get_backtest('58812b2977ca4c474bbf393f')
```

The returns, positions, and transactions can then be read from the `backtest` object attributes.

```python
bt_returns = bt.daily_performance['returns']
bt_positions = bt.pyfolio_positions
bt_transactions = bt.pyfolio_transactions
```

With pyfolio, there is a wealth of performance statistics which most professional fund managers would use to analyze the performance of the algorithm. These metrics range from the algorithm's annual and monthly returns, return quantiles, and rolling beta and Sharpe ratios to the turnover of the portfolio. The most critical metrics are discussed as follows.

The risk-adjusted return is an essential metric of any strategy. Risk-adjusted returns allow us to compare return streams that have different individual volatilities on a meaningful, common basis. There are different measures of risk-adjusted returns, but one of the most popular is the Sharpe ratio. In this particular backtest, the annual return of $2\%$ for $1\%$ volatility is an example of a relatively low absolute return, but a relatively high risk-adjusted return. With such a low-risk strategy, leverage can then be applied to increase the absolute return.
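To make the definition concrete, here is a minimal pure-Python sketch of an annualized Sharpe ratio, assuming Python 3, daily simple returns, a zero risk-free rate, and 252 trading days per year. The function name `annualized_sharpe` and the sample data are hypothetical, and this is the textbook formula rather than a copy of `empyrical`'s implementation, which may differ in details.

```python
import math
import statistics


def annualized_sharpe(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from a list of daily simple returns."""
    excess = [r - risk_free_daily for r in daily_returns]
    mean_excess = statistics.mean(excess)
    vol = statistics.stdev(excess)  # sample standard deviation
    if vol == 0:
        return float("nan")  # undefined when returns never vary
    return mean_excess / vol * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only
example_returns = [0.001, -0.002, 0.003, 0.0005, -0.001, 0.002]
sharpe = annualized_sharpe(example_returns)
print("Annualized Sharpe ratio: {:.2f}".format(sharpe))
```

With a zero risk-free rate, doubling every return doubles both the mean and the volatility, so the ratio is unchanged. This is why leverage raises the absolute return without improving the Sharpe ratio.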

```python
print("The Sharpe Ratio of the backtest is: {}".format(empyrical.sharpe_ratio(bt_returns)))
```

The market beta of an algorithm is the exposure of that strategy to the broader market. For instance, a market beta of $1$ would mean that you're buying the market, while a beta of $-1$ means that you are shorting the market. Any beta within this range signifies reduced market influence, while any beta outside this range signifies increased market influence.
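As a rough illustration of these three cases, the sketch below computes beta as the sample covariance with the benchmark divided by the benchmark's sample variance. It assumes Python 3; the helper `market_beta` and the return series are hypothetical and are not taken from `empyrical`.

```python
import statistics


def market_beta(strategy_returns, benchmark_returns):
    """Beta = Cov(strategy, benchmark) / Var(benchmark), using sample moments."""
    if len(strategy_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    mean_s = statistics.mean(strategy_returns)
    mean_b = statistics.mean(benchmark_returns)
    n = len(benchmark_returns)
    cov = sum((s - mean_s) * (b - mean_b)
              for s, b in zip(strategy_returns, benchmark_returns)) / (n - 1)
    return cov / statistics.variance(benchmark_returns)


# Hypothetical benchmark returns and three stylized strategies
benchmark = [0.01, -0.02, 0.015, -0.005, 0.02]
long_market = list(benchmark)                 # buying the market
short_market = [-b for b in benchmark]        # shorting the market
half_exposed = [0.5 * b for b in benchmark]   # reduced market influence

for name, rets in [("long", long_market), ("short", short_market), ("half", half_exposed)]:
    print("{}: beta = {:.2f}".format(name, market_beta(rets, benchmark)))
```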

```python
print("The market beta of the backtest is: {}".format(empyrical.beta(bt_returns, benchmark_rets)))
```

A strategy with a beta close to $0$ is described as *market neutral*. To institutional investors, market neutral strategies are very attractive. After all, if the investors want a strategy which is highly exposed to the market, they could simply buy an ETF or an index fund.

A drawdown is the 'peak to trough decline' of an investment strategy. Intuitively speaking, it refers to the losses the strategy has experienced from the base amount of capital which it had at the peak. For instance, in the 2008 Financial Crisis, the market drawdown was over 50% from the peak in 2007 to the trough in 2009.
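The peak-to-trough idea can be sketched in a few lines: track the running peak of the cumulative return curve and record the worst fall from it. This assumes simple periodic returns compounded from a starting value of 1; the function below is a hypothetical illustration, not `empyrical`'s code.

```python
def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve (0 or a negative number)."""
    equity = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)                  # highest value seen so far
        worst = min(worst, equity / peak - 1.0)   # current decline from that peak
    return worst


# Hypothetical returns: the curve peaks at 1.10 and bottoms out at 0.792
example = [0.10, -0.20, -0.10, 0.30, 0.05]
print("Max drawdown: {:.2%}".format(max_drawdown(example)))  # roughly -28%
```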

```python
print("The maximum drawdown of the backtest is: {}".format(empyrical.max_drawdown(bt_returns)))
```

In pyfolio, there is a `plotting` module which allows users to quickly plot these metrics. These plots can be individually plotted using the following functions:

`plot_annual_returns`

`plot_daily_returns_similarity`

`plot_daily_volume`

`plot_drawdown_periods`

`plot_drawdown_underwater`

`plot_exposures`

`plot_gross_leverage`

`plot_holdings`

`plot_long_short_holdings`

`plot_monthly_returns_dist`

`plot_monthly_returns_heatmap`

`plot_multistrike_cones`

`plot_prob_profit_trade`

`plot_return_quantiles`

`plot_rolling_beta`

`plot_rolling_returns`

`plot_rolling_sharpe`

`plot_turnover`

`plot_txn_time_hist`

`show_and_plot_top_positions`

Plots of cumulative returns and daily, non-cumulative returns allow you to gain a quick overview of the algorithm's performance and pick out any anomalies across the time period of the backtest. The cumulative return plot also allows you to make a comparison against benchmark returns - this could be against another investment strategy or an index like the S&P 500.

```python
# Cumulative Returns
plt.subplot(2,1,1)
pf.plotting.plot_rolling_returns(bt_returns, benchmark_rets)
# Daily, Non-Cumulative Returns
plt.subplot(2,1,2)
pf.plotting.plot_returns(bt_returns)
plt.tight_layout()
```

```python
fig = plt.figure(1)
plt.subplot(1,3,1)
pf.plot_annual_returns(bt_returns)
plt.subplot(1,3,2)
pf.plot_monthly_returns_dist(bt_returns)
plt.subplot(1,3,3)
pf.plot_monthly_returns_heatmap(bt_returns)
plt.tight_layout()
fig.set_size_inches(15,5)
```

```python
pf.plot_return_quantiles(bt_returns);
```

The line in the middle of each box shows the median return, and the box spans the first quartile (25th percentile) to the third quartile (75th percentile). While a high median return is always helpful, it is also important to understand the returns distribution. A tight box means that the bulk of the returns (25th - 75th percentile) fall within a tight bound - i.e. the returns are consistent and not volatile. A larger box means that the returns are more spread out. It is important, however, to take note of the scale to the left to put the quartiles in perspective. In addition, returns over longer periods of time will have a wider distribution as increasing the length of time increases the variability in returns.

The 'whiskers' at the end indicate the returns which fall outside the 25th and 75th percentile. A tight box with long whiskers indicates that there may be outliers in the returns - which may not be ideal if the outliers are negative. This may indicate that your strategy is susceptible to certain market conditions / time periods.
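The quantities behind a box plot are easy to compute by hand. The sketch below uses a hypothetical `percentile` helper with linear interpolation (plotting libraries may use a slightly different convention) to compare the interquartile range of a steady and an erratic return series, both invented for illustration.

```python
def percentile(values, pct):
    """Linear-interpolation percentile (pct between 0 and 100) of a list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def box_summary(returns):
    """Quartiles and interquartile range, i.e. the edges and height of the box."""
    q1, median, q3 = (percentile(returns, p) for p in (25, 50, 75))
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Hypothetical return series: one consistent, one volatile
steady = [0.001, 0.002, 0.0015, 0.0012, 0.0018]
erratic = [0.03, -0.04, 0.0015, 0.05, -0.02]

print("steady box height:  {:.4f}".format(box_summary(steady)["iqr"]))
print("erratic box height: {:.4f}".format(box_summary(erratic)["iqr"]))
```

Both series share the same median here, which is why the box height, not the median alone, is what separates a consistent strategy from a volatile one.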

Below, we have several rolling plots which show how an estimate changes throughout the backtest period. In the case of the rolling beta and the rolling Sharpe ratio, the rolling estimate gives us more information than a single point estimate for the entire period. A rolling estimate allows the user to see if the risk-adjusted return of the algorithm (Sharpe ratio) is consistent over time or if it fluctuates significantly. A volatile Sharpe ratio may indicate that the strategy is riskier at certain time points or that it does not perform as well at these time points. Likewise, a volatile rolling beta indicates that the strategy is exposed to the market during certain time points - if the strategy is meant to be market neutral, this could be a red flag.
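A rolling estimate simply recomputes the statistic over each trailing window. The following sketch does this for the Sharpe ratio, assuming Python 3 and 252 trading days per year; the window length of 4 and the return series are hypothetical and chosen only to keep the example small, and they do not reflect pyfolio's defaults.

```python
import math
import statistics


def rolling_sharpe(daily_returns, window, periods_per_year=252):
    """Annualized Sharpe ratio over each trailing window of daily returns."""
    values = []
    for end in range(window, len(daily_returns) + 1):
        chunk = daily_returns[end - window:end]
        vol = statistics.stdev(chunk)
        if vol == 0:
            values.append(float("nan"))
        else:
            values.append(statistics.mean(chunk) / vol * math.sqrt(periods_per_year))
    return values


# Hypothetical series: a calm, profitable stretch followed by a choppy, losing one
returns = [0.002, 0.001, 0.003, 0.002, -0.03, 0.02, -0.04, 0.01]
series = rolling_sharpe(returns, window=4)
print(["{:.1f}".format(v) for v in series])
```

A single Sharpe ratio over all eight days would hide the fact that the first window is strongly positive and the last one negative.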

```python
pf.plot_rolling_beta(bt_returns, benchmark_rets);
```

```python
pf.plot_rolling_sharpe(bt_returns);
```

In this plot, we see how exposed the strategy is to the 3 classical Fama-French factors. A factor model can be used to analyze the sources of risks and returns in a strategy or of any return stream. By looking at a strategy's historical returns, we can determine how much of these returns can be attributed to speculation on different factors and how much is a result of asset-specific fluctuations. This allows you to find out the sources of risk the portfolio is exposed to. For more information about Factor Models, check out the Factor Risk Exposure lecture.

```python
pf.plot_rolling_fama_french(bt_returns);
```

These classical risk factors measure exposure to small market cap, value, and momentum stocks. The SMB curve represents small-cap stocks minus big-cap stocks, the HML curve represents high book-to-market (value) stocks minus low book-to-market (growth) stocks, and the UMD curve checks exposure to any momentum strategy (i.e. stocks which are trending up perform better than stocks which are trending down). The idea behind these risk factors is that even though they may provide higher returns, they are able to do so because they are riskier. Therefore, low measures of these in your strategy may indicate that your strategy is less risky.

Similar to the beta exposure to the market, a high exposure to a Fama-French factor ($\geq 1$) means that you are simply buying these known risk factors. If an algorithm's return is made up of *known* risk factors, such as the Fama-French ones, then the strategy is not as valuable in generating alpha.

```python
pf.plot_drawdown_periods(bt_returns);
```

```python
pf.plot_drawdown_underwater(bt_returns);
```

```python
pf.plot_gross_leverage(bt_returns, bt_positions);
```

Monitoring the leverage of a strategy is important as it affects how you trade on margin. Unlike discretionary strategies where you could actively increase or decrease the leverage used in going long or short, algorithmic strategies automatically apply leverage during trading. Therefore, it is useful to monitor the gross leverage plot to ensure that the amount of leverage that your strategy uses is within the limits that you are comfortable with.

Good strategies generally start with an initial leverage of 1. Upon finding out the viability of the strategy by examining the Sharpe ratio and other metrics, leverage can be increased or decreased accordingly. A lower Sharpe ratio indicates that the strategy has a higher volatility per unit return, making it more risky to lever up. On the other hand, a higher Sharpe ratio indicates lower volatility per unit return, allowing you to increase the leverage and correspondingly, returns.
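As a sketch of the arithmetic, gross leverage can be taken as total absolute exposure divided by net portfolio value, and levering a strategy scales its return and volatility by the same factor. The helpers and numbers below are hypothetical, and the `lever` function deliberately ignores financing costs and margin constraints.

```python
def gross_leverage(position_values, cash):
    """Gross exposure (sum of absolute position values) over net portfolio value."""
    gross_exposure = sum(abs(v) for v in position_values)
    net_value = sum(position_values) + cash
    return gross_exposure / net_value


def lever(annual_return, annual_vol, leverage):
    """Scale return and volatility by the leverage factor (no financing costs)."""
    return annual_return * leverage, annual_vol * leverage


# Hypothetical book: long 60k, short 40k, 80k cash, so 100k net value
positions = [60000.0, -40000.0]
cash = 80000.0
lev = gross_leverage(positions, cash)
print("Gross leverage: {:.2f}".format(lev))

# A 2% return at 1% volatility, levered 3x, keeps the same return-to-volatility ratio
levered_return, levered_vol = lever(0.02, 0.01, 3.0)
print("Levered return {:.1%}, levered volatility {:.1%}".format(levered_return, levered_vol))
```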

For more details, take a look at the lecture on leverage.

The tables below list the top 10 long and short positions of all time. The goal of each algorithm is to minimize the proportion of the portfolio invested in each security at any time point. This prevents the movement of any individual security from having a significant impact on the portfolio as a whole. The bigger the exposure a strategy has to any security, the greater the risk.

Generally, the biggest failure point for many strategies is high portfolio concentration in a few securities. While this may produce significant positive returns over a given time period, the converse can easily occur. Huge swings in a small number of equities would result in significant drawdowns. Good strategies tend to be those in which no security comprises more than 10% of the portfolio.
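A quick concentration check along these lines is sketched below. It assumes a single snapshot of holdings expressed in dollars, with a `cash` entry, and flags any name whose absolute weight exceeds the 10% guideline. The symbols, values, and helper names are all hypothetical, and the sketch does not use pyfolio's own position utilities.

```python
def percent_alloc(holdings):
    """Map each holding (including 'cash') to its fraction of total portfolio value."""
    total = sum(holdings.values())
    return {name: value / total for name, value in holdings.items()}


def concentrated_names(holdings, limit=0.10):
    """Names (excluding cash) whose absolute weight exceeds the limit."""
    alloc = percent_alloc(holdings)
    return sorted(name for name, weight in alloc.items()
                  if name != "cash" and abs(weight) > limit)


# Hypothetical snapshot: values in dollars, negative means a short position
holdings = {"AAA": 25000.0, "BBB": 8000.0, "CCC": -12000.0, "DDD": 5000.0, "cash": 74000.0}
print(concentrated_names(holdings))
```

Short positions are compared by absolute weight, since a large short carries the same single-name risk as a large long.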

```python
pos_percent = pf.pos.get_percent_alloc(bt_positions)
pf.plotting.show_and_plot_top_positions(bt_returns, pos_percent);
```

The holdings per day plot allows us to gain insight into whether the total portfolio holdings fluctuate from day to day. This plot provides a good sanity check as to whether the algorithm is performing as it should, or if there were any bugs which should be fixed. For instance, we can use the holdings plot to check if the trading behavior is as expected, i.e. whether there are extended periods in which the number of holdings is exceptionally low or in which the algorithm is not trading.

```python
pf.plot_holdings(bt_returns, bt_positions);
```

This plot reflects how many shares are traded as a fraction of total shares. The higher the daily turnover, the higher the transaction costs associated with the algorithm. However, this also means that the returns and risk metrics are better able to capture the underlying performance of the algorithm as the higher quantity of trades provides more samples (of returns, risk, etc.) to draw from. This would in turn give a better estimation on *Out of Sample* periods as well.
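To see how turnover feeds into costs, the sketch below measures daily turnover as dollar value traded over portfolio value and converts the average into a rough annual cost drag. The definition of turnover, the 5 basis point cost per dollar traded, and all the numbers are assumptions made for illustration; pyfolio's own calculation may use a different denominator.

```python
def daily_turnover(traded_values, portfolio_values):
    """Absolute dollar value traded each day as a fraction of portfolio value."""
    return [traded / value for traded, value in zip(traded_values, portfolio_values)]


def annual_cost_drag(turnover, cost_per_dollar, periods_per_year=252):
    """Approximate yearly return lost to costs, given a cost per dollar traded."""
    average = sum(turnover) / float(len(turnover))
    return average * periods_per_year * cost_per_dollar


# Hypothetical four days of trading on a 100k portfolio
traded = [10000.0, 0.0, 25000.0, 5000.0]
portfolio = [100000.0] * 4
turnover = daily_turnover(traded, portfolio)
drag = annual_cost_drag(turnover, cost_per_dollar=0.0005)  # assumed 5 bps per dollar
print("Average daily turnover: {:.1%}".format(sum(turnover) / len(turnover)))
print("Estimated annual cost drag: {:.2%}".format(drag))
```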

```python
pf.plot_turnover(bt_returns, bt_transactions, bt_positions);
```

```python
pf.plotting.plot_daily_turnover_hist(bt_transactions, bt_positions);
```

```python
pf.plotting.plot_daily_volume(bt_returns, bt_transactions);
```

The transaction time histogram shows **when** the algorithm makes its trades during each day. You can specify the size of the bin (each column's width) as well as the timezone in the function's parameters.

```python
pf.plotting.plot_txn_time_hist(bt_transactions);
```

When evaluating the performance of an investment strategy, it is helpful to quantify the frequency, duration, and profitability of its independent bets, or "round trip" trades. A round trip trade is when a new long or short position is opened and later completely or partially closed out.

The intent of the round trip tearsheet is to differentiate strategies that profited off of a few lucky trades from strategies that profited repeatedly off of genuine alpha. Breaking down round trip profitability by traded name and sector can also inform universe selection and identify exposure risks. For example, even if your equity curve looks robust, if only two securities in your universe of fifteen names contributed to overall profitability, you may have reason to question the logic of your strategy.

To identify round trips, pyfolio reconstructs the complete portfolio based on the transactions that you pass in. When you make a trade, pyfolio checks if shares purchased at a certain price are already present in your portfolio. If there are, we compute the Profit and Loss (P&L), returns and duration of that round trip. In calculating round trips, pyfolio also appends position-closing transactions at the last timestamp in the positions data. This closing transaction will cause the P&L from any open positions to be realized as completed round trips.
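The matching idea can be sketched with a simplified first-in, first-out scheme for a single symbol. This is an illustration under stated assumptions, not pyfolio's implementation: the function name, the `(day, amount, price)` tuple layout, and the trades are hypothetical, and positions still open at the end are left unmatched rather than closed at the last timestamp.

```python
from collections import deque


def extract_round_trips(transactions):
    """FIFO-match signed trades for ONE symbol.

    transactions: list of (day, amount, price); amount > 0 buys, amount < 0 sells.
    Returns a list of dicts with the P&L, duration in days, and side of each round trip.
    """
    open_lots = deque()  # each lot is [open_day, signed_amount, open_price]
    round_trips = []
    for day, amount, price in transactions:
        # Close against the oldest lots while the trade opposes them
        while amount != 0 and open_lots and (open_lots[0][1] > 0) != (amount > 0):
            lot = open_lots[0]
            closed = min(abs(amount), abs(lot[1]))
            direction = 1 if lot[1] > 0 else -1
            round_trips.append({
                "pnl": direction * closed * (price - lot[2]),
                "duration": day - lot[0],
                "long": direction > 0,
            })
            lot[1] -= direction * closed
            amount += direction * closed
            if lot[1] == 0:
                open_lots.popleft()
        if amount != 0:
            open_lots.append([day, amount, price])  # remainder opens a new lot
    return round_trips


# Hypothetical trades: a long closed in two pieces, then a short that is covered
trades = [(0, 100, 10.0), (3, -60, 12.0), (5, -40, 9.0), (6, -50, 20.0), (9, 50, 18.0)]
trips = extract_round_trips(trades)
for trip in trips:
    print(trip)
```

Note that the partial sale on day 3 already counts as a round trip, matching the definition of a position that is completely or partially closed out.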

Before the round trip plots, there is a table of summary statistics which provide useful information about the strategy. For instance, the `Percent profitable` statistic shows the percentage of all trades which are profitable. This allows us to calculate the probability of the strategy making a profitable decision. This probability is also reflected in the round trip plots. A quick check of this plot tells us if our strategy is performing better than chance. In addition, the `PnL stats` also break down our average net profit for each trade and allow us to see how much of a role our short side trades play versus our long side trades in contributing to our total profit. These statistics give you a quick overview of the profitability of the strategy.
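Statistics of this kind reduce to simple aggregates over the list of round trips. The sketch below assumes each round trip is a dict with a `pnl` value and a `long` flag; that layout, the helper name, and the figures are hypothetical and are not pyfolio's output format.

```python
def round_trip_summary(round_trips):
    """Share of profitable round trips, average P&L, and the long/short P&L split."""
    pnls = [trip["pnl"] for trip in round_trips]
    winners = [p for p in pnls if p > 0]
    return {
        "percent_profitable": len(winners) / float(len(pnls)),
        "average_pnl": sum(pnls) / float(len(pnls)),
        "long_pnl": sum(t["pnl"] for t in round_trips if t["long"]),
        "short_pnl": sum(t["pnl"] for t in round_trips if not t["long"]),
    }


# Hypothetical round trips: two longs (one loser) and one profitable short
example_trips = [
    {"pnl": 120.0, "long": True},
    {"pnl": -40.0, "long": True},
    {"pnl": 100.0, "long": False},
]
summary = round_trip_summary(example_trips)
print("Percent profitable: {:.0%}".format(summary["percent_profitable"]))
print("Long P&L: {long_pnl}, short P&L: {short_pnl}".format(**summary))
```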

**Note**: These plots are not included by default in the `create_full_tear_sheet()` function. In order to plot the round trip plots, you have to pass in `round_trips=True` as a parameter to the function. Alternatively, you can generate them on their own with `pf.create_round_trip_tear_sheet()`. Passing in a sector map is optional.

```python
pf.create_round_trip_tear_sheet(bt_returns, bt_positions, bt_transactions);
```