In this example, we'll look at valuing a simple portfolio composed of equities and futures. In addition, we'll calculate an intraday P&L and some simple exposure measures.
The main challenge of valuing and risk-managing a multi-asset portfolio is that each asset type must be treated differently. Position, market and static data for each asset type must typically be sourced from separate systems and then integrated into a single holistic view. Loman excels at this sort of task, making it easy to add new nodes as new sources of data become available, or as new calculations are required.
To simplify exposition, we have deliberately limited the number of asset types we treat, and the number of measures we calculate. However, it should be easy to see that the techniques applied here can be extended naturally to more complex portfolios.
To make sure that this example is as widely accessible as possible, we have used publicly available data sources. For actual investing use, we strongly recommend investigating paid services, which offer higher-quality data that typically requires less processing to use.
This example makes heavy use of Pandas DataFrames. If you haven't come across Pandas before, definitely take a look: http://pandas.pydata.org/. A DataFrame is an in-memory table object, and DataFrames have gained popularity for many data-processing tasks.
Ok, let's get going! We start, as usual, by creating a Loman Computation object, which we shall populate with data and calculations:
In [1]:
import loman
comp = loman.Computation()
The first thing we shall need is holdings data. For this example, we assume that holdings data is provided in CSV format, and insert that CSV data into a node in the computation called holdings using the add_node method.
The portfolio itself consists of 5 equities from an arbitrary CNN Money article, together with a short S&P E-mini future to approximately hedge overall market exposure. (Our choice of article should not be construed as a recommendation for or against the content of that article, or the site as a whole.)
In [2]:
holdings = """Type,Symbol,Qty,CostBasis
Equity,AVGO,126,22680
Equity,EVHC,349,22685
Equity,STT,287,22673
Equity,DAL,454,22700
Equity,DY,283,22640
Future,ESM7,-1,0
Cash,USD,2000,
"""
comp.add_node('holdings', value=holdings)
We need to convert this CSV text data into a usable form.
The @loman.node decorator allows us to insert a new node into the computation, using the name of the function as the name of the node. Any parameters of the function that are named the same as nodes in the computation will take the values of those nodes when the function is called.
Here we use the @loman.node decorator to create a new node, called df_holdings, which will be calculated from the node holdings that we defined above. The function simply reads the CSV data into a Pandas DataFrame.
In [3]:
from io import StringIO
import pandas as pd
@loman.node(comp)
def df_holdings(holdings):
    f = StringIO(holdings)
    df = pd.read_csv(f)
    return df
We can easily see the structure of a computation, although at this stage it is pretty simple. There is a node holdings, in dark green because we set its value, and so it is up-to-date. And there is a second node df_holdings, which depends on holdings, as shown by the arrow. The lighter green color tells us that df_holdings is computable, but not currently up-to-date, which we expect, as we have not told Loman to calculate it yet.
In [4]:
comp
Out[4]:
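As an aside, node states can also be inspected programmatically, via an attribute-based accessor s. A minimal sketch, assuming Loman's States enum and s accessor as described in its documentation:
comp.s.holdings      # States.UPTODATE - we set its value directly
comp.s.df_holdings   # States.COMPUTABLE - inputs ready, not yet calculated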
So we can go ahead and calculate df_holdings and check that it looks correct. Computation objects have an attribute-based accessor, v, which allows us to see the value of any node. It also works great with the auto-complete in interactive environments, such as IPython Notebook.
In [5]:
comp.compute('df_holdings')
comp.v.df_holdings
Out[5]:
A quick sidenote: It is the nature of these examples that I present the computations in their complete state, with no mention of the struggle that it took to get them there. That's a shame, because one of the really great things about working with Loman is that you can iterate on a single node - even one in the middle of a calculation - without having to recalculate any of the upstream or downstream nodes if you don't want to. In this case, it doesn't make a huge amount of difference, but when dealing with large data sets and time-consuming calculations, it can be a boon not to have to wait a couple of minutes to see the results of your latest update to an intermediate calculation.
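As a concrete sketch of that workflow: re-adding a node with an existing name replaces it, leaving its upstreams untouched (the tweak itself here is hypothetical):
@loman.node(comp)
def df_holdings(holdings):
    f = StringIO(holdings)
    df = pd.read_csv(f)
    df['Symbol'] = df['Symbol'].str.strip()  # hypothetical tweak being iterated on
    return df
comp.compute('df_holdings')  # 'holdings' is not re-fetched; downstreams are simply marked stale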
Back to the main plot. We will need to source different market data for each type of asset that we hold, and perform different calculations to boot. To this end, we split df_holdings into three separate DataFrames, one for each asset type, in nodes df_holdings_equity, df_holdings_future and df_holdings_cash.
We use a single function for calculating each of the nodes, and provide different values to the type parameter. As before, Loman knows that the df_holdings argument will be populated from the df_holdings node. We use the kwds parameter of add_node to tell Loman that the parameter type should be a constant, rather than taken from a node. Thus the df_holdings_equity node will be calculated using the df_holdings node and the constant 'Equity', and so on.
In [6]:
def filter_df(df_holdings, type):
    return df_holdings[df_holdings.Type == type]
comp.add_node('df_holdings_equity', filter_df, kwds={'type': loman.C('Equity')})
comp.add_node('df_holdings_future', filter_df, kwds={'type': loman.C('Future')})
comp.add_node('df_holdings_cash', filter_df, kwds={'type': loman.C('Cash')})
As before, we can compute nodes and inspect the results to make sure they are as we expect:
In [7]:
comp.compute_all()
comp.v.df_holdings_equity
Out[7]:
In [8]:
comp.v.df_holdings_future
Out[8]:
In [9]:
comp.v.df_holdings_cash
Out[9]:
To do a portfolio valuation, and calculate exposure metrics, we will need market data. The market data we need will be different for each type of asset. We start first with the equity portfolio.
We add a node df_equity_yql_data_yahoo, and we populate it by applying the function get_yahoo_equity_data to each Symbol in df_holdings_equity, the DataFrame with our equity holdings, in turn. The yahoo_finance library returns a dictionary for each symbol, so we turn each of those into a row, and df_equity_yql_data_yahoo will contain a DataFrame.
In [10]:
import yahoo_finance
def get_yahoo_equity_data(symbol):
    s = yahoo_finance.Share(symbol)
    return pd.Series(s.data_set)
@loman.node(comp)
def df_equity_yql_data_yahoo(df_holdings_equity):
    return df_holdings_equity.Symbol.apply(get_yahoo_equity_data)
As before, we can compute and inspect the node to see what it contains. Yahoo returns a large number of fields, and we capture them all. For now, we will only use a few of those fields, but there is no extra cost to capturing them, and they may be useful for other calculations we might want to add in the future - for example, alerts as prices fall below certain moving averages.
In [11]:
comp.compute_all()
comp.v.df_equity_yql_data_yahoo
Out[11]:
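For instance, the moving-average alert mentioned above might look something like the following sketch. We don't add it to the running example, and field names such as FiftydayMovingAverage are assumptions based on the yahoo_finance data set:
@loman.node(comp)
def df_below_ma_alerts(df_equity_yql_data_yahoo):
    # Flag equities whose last trade is below their 50-day moving average
    last = df_equity_yql_data_yahoo['LastTradePriceOnly'].astype(float)
    ma50 = df_equity_yql_data_yahoo['FiftydayMovingAverage'].astype(float)
    return df_equity_yql_data_yahoo.loc[last < ma50, ['Symbol']]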
We would also like betas for the equities in our portfolio. Unfortunately, Yahoo doesn't expose these through its YQL API, so we instead scrape the data directly from the page. Again, we strongly recommend using paid data sources to avoid this sort of hack. Nonetheless, this does demonstrate the power of the libraries available for Python, as well as showcasing Loman's ability to easily integrate multiple data sources.
We use the same strategy of iterating over each Symbol in df_holdings_equity, this time applying the function scrape_yahoo_data to fetch a web page and read the beta from it.
In [12]:
import requests, bs4
def scrape_yahoo_data(symbol):
    resp = requests.get('https://finance.yahoo.com/quote/{}?ltr=1'.format(symbol))
    soup = bs4.BeautifulSoup(resp.content, 'lxml')
    el = soup.find(attrs={'data-test': 'BETA-value'})
    beta = float(el.text)
    return pd.Series({'Symbol': symbol, 'Beta': beta})
@loman.node(comp)
def df_equity_scraped_data_yahoo(df_holdings_equity):
    return df_holdings_equity.Symbol.apply(scrape_yahoo_data)
And again, we quickly compute and inspect the result.
In [13]:
comp.compute_all()
comp.v.df_equity_scraped_data_yahoo
Out[13]:
The parsing of futures prices from scrapes of the CME's website is slightly more onerous, requiring the futures contract code to be parsed from an id attribute using a regex. To move the exposition along, we won't cover it in detail, but just show that the DataFrame of prices we obtain is sensible:
In [14]:
import numpy as np
import re
from dateutil import relativedelta
id_regex = re.compile('quotesFuturesProductTable1_(?P<Symbol>.{4})_(?P<Field>.*)')
def get_el_data(el):
    m = id_regex.match(el['id'])
    d = m.groupdict()
    d['Value'] = el.text
    return d
def try_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return np.nan
@loman.node(comp)
def df_future_prices():
    resp = requests.get('http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500.html')
    soup = bs4.BeautifulSoup(resp.content, 'lxml')
    els = soup.findAll('td', attrs={'id': id_regex})
    df = pd.DataFrame([get_el_data(el) for el in els])
    df = df.set_index(['Symbol', 'Field']).Value.unstack()
    for col in ['change', 'high', 'last', 'low', 'open', 'volume']:
        df[col] = df[col].apply(try_float)
    return df
In [15]:
comp.compute_all()
comp.v.df_future_prices
Out[15]:
We also need some static data for futures. In this case we need two things: first, the contract unit, i.e. the dollar amount of each point move in the index; and second, the beta, which in this case (glossing over technicalities) is 1, since the underlying of the future is the index itself. In a real system, this data would be stored in a database or security-master system, which we could of course access from Loman. To keep this example self-contained, we hardcode our static data here, and insert it into a node df_future_static_data.
In [16]:
data = [
    ['ESM7', 50, 'USD', 1.],
    ['ESU7', 50, 'USD', 1.],
    ['ESZ7', 50, 'USD', 1.],
    ['ESH8', 50, 'USD', 1.],
    ['ESM8', 50, 'USD', 1.],
]
df = pd.DataFrame(data, columns=['Symbol', 'UnitAmount', 'UnitCurrency', 'Beta'])
comp.add_node('df_future_static_data', value=df)
Finally, we come onto the interesting bit - using our holdings data and market data to calculate P&Ls and exposure measures.
We start with the equity portion of the portfolio. We'll add a new node df_equity_pnl, which will contain a DataFrame with a row for each holding we have, together with the market and other information that we want. This node will depend on the three relevant nodes df_holdings_equity (containing holdings data), df_equity_yql_data_yahoo (containing prices) and df_equity_scraped_data_yahoo (containing betas), and so these are the parameters of the function defining the node df_equity_pnl.
In the first part of the computation, we use Pandas merge functionality to join the tables, based on looking up Symbol in each table. We also take the opportunity to limit the set of columns taken from df_equity_yql_data_yahoo, and to rename columns to our liking.
In the second part of the computation, we calculate several measures. For example, intraday P&L is current value less value at the close. We also define an exposure metric, CurrentBetaAdjExposure, as Beta $\times$ Current Exposure (for equities, Current Exposure is just Current Value).
As always, we check the results as we go.
In [17]:
@loman.node(comp)
def df_equity_pnl(df_holdings_equity, df_equity_yql_data_yahoo, df_equity_scraped_data_yahoo):
    # Merge DataFrames
    df = pd.merge(df_holdings_equity, df_equity_yql_data_yahoo[['Symbol', 'PreviousClose', 'LastTradePriceOnly']],
                  how='left', on='Symbol')
    df.rename(columns={'PreviousClose': 'Close', 'LastTradePriceOnly': 'Last'}, inplace=True)
    for col in ['Close', 'Last']:
        df[col] = df[col].astype(float)
    df = pd.merge(df, df_equity_scraped_data_yahoo, how='left', on='Symbol')
    # Calculate Measures
    df['CloseValue'] = df.Qty * df.Close
    df['CurrentValue'] = df.Qty * df.Last
    df['PNL_Day'] = df.CurrentValue - df.CloseValue
    df['PNL_ITD'] = df.CurrentValue - df.CostBasis
    df['CurrentExposure'] = df.CurrentValue
    df['CurrentBetaAdjExposure'] = df.Beta * df.CurrentExposure
    return df
comp.compute_all()
comp.v.df_equity_pnl
Out[17]:
The process is very similar for futures. df_future_pnl is created by merging the holdings, price and static data in df_holdings_future, df_future_prices and df_future_static_data respectively. We again calculate each of the P&L and exposure measures that we would like to see, but note that the treatment for futures is different. For example, intraday P&L is calculated as $(\text{Last Price} - \text{Close Price}) \times \text{Quantity} \times \text{Contract Unit}$.
In [18]:
@loman.node(comp)
def df_future_pnl(df_holdings_future, df_future_prices, df_future_static_data):
    # Merge DataFrames
    df = pd.merge(df_holdings_future, df_future_prices[['priorSettle', 'last']],
                  how='left', left_on='Symbol', right_index=True)
    df.rename(columns={'priorSettle': 'Close', 'last': 'Last'}, inplace=True)
    for col in ['Close', 'Last']:
        df[col] = df[col].astype(float)
    df = pd.merge(df, df_future_static_data, how='left', on='Symbol')
    # Calculate Measures
    df['PNL_Day'] = (df.Last - df.Close) * df.Qty * df.UnitAmount
    df['CloseValue'] = 0
    df['CurrentValue'] = df.PNL_Day
    df['PNL_ITD'] = df.CurrentValue - df.CostBasis
    df['CurrentExposure'] = df.Last * df.Qty * df.UnitAmount
    df['CurrentBetaAdjExposure'] = df.Beta * df.CurrentExposure
    return df
comp.compute_all()
comp.v.df_future_pnl
Out[18]:
The calculations for the cash component are more basic, as cash has no exposure by these measures:
In [19]:
@loman.node(comp)
def df_cash_pnl(df_holdings_cash):
    df = df_holdings_cash.copy()
    df['Close'] = np.nan
    df['Last'] = np.nan
    df['PNL_Day'] = df['PNL_ITD'] = 0.
    df['CloseValue'] = df['CurrentValue'] = df['CostBasis'] = df.Qty
    df['CurrentExposure'] = df['CurrentBetaAdjExposure'] = 0.
    df['Beta'] = 0
    return df
comp.compute_all()
comp.v.df_cash_pnl
Out[19]:
Finally, we add a node df_pnl, where we take all the rows from the equity, future and cash P&L and exposure DataFrames in df_equity_pnl, df_future_pnl and df_cash_pnl respectively. This gives us a position-by-position P&L report.
In [20]:
@loman.node(comp)
def df_pnl(df_equity_pnl, df_future_pnl, df_cash_pnl):
    df = pd.concat([df_equity_pnl, df_future_pnl, df_cash_pnl])
    df = df[['Type', 'Symbol', 'Qty', 'Close', 'Last', 'CostBasis', 'CloseValue', 'CurrentValue',
             'PNL_ITD', 'PNL_Day', 'Beta', 'CurrentExposure', 'CurrentBetaAdjExposure']]
    return df
comp.compute_all()
comp.v.df_pnl
Out[20]:
From there, various aggregations are possible. For example, we can easily see intraday P&L, and inception-to-date P&L:
In [21]:
comp.v.df_pnl.PNL_Day.sum()
Out[21]:
In [22]:
comp.v.df_pnl.PNL_ITD.sum()
Out[22]:
We can also see our exposure and beta-adjusted exposure metrics across the whole portfolio:
In [23]:
comp.v.df_pnl[['CurrentExposure', 'CurrentBetaAdjExposure']].sum()
Out[23]:
We could even put the aggregated data itself into new nodes.
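For example, a sketch (we don't add these nodes to the graph drawn below):
@loman.node(comp)
def pnl_day_total(df_pnl):
    # Portfolio-level intraday P&L as a cached, first-class node
    return df_pnl.PNL_Day.sum()
@loman.node(comp)
def exposure_totals(df_pnl):
    # Totals are recomputed only when df_pnl changes
    return df_pnl[['CurrentExposure', 'CurrentBetaAdjExposure']].sum()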
In [24]:
comp.draw(graph_attr={'size': '"8,6"'})
Out[24]:
This serves a practical purpose. Code typically spends the bulk of its life being maintained, so it is important to make that maintenance as easy and fast as possible. Returning to code after several months, it takes time to reacquaint oneself with even a modest amount of it, while the logical structure and visualization capabilities offered by Loman make it easy to see how a computation proceeds.
There are also benefits when we first create code. Because Loman keeps track of state for us as we interactively add nodes (possibly taking several iterations to get them correct), we do not have to re-run upstream dependencies. Contrast this with writing a program by adding sub-routines or sections of code: each re-run would necessitate fetching data from several external websites, and by the end, even in this small case, the process of trying out new iterations becomes quite onerous.
Of course there are other possibilities. An interactive environment such as IPython Notebook requires us to manually keep track of what needs to be re-run each time. A program that serializes data from external websites to disk and then reads in the cached copy also imposes a substantial cognitive load in keeping track of that cache.
Our solution is extensible. It is easy to add new asset types, new measures, or new classifications to cover more complex portfolios, and to create more detailed reports. We can also combine it with other data sources to create entirely new reports - for example, it is easy to imagine using earnings announcement calendars to create a node with a list of equities reporting earnings in the next week, which could be used to provide alerts, as sketched below. Or we could program a node to show how near our portfolio is to risk or compliance limits.
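That earnings-alert idea might look like the following sketch, assuming a hypothetical node df_earnings_calendar containing a DataFrame with Symbol and EarningsDate columns:
@loman.node(comp)
def df_earnings_this_week(df_pnl, df_earnings_calendar):
    # df_earnings_calendar is an assumed node, sourced from an earnings calendar feed
    cutoff = pd.Timestamp.now() + pd.Timedelta(days=7)
    soon = df_earnings_calendar[df_earnings_calendar.EarningsDate <= cutoff]
    return df_pnl.merge(soon, on='Symbol', how='inner')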
For real-time use, Loman's ability to track state allows us to update different inputs at different cadences. For example, we might update the equity and futures price nodes every tick, or every few seconds, but update semi-static data less frequently.
We can also control output cadence by specifying which nodes to calculate, and when. We only produce one set of results here, but suppose we had a valuation or risk model that we wanted to use, but which was slow to calculate. We could still calculate P&L at a relatively high frequency, while calculating the more time-consuming model every few minutes, say.
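A minimal sketch of both cadences together, assuming a hypothetical fetch_equity_prices function, a slow risk_model node added elsewhere, and that comp.insert can be used to overwrite a node's value directly:
import itertools
import time
for tick in itertools.count():
    # Fast loop: refresh prices and recompute P&L every second
    comp.insert('df_equity_yql_data_yahoo', fetch_equity_prices())  # hypothetical fetch
    comp.compute('df_pnl')          # cheap; only stale nodes are recomputed
    if tick % 300 == 0:
        comp.compute('risk_model')  # slow, hypothetical node: run every 5 minutes
    time.sleep(1)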
Finally, our solution is potentially usable in a variety of different contexts. We might have a batch system using this computation graph to produce daily or periodic intraday valuations centrally, while the investment team managing a particular portfolio might appreciate the ability to run the same computation graph on their portfolios and model portfolios whenever they desire.
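For instance, Loman can serialize a whole Computation with dill, so a batch process could hand a fully computed graph to an analyst. A sketch, with method names taken from the Loman documentation (confirm against your installed version):
comp.write_dill('portfolio_valuation.dill')  # in the central batch process
# ... later, in the investment team's session:
comp2 = loman.Computation.read_dill('portfolio_valuation.dill')
comp2.v.df_pnl  # same results, with no recomputation needed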
That brings us to the end of this example. We looked at how to use Loman to create a basic valuation system for a limited portfolio. In doing so, we explored several tangible benefits that we hope make life easier for quant practitioners. If anything is unclear in this example, feel free to contact the authors through the Loman Forum on Google Groups.