So you're planning your retirement, and you've already done the difficult part: assessing your capacity for risk, selecting an appropriate asset allocation, and selected a reasonably tax-efficient placement of funds. If you haven't done this yet, I suggest reading The Four Pillars of Investing and poking around on the Bogleheads forums for a while. The next step (following the principles of asset allocation across multiple accounts), will be to minimize the blended expense ratio across all accounts. You could do this yourself by hand, but fortunately, this is easily expressible as a linear programming problem, which computers can optimize quickly. Here I will walk through a basic Python implementation of minimizing the blended expense ratio given some simple choices made for an asset allocation.

We'll begin with some boilerplate, importing the libraries and instantiating our linear programming problem to be solved.


In [1]:
from pulp import *
from lxml import etree
import requests

prob = LpProblem("Assets", LpMinimize)

Here's our basic asset allocation: 80% stocks, 20% bonds. 10% of the stock allocation is in REITs, the rest is split 60%/40% between US and International, respectively. For bonds, we'll have an 80/20 US/International split. This is not necessarily the best, or even a good, asset allocation, but it's chosen for numerical convenience and to set up multiple interesting asset classes. The allocation can be expanded or reduced depending on your actual allocation. Rather than specifying the asset-classes for each fund, we'll just map Morningstar categories to the asset classes we're using. If we wanted a more complex breakdown, we'd need to parse a different Morningstar page containing the blend information and regional breakdowns, but this is left as an exercise for the reader. Note that they don't necessarily have to sum to 1. Given the breakdown we've chosen, we'll compute the actual dollar values we want for each asset class.


In [2]:
allocation = { 'stock_us'   : 0.8 * 0.9 * 0.6,
               'stock_intl' : 0.8 * 0.9 * 0.4,
               'reit'       : 0.8 * 0.1,
               'bond_us'    : 0.2 * 0.8,
               'bond_intl'  : 0.2 * 0.2 }

In [3]:
category = { 'Mid-Cap Blend'           : { 'stock_us'   : 1.0 },
             'Foreign Large Blend'     : { 'stock_intl' : 1.0 },
             'Real Estate'             : { 'reit'       : 1.0 },
             'Large Blend'             : { 'stock_us'   : 1.0 },
             'Intermediate-Term Bond'  : { 'bond_us'    : 1.0 },
             'Foreign Small/Mid Blend' : { 'stock_intl' : 1.0 },
             'Small Value'             : { 'stock_us'   : 1.0 },
             'World Bond'              : { 'bond_intl'  : 1.0 } }

For our purposes, the account balances are fixed, and we want to be able to optimize the share placement within these three accounts. These are my fund choices and initial share placement. For the 401(k), I have a fixed set of options, which are fortunately very good. Due to tax considerations, in my personal account I'll only hold equities. For the IRA, I've allowed REITs and stocks because, at a high-level, I can see that Bonds will exist entirely within the 401(k). We could just as easily add a bond fund to the IRA, it would just end up at 0.


In [4]:
assets = {'401k'     : { 'VEMPX' : 1000,
                         'VTPSX' : 1000,
                         'VGSNX' : 1000,
                         'VIIIX' : 1000,
                         'VBMPX' : 1000 },
          'ira'      : { 'VTIAX' : 1000,
                         'VTSAX' : 1000,
                         'VGSLX' : 1000,
                         'VTABX' : 1000 },
          'personal' : { 'VTSAX' : 1000,
                         'VFWAX' : 1000 } }

In [5]:
funds = {}
for accounts in assets.keys():
    for fund in assets[accounts]:
        funds[fund] = {}

Next, we'll want to look up current prices, expense ratios, and cateogories from Morningstar. I've chosen Morningstar mostly for convenience, as they label the fields with a vkey attribute which makes for some convenient XPath expressions.


In [6]:
for fund in funds.keys():
    params = { 't' : 'XNAS:' + fund, 'region' : 'usa', 'culture' : 'en-US', 'cur' : 'USD' }
    r = requests.get('http://quotes.morningstar.com/fund/c-header', params=params)
    tree = etree.fromstring(r.text, etree.HTMLParser())
    funds[fund]['price'] = float(tree.xpath(
             "//span[@vkey='NAV']/text()")[0].strip())
    funds[fund]['er'] = float(tree.xpath(
             "//span[@vkey='ExpenseRatio']/text()")[0].strip().rstrip('%'))
    composition = category[tree.xpath(
             "//span[@vkey='MorningstarCategory']/text()")[0].strip()]
    funds[fund]['composition'] = composition
    print fund, '@', funds[fund]['price'], funds[fund]['er']


VTSAX @ 49.95 0.05
VFWAX @ 32.67 0.15
VIIIX @ 182.31 0.02
VTPSX @ 117.05 0.1
VGSNX @ 16.81 0.08
VEMPX @ 161.88 0.06
VBMPX @ 10.83 0.05
VTABX @ 20.62 0.2
VTIAX @ 29.26 0.14
VGSLX @ 108.64 0.1

Next, I'll sum the total value of each account and the global value. Because money is not transferable across account boundaries without incurring painful tax implications, I'll treat these as fixed as well..


In [7]:
account_value = {}
for account in assets.keys():
    account_value[account] = sum([ shares * funds[fund]['price']
                                  for (fund, shares)
                                  in assets[account].items()])
    print '%s value: %0.2f' % (account, account_value[account])
v_total = sum(account_value.values())


401k value: 488880.00
ira value: 208470.00
personal value: 82620.00

In [8]:
ideal_value = {}
for asset_class in allocation.keys():
    ideal_value[asset_class] = allocation[asset_class] * v_total

We've set up the basic data of interest, so now it's time to get into the meat of the problem. We begin by creating our variables for the linear programming problem, representing the number of shares we want to hold of each fund within each account. We'll represent these variables of the form 'account:fund' (e.g. '401k:VBMPX')


In [9]:
shares = {}

for account in assets.keys():
    shares[account] = {}
    for fund in assets[account]:
        shares[account][fund] = LpVariable(account + ':' + fund, 0, None)

We need to state what we're going to minimize, so we begin by calculating the expenses associated with each account, the product of price, shares, and expense ratio.


In [10]:
prob += (sum([funds[x]['er'] * funds[x]['price'] * shares['401k'][x] for x in            
              shares['401k'].keys()]) +
         sum([funds[x]['er'] * funds[x]['price'] * shares['ira'][x] for x in
              shares['ira'].keys()]) +
         sum([funds[x]['er'] * funds[x]['price'] * shares['personal'][x] for x in
              shares['personal'].keys()]))

Next, we want to create some constraints on the solutions. First, our account balances should be fixed, so for each account, the sum of the product of price and shares is a fixed value.


In [11]:
prob += (sum([funds[x]['price'] * shares['401k'][x] for x in
              shares['401k'].keys()]) == account_value['401k'])
prob += (sum([funds[x]['price'] * shares['ira'][x] for x in
              shares['ira'].keys()]) == account_value['ira'])
prob += (sum([funds[x]['price'] * shares['personal'][x] for x in
              shares['personal'].keys()]) == account_value['personal'])

Now, here's an example of constraint between funds. My 401(k) offers VIIIX and VEMPX which are the S&P 500 and S&P Completion Index, but if I hold them I want to approximate the total US market. The appropriate ratio to achieve the total US market is 0.82:0.18 by value.


In [12]:
prob += (0.18 * funds['VIIIX']['price'] * shares['401k']['VIIIX'] -
         0.82 * funds['VEMPX']['price'] * shares['401k']['VEMPX'] == 0)

We'll create constraints for each of the asset classes we have to ensure that the total holdings across all accounts match the correct value. To help us out, we'll start by making a simple function that will select all of a given asset class in a certain account, then use that function to generate the constraints.


In [13]:
def asset_class_allocation_constraints(account, asset_class):
    return sum([funds[fund]['composition'][asset_class] *
                funds[fund]['price'] *
                shares[account][fund] for fund in shares[account].keys()
                if asset_class in funds[fund]['composition']])

In [14]:
for asset_class in allocation.keys():
    prob += (sum([ 
                  asset_class_allocation_constraints(account, asset_class)
                  for account in assets.keys() ]) ==
             ideal_value[asset_class])

One last optional bit, I prefer to hold more than the Vanguard Admiral minima, so I set a constraint at 110% of the threshold.


In [15]:
prob += (funds['VTSAX']['price'] * shares['ira']['VTSAX'] >= 11000)
prob += (funds['VGSLX']['price'] * shares['ira']['VGSLX'] >= 11000)
prob += (funds['VFWAX']['price'] * shares['personal']['VFWAX'] >= 11000)

All of the constraints are set up, so we're ready to optimize our expense ratio. We'll print out the share holdings for the optimal fund allocation, and the final overall expense ratio.


In [16]:
prob.solve()

print "Status: ", LpStatus[prob.status]

for v in prob.variables():
    print "%s: %f ($%0.2f)" % (v.name, v.varValue, funds[v.name.split(':')[1]]['price'] * v.varValue)

print "Blended expense ratio", value(prob.objective)/v_total


Status:  Infeasible
401k:VBMPX: 11523.084000 ($124795.00)
401k:VEMPX: 167.293950 ($27081.54)
401k:VGSNX: 0.000000 ($0.00)
401k:VIIIX: 676.714690 ($123371.86)
401k:VTPSX: 1825.126000 ($213631.00)
ira:VGSLX: 574.351990 ($62397.60)
ira:VTABX: 1513.035900 ($31198.80)
ira:VTIAX: 0.000000 ($0.00)
ira:VTSAX: 2299.771800 ($114873.60)
personal:VFWAX: 336.700340 ($11000.00)
personal:VTSAX: 1433.833800 ($71620.00)
Blended expense ratio 0.0707070524115