TradingEnv-v0

An OpenAI Gym environment for reinforcement-learning-based trading algorithms

This gym implements a very simple trading environment for reinforcement learning.

The gym provides daily observations based on real market data pulled from Quandl for, by default, the SPY ETF. An episode is defined as 252 contiguous days sampled from the overall dataset. Each day is one 'step' within the gym, and at each step the algo chooses one of three actions:

  • SHORT (0)
  • FLAT (1)
  • LONG (2)

If you trade, you are charged, by default, 10 bps of the size of your trade. Thus, going from short to long costs twice as much as going from short to flat or from flat to long. Not trading is not free either: there is a default cost of 1 bps per step. Nobody said it would be easy!
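The cost arithmetic can be sketched roughly as follows. This is an illustration, not the environment's code: the function name `step_cost` and the argument `trading_cost_bps` are hypothetical, positions are assumed to be -1, 0 and +1 for short, flat and long, and the time cost is assumed to be charged on every step whether or not a trade occurs.

```python
BPS = 1e-4  # one basis point as a fraction


def step_cost(prev_position, new_position, trading_cost_bps=10, time_cost_bps=1):
    """Hypothetical per-step cost: trade size times trading cost, plus a flat time cost."""
    trade_size = abs(new_position - prev_position)
    return trade_size * trading_cost_bps * BPS + time_cost_bps * BPS


print(step_cost(0, 1))   # flat -> long: one unit traded
print(step_cost(-1, 1))  # short -> long: two units traded, so twice the trading cost
print(step_cost(1, 1))   # no trade: only the time cost applies
```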

At the beginning of your episode, you are allocated 1 unit of cash. This is your starting Net Asset Value (NAV).

Beating the trading game

For our purposes, we'll say that beating a buy & hold strategy, on average, over one hundred episodes notches a win for the proud AI player. We'll illustrate exactly what that means below.

Let's look at some code using the environment

imports


In [1]:
import gym
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import interactive
interactive(True)

create the environment

This may take a moment as we are pulling historical data from Quandl.


In [2]:
env = gym.make('trading-v0')
#env.time_cost_bps = 0  # uncomment to set the per-step time cost to zero


[2017-01-04 18:38:13,327] Making new env: trading-v0
[2017-01-04 18:38:13,336] gym.envs.classic_control.trading_env logger started.
[2017-01-04 18:38:13,337] getting data for GOOG/NYSE_SPY from quandl...
[2017-01-04 18:38:16,049] got data for GOOG/NYSE_SPY from quandl...

the trading model

Each time step is a day. Each episode is 252 trading days - a year. Each day, we can choose to be short (0), flat (1) or long (2) the single instrument in our trading universe.

Let's run through a full episode and stay flat the whole time.


In [3]:
observation = env.reset()
done = False
navs = []
while not done:
    action = 1 # stay flat
    observation, reward, done, info = env.step(action)
    navs.append(info['nav'])
    if done:
        print('Annualized return:  {}'.format(navs[-1] - 1))
        pd.DataFrame(navs).plot()


Annualized return:  -0.0247888380589

Note that you are charged just for playing - to the tune of 1 basis point per day!
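The drag compounds over an episode. A back-of-the-envelope sketch, assuming the 1 bps cost is deducted multiplicatively from NAV at every step (the helper name `flat_nav` is hypothetical):

```python
def flat_nav(n_steps, time_cost_bps=1.0, start_nav=1.0):
    """NAV after n_steps of staying flat while paying only the per-step time cost."""
    nav = start_nav
    for _ in range(n_steps):
        nav *= 1.0 - time_cost_bps * 1e-4
    return nav


# Over a 252-step episode the loss is roughly 2.5% of starting NAV.
print(flat_nav(252) - 1.0)
```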

Rendering

For now, no rendering has been implemented for this gym, but each step provides the following data, which you can easily graph or otherwise visualize, as we did above with the NAV:

  • pnl - how much did we make or lose between yesterday and today?
  • costs - how much did we pay in costs today?
  • nav - our current NAV
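One way to gather these values for plotting is to record the `info` dictionary at every step. The sketch below follows the `reset`/`step` calling convention used earlier in this notebook; the helper name `record_episode` is hypothetical, and it assumes the `info` dictionary carries exactly the keys `pnl`, `costs` and `nav`.

```python
import pandas as pd


def record_episode(env, action=1):
    """Run one episode with a fixed action and return the per-step info as a DataFrame."""
    rows = []
    env.reset()
    done = False
    while not done:
        _, _, done, info = env.step(action)
        rows.append({key: info[key] for key in ('pnl', 'costs', 'nav')})
    return pd.DataFrame(rows, columns=['pnl', 'costs', 'nav'])
```

With a real environment, something like `record_episode(env).nav.plot()` would then chart the NAV over the episode.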

utility methods: running strategies once or repeatedly

Although the gym can be 'exercised' directly as seen above, we've also written utility methods which allow for the running of a strategy once or over many episodes, facilitating training or other sorts of analysis.

To utilize these methods, strategies should be exposed as a function or lambda with the following signature:

Action a = strategy( observation, environment )
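A strategy does not have to be a stateless lambda. As a minimal sketch, a closure can carry state between calls while still matching the signature above; the factory name `make_hold_then_flat` is hypothetical, and the action codes are the ones listed at the top of this notebook.

```python
LONG, FLAT = 2, 1  # action codes from the action list above


def make_hold_then_flat(n_long_steps):
    """Return a strategy that is long for its first n_long_steps calls, then flat."""
    state = {'calls': 0}

    def strategy(observation, environment):
        # Both arguments are ignored here; only the call count drives the decision.
        state['calls'] += 1
        return LONG if state['calls'] <= n_long_steps else FLAT

    return strategy


strat = make_hold_then_flat(2)
print([strat(None, None) for _ in range(4)])
```

Because the counter lives in the closure, a fresh strategy should be created for each episode; otherwise the count carries over.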

Below, we define some simple strategies and look briefly at their behavior to better understand the trading gym.


In [4]:
import trading_env as te

stayflat     = lambda o,e: 1   # stand pat
buyandhold   = lambda o,e: 2   # buy on day #1 and hold
randomtrader = lambda o,e: e.action_space.sample() # retail trader

# To run a single episode, call run_strat; it returns a DataFrame containing
#  all steps in the sim.
bhdf = env.run_strat(buyandhold)

print(bhdf.head())

# we can easily plot our nav in time:
bhdf.bod_nav.plot(title='buy & hold nav')


   action   bod_nav   mkt_nav  mkt_return  sim_return  position   costs  trade
0     2.0  1.000000  1.000000   -0.011808   -0.001100       1.0  0.0011    1.0
1     2.0  0.998900  0.988192   -0.004627   -0.004727       1.0  0.0001    0.0
2     2.0  0.994178  0.983619   -0.002354   -0.002454       1.0  0.0001    0.0
3     2.0  0.991738  0.981304   -0.002890   -0.002990       1.0  0.0001    0.0
4     2.0  0.988772  0.978467    0.003845    0.003745       1.0  0.0001    0.0
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1626563310>
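The five rows printed above are consistent with a simple accounting rule: each step's sim_return is the position held coming into the step times mkt_return, less costs, and bod_nav compounds the earlier sim_return values. This rule is inferred from the sample output, not taken from the environment's source, and it assumes the position before the first step is flat. A sketch that checks it against those rows:

```python
import pandas as pd

# The five sample rows from the buy & hold output above.
rows = pd.DataFrame({
    'bod_nav':    [1.000000, 0.998900, 0.994178, 0.991738, 0.988772],
    'mkt_return': [-0.011808, -0.004627, -0.002354, -0.002890, 0.003845],
    'sim_return': [-0.001100, -0.004727, -0.002454, -0.002990, 0.003745],
    'position':   [1.0, 1.0, 1.0, 1.0, 1.0],
    'costs':      [0.0011, 0.0001, 0.0001, 0.0001, 0.0001],
})

# Position held coming into each step; assumed flat (0) before the first step.
prev_position = rows.position.shift(1).fillna(0.0)
implied_sim_return = prev_position * rows.mkt_return - rows.costs

# Beginning-of-day NAV implied by compounding the previous steps' sim returns.
implied_bod_nav = (1.0 + rows.sim_return).cumprod().shift(1).fillna(1.0)

# Both discrepancies should be at the level of rounding in the printed table.
print((implied_sim_return - rows.sim_return).abs().max())
print((implied_bod_nav - rows.bod_nav).abs().max())
```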

running the same strategy multiple times will likely yield different results, because each episode samples a different stretch of the underlying data


In [5]:
env.run_strat(buyandhold).bod_nav.plot(title='same strat, different results')
env.run_strat(buyandhold).bod_nav.plot()
env.run_strat(buyandhold).bod_nav.plot()


Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1626713650>

comparing the buyandhold and random traders


In [6]:
# running a strategy multiple times should yield insights
#   into its expected behavior or give it the opportunity to learn
bhdf = env.run_strats(buyandhold,100)
rdf = env.run_strats(randomtrader,100)

comparo = pd.DataFrame({'buyhold':bhdf.mean(),
                        'random': rdf.mean()})
comparo


[2017-01-04 18:38:22,877] writing log to /tmp/tmpZDmiZ2
[2017-01-04 18:38:30,835] writing log to /tmp/tmpH7XAk2
Out[6]:
              buyhold     random
action       2.000000   0.997063
bod_nav      0.995850   0.886405
mkt_nav      1.009959   1.012461
mkt_return   0.000215   0.000173
sim_return   0.000110  -0.000989
position     1.000000  -0.002937
costs        0.000104   0.000987
trade        0.003968   0.000159

Object of the game

From the above examples, we can see that buying and holding will, over the long run, give you the market return with low costs.

Randomly trading will instead destroy value rather quickly as costs overwhelm.

So, what does it mean to win the trading game?

For our purposes, we'll say that beating a buy & hold strategy, on average, over one hundred episodes notches a win for the proud AI player.

To support this, the trading environment maintains the mkt_return which can be compared with the sim_return.

Note that the mkt_return is frictionless while the sim_return incurs both trading costs and the decay cost of 1 basis point per day, so overcoming the hurdle we've set here should be challenging.
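The comparison can be sketched as follows. The environment's actual win check is not shown in this notebook, so this is only one plausible reading of the rule: compound each episode's per-step returns, take sim minus market, and require a positive mean over the last one hundred episodes. The function names are hypothetical.

```python
import numpy as np


def episode_return(step_returns):
    """Compound per-step returns into a single episode return."""
    return float(np.prod(1.0 + np.asarray(step_returns, dtype=float)) - 1.0)


def beats_market(sim_episode_returns, mkt_episode_returns, window=100):
    """True if the mean net return over the last `window` episodes is positive."""
    sim = np.asarray(sim_episode_returns, dtype=float)
    mkt = np.asarray(mkt_episode_returns, dtype=float)
    net = sim - mkt
    if len(net) < window:
        return False  # not enough episodes yet to judge
    return bool(net[-window:].mean() > 0.0)
```

For example, `beats_market([0.02] * 100, [0.01] * 100)` returns `True`, while the same call with only 99 episodes returns `False`.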

Playing the game: purloined policy gradients

I've taken and adapted (see code for details) a policy gradient implementation based on TensorFlow to try to play the single-instrument trading game. Let's see how it does.
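For readers new to the technique, here is a minimal NumPy sketch of a REINFORCE-style policy gradient update. It is not the `policy_gradient` module used below: it swaps in a linear softmax policy for a neural network, uses the mean return as a simple baseline, and every name in it is hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of action scores."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def discounted_returns(rewards, gamma=0.99):
    """Return-to-go at each step: r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def reinforce_update(W, observations, actions, rewards, learning_rate=1e-2, gamma=0.99):
    """One gradient-ascent step for the policy pi(a|o) = softmax(W . o)."""
    returns = discounted_returns(rewards, gamma)
    returns = returns - returns.mean()  # subtract a simple baseline to reduce variance
    grad = np.zeros_like(W)
    for obs, action, ret in zip(observations, actions, returns):
        obs = np.asarray(obs, dtype=float)
        probs = softmax(np.dot(W, obs))
        one_hot = np.zeros(len(probs))
        one_hot[action] = 1.0
        # Gradient of log pi(action|obs) with respect to W is (one_hot - probs) outer obs.
        grad += ret * np.outer(one_hot - probs, obs)
    return W + learning_rate * grad / len(actions)


# Toy usage: 3 actions and 5 observation features, matching the dimensions used below.
W = np.zeros((3, 5))
obs = np.ones(5)
W_new = reinforce_update(W, [obs, obs], actions=[2, 0], rewards=[1.0, -1.0])
print(softmax(np.dot(W_new, obs)))  # action 2 becomes more likely, action 0 less likely
```

The update raises the probability of actions that were followed by above-average returns and lowers it for the rest, which is the core idea behind policy gradient methods.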


In [7]:
import tensorflow as tf
import policy_gradient


[2017-01-04 18:38:40,700] policy_gradient logger started.

In [8]:
# create the tf session
sess = tf.InteractiveSession()

# create policygradient
pg = policy_gradient.PolicyGradient(sess, obs_dim=5, num_actions=3, learning_rate=1e-2 )

# and now let's train it and evaluate its progress.  NB: this could take some time...
df,sf = pg.train_model( env,episodes=25001, log_freq=100)#, load_model=True)


[2017-01-04 18:38:42,890] year #     0, mean reward:  -0.0014, sim ret:  -0.1423, mkt ret:   0.2069, net:  -0.3492
[2017-01-04 18:39:24,015] year #   100, mean reward:  -0.0496, sim ret:   0.0993, mkt ret:  -0.1906, net:   0.2900
[2017-01-04 18:40:00,117] year #   200, mean reward:  -0.0583, sim ret:  -0.1106, mkt ret:   0.0868, net:  -0.1974
[2017-01-04 18:40:32,235] year #   300, mean reward:  -0.0039, sim ret:   0.1549, mkt ret:   0.1977, net:  -0.0427
[2017-01-04 18:41:03,065] year #   400, mean reward:   0.0216, sim ret:  -0.0935, mkt ret:  -0.0765, net:  -0.0171
[2017-01-04 18:41:33,705] year #   500, mean reward:   0.0280, sim ret:   0.1924, mkt ret:   0.2065, net:  -0.0140
[2017-01-04 18:42:05,916] year #   600, mean reward:   0.0228, sim ret:  -0.1764, mkt ret:  -0.1498, net:  -0.0267
[2017-01-04 18:42:36,725] year #   700, mean reward:   0.0204, sim ret:  -0.1844, mkt ret:  -0.1657, net:  -0.0187
[2017-01-04 18:43:07,595] year #   800, mean reward:   0.0227, sim ret:   0.1769, mkt ret:   0.2113, net:  -0.0345
[2017-01-04 18:43:38,511] year #   900, mean reward:   0.0278, sim ret:  -0.0158, mkt ret:   0.0217, net:  -0.0375
[2017-01-04 18:44:09,833] year #  1000, mean reward:   0.0336, sim ret:  -0.0186, mkt ret:   0.0106, net:  -0.0292
[2017-01-04 18:44:41,030] year #  1100, mean reward:   0.0176, sim ret:  -0.4554, mkt ret:  -0.4159, net:  -0.0395
[2017-01-04 18:45:13,080] year #  1200, mean reward:   0.0391, sim ret:   0.2577, mkt ret:   0.3129, net:  -0.0552
[2017-01-04 18:45:44,538] year #  1300, mean reward:   0.0330, sim ret:   0.0901, mkt ret:   0.0966, net:  -0.0065
[2017-01-04 18:46:15,954] year #  1400, mean reward:   0.0187, sim ret:  -0.0248, mkt ret:   0.1588, net:  -0.1836
[2017-01-04 18:46:47,241] year #  1500, mean reward:   0.0080, sim ret:  -0.0121, mkt ret:   0.0114, net:  -0.0235
[2017-01-04 18:47:19,123] year #  1600, mean reward:   0.0104, sim ret:   0.3213, mkt ret:   0.3635, net:  -0.0422
[2017-01-04 18:47:50,704] year #  1700, mean reward:   0.0219, sim ret:   0.0137, mkt ret:   0.0324, net:  -0.0187
[2017-01-04 18:48:22,082] year #  1800, mean reward:   0.0189, sim ret:   0.0686, mkt ret:   0.1008, net:  -0.0322
[2017-01-04 18:48:53,719] year #  1900, mean reward:   0.0177, sim ret:  -0.0068, mkt ret:   0.0408, net:  -0.0476
[2017-01-04 18:49:25,522] year #  2000, mean reward:   0.0261, sim ret:   0.0557, mkt ret:   0.1406, net:  -0.0849
[2017-01-04 18:49:57,353] year #  2100, mean reward:   0.0024, sim ret:   0.1470, mkt ret:   0.1751, net:  -0.0280
[2017-01-04 18:50:29,307] year #  2200, mean reward:   0.0212, sim ret:   0.1023, mkt ret:   0.1336, net:  -0.0313
[2017-01-04 18:51:01,190] year #  2300, mean reward:   0.0209, sim ret:  -0.0628, mkt ret:  -0.0341, net:  -0.0288
[2017-01-04 18:51:33,224] year #  2400, mean reward:   0.0179, sim ret:  -0.1410, mkt ret:  -0.1095, net:  -0.0314
[2017-01-04 18:52:05,555] year #  2500, mean reward:   0.0167, sim ret:   0.1409, mkt ret:   0.1547, net:  -0.0138
[2017-01-04 18:52:37,700] year #  2600, mean reward:   0.0233, sim ret:  -0.1693, mkt ret:  -0.1511, net:  -0.0181
[2017-01-04 18:53:10,125] year #  2700, mean reward:   0.0265, sim ret:  -0.0217, mkt ret:   0.0031, net:  -0.0248
[2017-01-04 18:53:42,708] year #  2800, mean reward:   0.0316, sim ret:   0.0659, mkt ret:   0.0963, net:  -0.0304
[2017-01-04 18:54:15,123] year #  2900, mean reward:   0.0402, sim ret:   0.2952, mkt ret:   0.3198, net:  -0.0247
[2017-01-04 18:54:47,606] year #  3000, mean reward:   0.0326, sim ret:  -0.0337, mkt ret:  -0.0028, net:  -0.0308
[2017-01-04 18:55:20,475] year #  3100, mean reward:   0.0274, sim ret:  -0.1920, mkt ret:  -0.1626, net:  -0.0294
[2017-01-04 18:55:53,102] year #  3200, mean reward:   0.0273, sim ret:   0.3358, mkt ret:   0.4451, net:  -0.1093
[2017-01-04 18:56:25,748] year #  3300, mean reward:   0.0390, sim ret:  -0.1185, mkt ret:  -0.0884, net:  -0.0301
[2017-01-04 18:56:59,508] year #  3400, mean reward:   0.0217, sim ret:   0.1176, mkt ret:   0.1614, net:  -0.0438
[2017-01-04 18:57:33,551] year #  3500, mean reward:   0.0208, sim ret:   0.1133, mkt ret:   0.1361, net:  -0.0228
[2017-01-04 18:58:07,625] year #  3600, mean reward:   0.0237, sim ret:  -0.1696, mkt ret:  -0.1554, net:  -0.0142
[2017-01-04 18:58:41,622] year #  3700, mean reward:   0.0255, sim ret:   0.0853, mkt ret:   0.1110, net:  -0.0257
[2017-01-04 18:59:15,811] year #  3800, mean reward:   0.0532, sim ret:   0.0460, mkt ret:   0.0753, net:  -0.0293
[2017-01-04 18:59:50,132] year #  3900, mean reward:   0.0317, sim ret:   0.0195, mkt ret:   0.0581, net:  -0.0385
[2017-01-04 19:00:25,373] year #  4000, mean reward:   0.0531, sim ret:   0.1510, mkt ret:   0.1830, net:  -0.0321
[2017-01-04 19:00:59,832] year #  4100, mean reward:   0.0402, sim ret:   0.0071, mkt ret:   0.0312, net:  -0.0240
[2017-01-04 19:01:34,659] year #  4200, mean reward:   0.0416, sim ret:   0.0066, mkt ret:   0.0528, net:  -0.0461
[2017-01-04 19:02:09,403] year #  4300, mean reward:   0.0427, sim ret:  -0.0008, mkt ret:   0.0044, net:  -0.0052
[2017-01-04 19:02:44,251] year #  4400, mean reward:   0.0169, sim ret:  -0.3836, mkt ret:  -0.3684, net:  -0.0152
[2017-01-04 19:03:19,237] year #  4500, mean reward:   0.0318, sim ret:  -0.1078, mkt ret:  -0.0879, net:  -0.0198
[2017-01-04 19:03:54,391] year #  4600, mean reward:   0.0327, sim ret:   0.1177, mkt ret:   0.1298, net:  -0.0121
[2017-01-04 19:04:29,610] year #  4700, mean reward:   0.0155, sim ret:  -0.2140, mkt ret:  -0.2001, net:  -0.0138
[2017-01-04 19:05:04,766] year #  4800, mean reward:   0.0202, sim ret:   0.0299, mkt ret:   0.0827, net:  -0.0528
[2017-01-04 19:05:40,327] year #  4900, mean reward:   0.0228, sim ret:   0.0891, mkt ret:   0.1131, net:  -0.0240
[2017-01-04 19:06:16,125] year #  5000, mean reward:   0.0313, sim ret:   0.4537, mkt ret:   0.4500, net:   0.0038
[2017-01-04 19:06:51,917] year #  5100, mean reward:   0.0065, sim ret:  -0.0334, mkt ret:  -0.0135, net:  -0.0199
[2017-01-04 19:07:27,693] year #  5200, mean reward:   0.0171, sim ret:  -0.2388, mkt ret:  -0.2185, net:  -0.0203
[2017-01-04 19:08:03,762] year #  5300, mean reward:   0.0348, sim ret:  -0.2228, mkt ret:  -0.2040, net:  -0.0187
[2017-01-04 19:08:39,880] year #  5400, mean reward:   0.0189, sim ret:  -0.4815, mkt ret:  -0.4753, net:  -0.0062
[2017-01-04 19:09:16,105] year #  5500, mean reward:   0.0265, sim ret:   0.0207, mkt ret:   0.0684, net:  -0.0477
[2017-01-04 19:09:52,425] year #  5600, mean reward:   0.0189, sim ret:   0.0987, mkt ret:   0.1261, net:  -0.0274
[2017-01-04 19:10:29,301] year #  5700, mean reward:   0.0168, sim ret:   0.0716, mkt ret:   0.0906, net:  -0.0189
[2017-01-04 19:11:05,765] year #  5800, mean reward:   0.0109, sim ret:   0.2570, mkt ret:   0.0397, net:   0.2173
[2017-01-04 19:11:42,388] year #  5900, mean reward:   0.0207, sim ret:   0.0216, mkt ret:   0.1320, net:  -0.1104
[2017-01-04 19:12:18,877] year #  6000, mean reward:   0.0217, sim ret:  -0.1750, mkt ret:  -0.3850, net:   0.2101
[2017-01-04 19:12:55,854] year #  6100, mean reward:   0.0243, sim ret:  -0.0769, mkt ret:  -0.3378, net:   0.2608
[2017-01-04 19:13:32,872] year #  6200, mean reward:   0.0158, sim ret:  -0.0140, mkt ret:  -0.0409, net:   0.0269
[2017-01-04 19:14:09,913] year #  6300, mean reward:   0.0213, sim ret:  -0.0615, mkt ret:   0.1333, net:  -0.1948
[2017-01-04 19:14:46,935] year #  6400, mean reward:   0.0215, sim ret:   0.1142, mkt ret:   0.1600, net:  -0.0457
[2017-01-04 19:15:25,110] year #  6500, mean reward:   0.0238, sim ret:   0.1413, mkt ret:   0.1120, net:   0.0293
[2017-01-04 19:16:02,441] year #  6600, mean reward:   0.0440, sim ret:   0.1504, mkt ret:   0.1820, net:  -0.0316
[2017-01-04 19:16:40,124] year #  6700, mean reward:   0.0464, sim ret:   0.0445, mkt ret:   0.0624, net:  -0.0180
[2017-01-04 19:17:17,911] year #  6800, mean reward:   0.0386, sim ret:  -0.4160, mkt ret:  -0.3908, net:  -0.0251
[2017-01-04 19:17:55,812] year #  6900, mean reward:   0.0408, sim ret:  -0.1465, mkt ret:  -0.0121, net:  -0.1344
[2017-01-04 19:18:33,493] year #  7000, mean reward:   0.0333, sim ret:   0.0706, mkt ret:   0.0853, net:  -0.0147
[2017-01-04 19:19:11,662] year #  7100, mean reward:   0.0278, sim ret:   0.0645, mkt ret:   0.0931, net:  -0.0286
[2017-01-04 19:19:49,676] year #  7200, mean reward:   0.0202, sim ret:   0.0974, mkt ret:   0.1169, net:  -0.0195
[2017-01-04 19:20:27,757] year #  7300, mean reward:   0.0288, sim ret:  -0.1560, mkt ret:  -0.1057, net:  -0.0504
[2017-01-04 19:21:06,048] year #  7400, mean reward:   0.0355, sim ret:   0.0255, mkt ret:   0.0529, net:  -0.0274
[2017-01-04 19:21:44,533] year #  7500, mean reward:   0.0309, sim ret:   0.1371, mkt ret:  -0.1105, net:   0.2476
[2017-01-04 19:22:23,011] year #  7600, mean reward:   0.0111, sim ret:  -0.0037, mkt ret:   0.0312, net:  -0.0349
[2017-01-04 19:23:01,259] year #  7700, mean reward:   0.0098, sim ret:   0.2571, mkt ret:   0.3316, net:  -0.0746
[2017-01-04 19:23:40,027] year #  7800, mean reward:   0.0250, sim ret:   0.0101, mkt ret:   0.0379, net:  -0.0278
[2017-01-04 19:24:18,549] year #  7900, mean reward:   0.0296, sim ret:   0.0649, mkt ret:   0.0311, net:   0.0338
[2017-01-04 19:24:57,520] year #  8000, mean reward:   0.0423, sim ret:   0.0434, mkt ret:   0.1647, net:  -0.1213
[2017-01-04 19:25:36,745] year #  8100, mean reward:   0.0344, sim ret:  -0.0339, mkt ret:   0.0326, net:  -0.0665
[2017-01-04 19:26:15,869] year #  8200, mean reward:   0.0484, sim ret:   0.0393, mkt ret:   0.2542, net:  -0.2149
[2017-01-04 19:26:54,824] year #  8300, mean reward:   0.0578, sim ret:   0.1697, mkt ret:   0.1877, net:  -0.0179
[2017-01-04 19:27:34,005] year #  8400, mean reward:   0.0409, sim ret:   0.1537, mkt ret:   0.1954, net:  -0.0417
[2017-01-04 19:28:13,314] year #  8500, mean reward:   0.0405, sim ret:  -0.0635, mkt ret:  -0.2173, net:   0.1538
[2017-01-04 19:28:52,578] year #  8600, mean reward:   0.0432, sim ret:   0.0535, mkt ret:   0.0959, net:  -0.0424
[2017-01-04 19:29:31,985] year #  8700, mean reward:   0.0331, sim ret:  -0.0117, mkt ret:   0.1279, net:  -0.1396
[2017-01-04 19:30:11,636] year #  8800, mean reward:   0.0345, sim ret:  -0.2345, mkt ret:  -0.2042, net:  -0.0303
[2017-01-04 19:30:51,730] year #  8900, mean reward:   0.0082, sim ret:   0.0326, mkt ret:   0.0491, net:  -0.0165
[2017-01-04 19:31:31,584] year #  9000, mean reward:   0.0338, sim ret:  -0.0678, mkt ret:  -0.0395, net:  -0.0284
[2017-01-04 19:32:11,608] year #  9100, mean reward:   0.0357, sim ret:   0.1347, mkt ret:   0.1740, net:  -0.0393
[2017-01-04 19:32:51,849] year #  9200, mean reward:   0.0471, sim ret:  -0.3771, mkt ret:  -0.3699, net:  -0.0073
[2017-01-04 19:33:32,166] year #  9300, mean reward:   0.0267, sim ret:  -0.0426, mkt ret:   0.0230, net:  -0.0656
[2017-01-04 19:34:12,236] year #  9400, mean reward:   0.0243, sim ret:   0.0734, mkt ret:   0.1154, net:  -0.0420
[2017-01-04 19:34:52,433] year #  9500, mean reward:   0.0372, sim ret:  -0.0451, mkt ret:  -0.0161, net:  -0.0290
[2017-01-04 19:35:33,058] year #  9600, mean reward:   0.0557, sim ret:  -0.0195, mkt ret:   0.0215, net:  -0.0410
[2017-01-04 19:36:13,957] year #  9700, mean reward:   0.0511, sim ret:   0.1976, mkt ret:   0.2210, net:  -0.0234
[2017-01-04 19:36:54,464] year #  9800, mean reward:   0.0242, sim ret:  -0.2499, mkt ret:  -0.2314, net:  -0.0185
[2017-01-04 19:37:35,588] year #  9900, mean reward:   0.0323, sim ret:  -0.2221, mkt ret:  -0.1900, net:  -0.0321
[2017-01-04 19:38:16,867] year # 10000, mean reward:   0.0286, sim ret:   0.1059, mkt ret:   0.1234, net:  -0.0175
[2017-01-04 19:38:57,535] year # 10100, mean reward:   0.0368, sim ret:   0.1403, mkt ret:   0.1541, net:  -0.0138
[2017-01-04 19:39:38,856] year # 10200, mean reward:   0.0222, sim ret:   0.0345, mkt ret:   0.0763, net:  -0.0418
[2017-01-04 19:40:20,401] year # 10300, mean reward:   0.0489, sim ret:   0.3831, mkt ret:   0.3198, net:   0.0633
[2017-01-04 19:41:01,580] year # 10400, mean reward:   0.0274, sim ret:   0.1003, mkt ret:   0.1081, net:  -0.0078
[2017-01-04 19:41:43,146] year # 10500, mean reward:   0.0532, sim ret:   0.0021, mkt ret:   0.0252, net:  -0.0231
[2017-01-04 19:42:24,940] year # 10600, mean reward:   0.0394, sim ret:  -0.1884, mkt ret:  -0.1813, net:  -0.0070
[2017-01-04 19:43:06,708] year # 10700, mean reward:   0.0427, sim ret:  -0.0127, mkt ret:   0.0174, net:  -0.0301
[2017-01-04 19:43:48,379] year # 10800, mean reward:   0.0381, sim ret:   0.1549, mkt ret:   0.1977, net:  -0.0427
[2017-01-04 19:44:30,342] year # 10900, mean reward:   0.0318, sim ret:   0.1292, mkt ret:   0.1530, net:  -0.0238
[2017-01-04 19:45:12,388] year # 11000, mean reward:   0.0322, sim ret:   0.0064, mkt ret:   0.0496, net:  -0.0432
[2017-01-04 19:45:54,849] year # 11100, mean reward:   0.0160, sim ret:   0.0845, mkt ret:   0.1241, net:  -0.0396
[2017-01-04 19:46:36,942] year # 11200, mean reward:   0.0155, sim ret:   0.0038, mkt ret:   0.0384, net:  -0.0347
[2017-01-04 19:47:19,214] year # 11300, mean reward:   0.0166, sim ret:  -0.1119, mkt ret:  -0.1982, net:   0.0862
[2017-01-04 19:48:01,707] year # 11400, mean reward:   0.0056, sim ret:   0.0283, mkt ret:   0.0126, net:   0.0157
[2017-01-04 19:48:44,519] year # 11500, mean reward:   0.0312, sim ret:   0.0603, mkt ret:   0.0999, net:  -0.0396
[2017-01-04 19:49:27,290] year # 11600, mean reward:   0.0078, sim ret:  -0.1797, mkt ret:  -0.1575, net:  -0.0222
[2017-01-04 19:50:10,311] year # 11700, mean reward:   0.0129, sim ret:   0.0173, mkt ret:   0.0473, net:  -0.0299
[2017-01-04 19:50:53,321] year # 11800, mean reward:   0.0412, sim ret:   0.0607, mkt ret:   0.0922, net:  -0.0315
[2017-01-04 19:51:36,528] year # 11900, mean reward:   0.0494, sim ret:  -0.0033, mkt ret:   0.0059, net:  -0.0092
[2017-01-04 19:52:19,816] year # 12000, mean reward:   0.0389, sim ret:  -0.0106, mkt ret:   0.0138, net:  -0.0244
[2017-01-04 19:53:03,617] year # 12100, mean reward:   0.0356, sim ret:  -0.1403, mkt ret:  -0.1086, net:  -0.0317
[2017-01-04 19:53:48,292] year # 12200, mean reward:   0.0405, sim ret:   0.0306, mkt ret:   0.0501, net:  -0.0195
[2017-01-04 19:54:37,376] year # 12300, mean reward:   0.0562, sim ret:   0.0805, mkt ret:   0.1107, net:  -0.0302
[2017-01-04 19:55:37,580] year # 12400, mean reward:   0.0442, sim ret:   0.1737, mkt ret:   0.1925, net:  -0.0188
[2017-01-04 19:56:24,450] year # 12500, mean reward:   0.0531, sim ret:   0.0182, mkt ret:   0.0439, net:  -0.0257
[2017-01-04 19:57:07,836] year # 12600, mean reward:   0.0390, sim ret:   0.1751, mkt ret:   0.2084, net:  -0.0333
[2017-01-04 19:57:51,425] year # 12700, mean reward:   0.0275, sim ret:  -0.2386, mkt ret:  -0.2187, net:  -0.0198
[2017-01-04 19:58:35,480] year # 12800, mean reward:   0.0252, sim ret:   0.1572, mkt ret:   0.1887, net:  -0.0316
[2017-01-04 19:59:19,389] year # 12900, mean reward:   0.0168, sim ret:   0.0285, mkt ret:   0.0617, net:  -0.0333
[2017-01-04 20:00:03,664] year # 13000, mean reward:   0.0173, sim ret:  -0.1507, mkt ret:  -0.1239, net:  -0.0268
[2017-01-04 20:00:48,080] year # 13100, mean reward:   0.0178, sim ret:   0.0373, mkt ret:   0.1009, net:  -0.0636
[2017-01-04 20:01:32,258] year # 13200, mean reward:   0.0181, sim ret:   0.0789, mkt ret:   0.1074, net:  -0.0285
[2017-01-04 20:02:16,897] year # 13300, mean reward:   0.0390, sim ret:   0.1763, mkt ret:   0.1920, net:  -0.0157
[2017-01-04 20:03:01,466] year # 13400, mean reward:   0.0233, sim ret:   0.0188, mkt ret:   0.0623, net:  -0.0435
[2017-01-04 20:03:46,347] year # 13500, mean reward:   0.0200, sim ret:  -0.3897, mkt ret:  -0.3730, net:  -0.0167
[2017-01-04 20:04:31,347] year # 13600, mean reward:   0.0185, sim ret:   0.0749, mkt ret:   0.1145, net:  -0.0396
[2017-01-04 20:05:16,027] year # 13700, mean reward:   0.0285, sim ret:   0.0532, mkt ret:   0.0733, net:  -0.0201
[2017-01-04 20:06:00,655] year # 13800, mean reward:   0.0203, sim ret:   0.0573, mkt ret:   0.0408, net:   0.0165
[2017-01-04 20:06:45,757] year # 13900, mean reward:   0.0328, sim ret:   0.0306, mkt ret:   0.0461, net:  -0.0155
[2017-01-04 20:07:30,914] year # 14000, mean reward:  -0.0035, sim ret:  -0.0876, mkt ret:  -0.0642, net:  -0.0234
[2017-01-04 20:08:16,307] year # 14100, mean reward:   0.0054, sim ret:   0.1786, mkt ret:   0.2028, net:  -0.0242
[2017-01-04 20:09:01,755] year # 14200, mean reward:   0.0144, sim ret:   0.0172, mkt ret:   0.0425, net:  -0.0254
[2017-01-04 20:09:47,199] year # 14300, mean reward:   0.0238, sim ret:  -0.0164, mkt ret:   0.0055, net:  -0.0220
[2017-01-04 20:10:32,821] year # 14400, mean reward:   0.0261, sim ret:   0.1941, mkt ret:   0.2565, net:  -0.0624
[2017-01-04 20:11:18,237] year # 14500, mean reward:   0.0248, sim ret:   0.1435, mkt ret:   0.1636, net:  -0.0201
[2017-01-04 20:12:04,028] year # 14600, mean reward:   0.0194, sim ret:   0.1786, mkt ret:   0.2028, net:  -0.0242
[2017-01-04 20:12:49,979] year # 14700, mean reward:   0.0189, sim ret:  -0.1033, mkt ret:  -0.0771, net:  -0.0262
[2017-01-04 20:13:35,911] year # 14800, mean reward:   0.0193, sim ret:   0.0914, mkt ret:   0.1190, net:  -0.0277
[2017-01-04 20:14:22,181] year # 14900, mean reward:   0.0312, sim ret:   0.0691, mkt ret:   0.0968, net:  -0.0277
[2017-01-04 20:15:08,120] year # 15000, mean reward:   0.0430, sim ret:  -0.0045, mkt ret:   0.0166, net:  -0.0211
[2017-01-04 20:15:54,430] year # 15100, mean reward:   0.0290, sim ret:  -0.1809, mkt ret:  -0.1719, net:  -0.0091
[2017-01-04 20:16:40,641] year # 15200, mean reward:   0.0401, sim ret:   0.0207, mkt ret:   0.0684, net:  -0.0477
[2017-01-04 20:17:27,469] year # 15300, mean reward:   0.0197, sim ret:   0.1698, mkt ret:   0.2025, net:  -0.0327
[2017-01-04 20:18:13,949] year # 15400, mean reward:   0.0152, sim ret:  -0.1433, mkt ret:  -0.1230, net:  -0.0203
[2017-01-04 20:19:00,609] year # 15500, mean reward:   0.0426, sim ret:   0.3919, mkt ret:   0.3953, net:  -0.0034
[2017-01-04 20:19:47,536] year # 15600, mean reward:   0.0442, sim ret:   0.0353, mkt ret:   0.0629, net:  -0.0275
[2017-01-04 20:20:34,406] year # 15700, mean reward:   0.0272, sim ret:   0.1823, mkt ret:   0.2207, net:  -0.0384
[2017-01-04 20:21:21,556] year # 15800, mean reward:   0.0208, sim ret:   0.1019, mkt ret:   0.1289, net:  -0.0271
[2017-01-04 20:22:08,674] year # 15900, mean reward:   0.0094, sim ret:   0.0721, mkt ret:   0.0982, net:  -0.0261
[2017-01-04 20:22:55,774] year # 16000, mean reward:   0.0361, sim ret:  -0.0621, mkt ret:  -0.0250, net:  -0.0370
[2017-01-04 20:23:43,409] year # 16100, mean reward:   0.0352, sim ret:   0.0180, mkt ret:   0.0458, net:  -0.0277
[2017-01-04 20:24:30,661] year # 16200, mean reward:   0.0443, sim ret:  -0.0128, mkt ret:   0.0105, net:  -0.0232
[2017-01-04 20:25:18,164] year # 16300, mean reward:   0.0455, sim ret:  -0.1260, mkt ret:  -0.1187, net:  -0.0073
[2017-01-04 20:26:05,970] year # 16400, mean reward:   0.0444, sim ret:  -0.0909, mkt ret:  -0.0689, net:  -0.0221
[2017-01-04 20:26:53,663] year # 16500, mean reward:   0.0289, sim ret:   0.0185, mkt ret:   0.0484, net:  -0.0299
[2017-01-04 20:27:41,399] year # 16600, mean reward:   0.0489, sim ret:   0.0123, mkt ret:   0.0348, net:  -0.0225
[2017-01-04 20:28:30,193] year # 16700, mean reward:   0.0498, sim ret:   0.1901, mkt ret:   0.2226, net:  -0.0324
[2017-01-04 20:29:19,884] year # 16800, mean reward:   0.0201, sim ret:  -0.1387, mkt ret:  -0.1339, net:  -0.0048
[2017-01-04 20:30:10,007] year # 16900, mean reward:   0.0188, sim ret:   0.1452, mkt ret:   0.1696, net:  -0.0244
[2017-01-04 20:31:00,346] year # 17000, mean reward:   0.0311, sim ret:   0.0853, mkt ret:   0.1844, net:  -0.0990
[2017-01-04 20:31:50,019] year # 17100, mean reward:   0.0249, sim ret:   0.0484, mkt ret:   0.0639, net:  -0.0155
[2017-01-04 20:32:39,388] year # 17200, mean reward:   0.0204, sim ret:  -0.0096, mkt ret:   0.0901, net:  -0.0997
[2017-01-04 20:33:29,852] year # 17300, mean reward:   0.0105, sim ret:   0.0601, mkt ret:   0.0837, net:  -0.0236
[2017-01-04 20:34:20,302] year # 17400, mean reward:   0.0161, sim ret:   0.1111, mkt ret:   0.1627, net:  -0.0516
[2017-01-04 20:35:10,903] year # 17500, mean reward:   0.0279, sim ret:   0.1548, mkt ret:   0.1965, net:  -0.0417
[2017-01-04 20:36:01,451] year # 17600, mean reward:   0.0268, sim ret:   0.0548, mkt ret:   0.0694, net:  -0.0146
[2017-01-04 20:36:51,913] year # 17700, mean reward:   0.0376, sim ret:   0.2018, mkt ret:   0.2487, net:  -0.0469
[2017-01-04 20:37:42,734] year # 17800, mean reward:   0.0169, sim ret:   0.0555, mkt ret:   0.1445, net:  -0.0891
[2017-01-04 20:38:33,654] year # 17900, mean reward:   0.0187, sim ret:  -0.3343, mkt ret:  -0.4248, net:   0.0904
[2017-01-04 20:39:25,260] year # 18000, mean reward:  -0.0040, sim ret:  -0.0649, mkt ret:   0.1400, net:  -0.2049
[2017-01-04 20:40:16,835] year # 18100, mean reward:   0.0304, sim ret:   0.1287, mkt ret:   0.1719, net:  -0.0432
[2017-01-04 20:41:08,345] year # 18200, mean reward:   0.0206, sim ret:   0.0073, mkt ret:   0.0971, net:  -0.0899
[2017-01-04 20:42:00,130] year # 18300, mean reward:   0.0177, sim ret:   0.0756, mkt ret:   0.1238, net:  -0.0482
[2017-01-04 20:42:51,938] year # 18400, mean reward:  -0.0039, sim ret:   0.1786, mkt ret:  -0.2047, net:   0.3833
[2017-01-04 20:43:43,549] year # 18500, mean reward:   0.0048, sim ret:  -0.2996, mkt ret:  -0.2804, net:  -0.0192
[2017-01-04 20:44:36,098] year # 18600, mean reward:   0.0117, sim ret:   0.0700, mkt ret:   0.1898, net:  -0.1198
[2017-01-04 20:45:28,320] year # 18700, mean reward:   0.0122, sim ret:   0.1472, mkt ret:   0.2082, net:  -0.0609
[2017-01-04 20:46:20,570] year # 18800, mean reward:   0.0014, sim ret:   0.1384, mkt ret:   0.2131, net:  -0.0747
[2017-01-04 20:47:13,167] year # 18900, mean reward:   0.0030, sim ret:   0.0945, mkt ret:   0.0500, net:   0.0445
[2017-01-04 20:48:05,532] year # 19000, mean reward:   0.0021, sim ret:  -0.2712, mkt ret:  -0.2294, net:  -0.0418
[2017-01-04 20:48:57,700] year # 19100, mean reward:   0.0040, sim ret:   0.0553, mkt ret:   0.0998, net:  -0.0445
[2017-01-04 20:49:49,608] year # 19200, mean reward:   0.0269, sim ret:   0.1664, mkt ret:   0.1987, net:  -0.0323
[2017-01-04 20:50:41,805] year # 19300, mean reward:   0.0211, sim ret:  -0.2273, mkt ret:  -0.2146, net:  -0.0127
[2017-01-04 20:51:34,825] year # 19400, mean reward:   0.0305, sim ret:  -0.0342, mkt ret:   0.0321, net:  -0.0663
[2017-01-04 20:52:27,550] year # 19500, mean reward:   0.0380, sim ret:   0.1770, mkt ret:   0.2046, net:  -0.0277
[2017-01-04 20:53:21,002] year # 19600, mean reward:   0.0284, sim ret:  -0.2069, mkt ret:  -0.1775, net:  -0.0294
[2017-01-04 20:54:12,639] year # 19700, mean reward:   0.0209, sim ret:   0.1873, mkt ret:   0.1924, net:  -0.0051
[2017-01-04 20:55:12,014] year # 19800, mean reward:   0.0081, sim ret:   0.1034, mkt ret:   0.1340, net:  -0.0306
[2017-01-04 20:56:15,991] year # 19900, mean reward:   0.0087, sim ret:  -0.0520, mkt ret:  -0.0168, net:  -0.0352
[2017-01-04 20:57:18,487] year # 20000, mean reward:   0.0372, sim ret:   0.3376, mkt ret:   0.3931, net:  -0.0555
[2017-01-04 20:58:20,133] year # 20100, mean reward:   0.0234, sim ret:   0.1031, mkt ret:   0.1265, net:  -0.0233
[2017-01-04 20:59:21,972] year # 20200, mean reward:   0.0212, sim ret:   0.1047, mkt ret:   0.1338, net:  -0.0291
[2017-01-04 21:00:24,310] year # 20300, mean reward:   0.0133, sim ret:   0.0803, mkt ret:   0.1132, net:  -0.0329
[2017-01-04 21:01:29,130] year # 20400, mean reward:   0.0148, sim ret:  -0.0562, mkt ret:  -0.0241, net:  -0.0321
[2017-01-04 21:02:29,951] year # 20500, mean reward:   0.0028, sim ret:   0.0464, mkt ret:   0.0044, net:   0.0419
[2017-01-04 21:03:36,035] year # 20600, mean reward:   0.0096, sim ret:   0.0849, mkt ret:   0.1100, net:  -0.0250
[2017-01-04 21:04:40,466] year # 20700, mean reward:   0.0231, sim ret:   0.3389, mkt ret:   0.3542, net:  -0.0154
[2017-01-04 21:05:47,024] year # 20800, mean reward:   0.0262, sim ret:   0.1329, mkt ret:   0.1719, net:  -0.0390
[2017-01-04 21:06:52,908] year # 20900, mean reward:   0.0246, sim ret:   0.1606, mkt ret:   0.1793, net:  -0.0187
[2017-01-04 21:07:56,873] year # 21000, mean reward:   0.0254, sim ret:   0.5144, mkt ret:   0.4349, net:   0.0796
[2017-01-04 21:09:02,984] year # 21100, mean reward:   0.0379, sim ret:   0.1224, mkt ret:   0.1451, net:  -0.0228
[2017-01-04 21:10:02,503] year # 21200, mean reward:   0.0377, sim ret:   0.0984, mkt ret:   0.2038, net:  -0.1055
[2017-01-04 21:11:04,317] year # 21300, mean reward:   0.0309, sim ret:  -0.0656, mkt ret:  -0.3503, net:   0.2847
[2017-01-04 21:12:07,841] year # 21400, mean reward:   0.0225, sim ret:  -0.0748, mkt ret:   0.0601, net:  -0.1348
[2017-01-04 21:13:12,992] year # 21500, mean reward:   0.0288, sim ret:   0.1502, mkt ret:   0.1843, net:  -0.0340
[2017-01-04 21:14:20,167] year # 21600, mean reward:   0.0326, sim ret:   0.1087, mkt ret:   0.1133, net:  -0.0046
[2017-01-04 21:15:27,390] year # 21700, mean reward:   0.0235, sim ret:  -0.0921, mkt ret:  -0.2245, net:   0.1324
[2017-01-04 21:16:33,016] year # 21800, mean reward:   0.0307, sim ret:   0.0293, mkt ret:   0.1157, net:  -0.0864
[2017-01-04 21:17:35,766] year # 21900, mean reward:   0.0361, sim ret:  -0.1333, mkt ret:  -0.2545, net:   0.1212
[2017-01-04 21:18:42,299] year # 22000, mean reward:   0.0368, sim ret:  -0.1477, mkt ret:  -0.1292, net:  -0.0185
[2017-01-04 21:19:45,125] year # 22100, mean reward:   0.0291, sim ret:   0.1572, mkt ret:   0.2405, net:  -0.0833
[2017-01-04 21:20:48,943] year # 22200, mean reward:   0.0203, sim ret:   0.0008, mkt ret:   0.1612, net:  -0.1604
[2017-01-04 21:21:51,554] year # 22300, mean reward:   0.0336, sim ret:   0.0387, mkt ret:   0.0659, net:  -0.0272
[2017-01-04 21:22:57,809] year # 22400, mean reward:   0.0554, sim ret:  -0.0483, mkt ret:   0.0994, net:  -0.1477
[2017-01-04 21:24:03,555] year # 22500, mean reward:   0.0389, sim ret:   0.0512, mkt ret:   0.0885, net:  -0.0374
[2017-01-04 21:25:08,969] year # 22600, mean reward:   0.0400, sim ret:   0.1163, mkt ret:   0.1696, net:  -0.0534
[2017-01-04 21:26:12,200] year # 22700, mean reward:   0.0410, sim ret:   0.0839, mkt ret:  -0.0529, net:   0.1368
[2017-01-04 21:27:19,013] year # 22800, mean reward:   0.0453, sim ret:  -0.0238, mkt ret:  -0.1961, net:   0.1723
[2017-01-04 21:27:19,411] Congratulations, Warren Buffet!  You won the trading game.

Results

Policy gradients beat the trading game! That said, it doesn't work every time, and the charts below suggest that luck played a part. But luck counts in the trading game as in life!


In [9]:
sf['net'] = sf.simror - sf.mktror
#sf.net.plot()
sf.net.expanding().mean().plot()
sf.net.rolling(100).mean().plot()


Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f15e0066f90>
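The two lines plotted above are different averages of the same series: `expanding().mean()` averages every episode so far, while `rolling(100).mean()` averages only the most recent 100 and is undefined until 100 episodes exist. A small sketch on made-up numbers, using a window of 3 for readability:

```python
import pandas as pd

# Made-up per-episode net returns, for illustration only.
net = pd.Series([-0.03, -0.02, 0.01, 0.04, 0.05])

expanding_mean = net.expanding().mean()  # mean of all episodes up to each point
rolling_mean = net.rolling(3).mean()     # mean of the latest 3; NaN for the first 2

print(pd.DataFrame({'net': net, 'expanding': expanding_mean, 'rolling': rolling_mean}))
```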

In [10]:
sf.net.rolling(100).mean().tail()


Out[10]:
24996    0.0
24997    0.0
24998    0.0
24999    0.0
25000    0.0
Name: net, dtype: float64