Setting up the environment: full power.

Or: making the Gym environment happy with your very own backtrader engine.


This example assumes close familiarity with Backtrader concepts and workflow.

One should at least run through the Quickstart tutorial: https://www.backtrader.com/docu/quickstart/quickstart.html

Typical workflow for the traditional Backtrader backtesting procedure (recap):

  • Define the backtrader core engine:
    import backtrader as bt
    import backtrader.feeds as btfeeds
    engine = bt.Cerebro()
  • Add a strategy class, which has been prepared in advance as a subclass of the backtrader base Strategy() and defines the decision-making logic:
    engine.addstrategy(MyStrategy)
  • Set broker options, such as cash, commission, slippage, etc.:
    engine.broker.setcash(100000)
    engine.broker.setcommission(commission=0.001)
  • Add analyzers, observers, sizers, writers to your own needs:
    engine.addobserver(bt.observers.Trades)
    engine.addobserver(bt.observers.BuySell)
    engine.addanalyzer(bt.analyzers.DrawDown, _name='drawdown')
    engine.addsizer(bt.sizers.SizerFix, stake=1000)
  • Define and add a data feed from one source or another (a live feed is possible):
    MyData = btfeeds.GenericCSVData(dataname='CSVfilename.csv')
    engine.adddata(MyData)
  • Now the backtrader engine is ready to run the backtest:
    results = engine.run()
  • After that you can plot, analyze and reflect on the results:
    engine.plot()
    my_disaster_drawdown = results[0].analyzers.drawdown.get_analysis()

For BTgym, the same principles apply, with several differences:

  • the strategy you prepare will be a subclass of the base BTgymStrategy, which contains methods and parameters specific to the RL setup;
  • this strategy will not contain buy/sell decision-making logic - that part goes to the RL agent;
  • you define your dataset by creating a BTgymDataset class instance;
  • you don't add data to your bt.Cerebro(); just pass the dataset to the environment and the BTgym server will do the rest;
  • you don't run the backtrader engine manually via the run() method; the server will do it.

There are three levels to BTgym configuration:

Light:

  • use kwargs when making the environment; see the 'basic' example for details and the minimal sketch below;

3/4:

  • subclass BTgymStrategy: override the get_state(), get_done(), get_reward(), get_info() and [maybe] next() methods to get your own state and reward definitions, order execution logic, actions, etc.;
  • pass this strategy to the environment via the strategy kwarg along with other parameters;
  • [optionally] make an instance of the BTgymDataset class as your custom dataset and pass it via the dataset kwarg.

Full throttle:

  • subclass the strategy as in '3/4';
  • define a bt.Cerebro(): set broker parameters, add all required observers, analyzers, sizers and other bells and whistles;
  • attach your '3/4'-strategy;
  • pass that snowball to the environment via the engine kwarg;
  • [opt.] make and pass a dataset as in '3/4'.
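
For orientation, a minimal sketch of the 'Light' setup (the CSV path is a placeholder; the kwargs used here are described in the reference below):

    from btgym import BTgymEnv

    # 'Light' setup: everything is configured via constructor kwargs;
    # defaults are used for the engine, strategy and dataset internals.
    env = BTgymEnv(
        filename='./data/your_dataset.csv',  # placeholder path
        start_cash=100.0,
        drawdown_call=50,
        verbose=1,
    )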

In [ ]:


Environment kwargs reference:

as of v0.0.4

# Dataset parameters:
filename=None,  # Source CSV data file;
    # Episode data params:
start_weekdays=[0, 1, 2, ],  # Only weekdays from the list will be used for episode start.
start_00=True,  # Episode start time will be set to first record of the day (usually 00:00).
episode_duration={'days': 1, 'hours': 23, 'minutes': 55},  # Maximum episode time duration in d:h:m.
time_gap={'hours': 5},  # Maximum data time gap allowed within sample in d:h.
                        # If set < 1 day, samples containing weekend and holiday gaps will be rejected.


# Backtrader engine parameters:
start_cash=10.0,  # initial trading capital.
broker_commission=0.001,  # trade execution commission, default is 0.1% of operation value.
fixed_stake=10,  # single trade stake; fixed-size by default.


# Strategy related parameters:
# Observation state shape is a dictionary of Gym spaces,
# which should at least contain the `raw_state` field.
# By convention the first dimension of every Gym Box space is the time-embedding one;
# one can define any shape; it should match env.observation_space.shape.
# Low/high are observation space state min/max values;
# for `raw_state` the absolute min/max values from BTgymDataset will be used.
state_shape=dict(
    raw_state=spaces.Box(
        shape=(10, 4),
        low=-100,
        high=100,
    )
),
drawdown_call=90,  # episode maximum drawdown threshold, default is 90% of initial value.
portfolio_actions=('hold', 'buy', 'sell', 'close'),
    # agent actions,
    # should be consistent with BTgymStrategy order execution logic;
    # defaults are (env.side): 0 - 'do nothing', 1 - 'buy', 2 - 'sell', 3 - 'close position'.
skip_frame=1,
    # Number of environment steps to skip before returning the next response,
    # e.g. if set to 10 -- the agent will interact with the environment every 10th episode step;
    # on every other step the agent's action is assumed to be 'hold'.
    # Note: the INFO part of the environment response is a list of all skipped frames' infos,
    #       i.e. [info[-9], info[-8], ..., info[0]].

# Rendering controls:
render_state_as_image = True
render_state_channel=0
render_size_human = (6, 3.5)
render_size_state = (7, 3.5)
render_size_episode = (12,8)
render_dpi=75
render_plotstyle = 'seaborn'
render_cmap = 'PRGn'
render_xlabel = 'Relative timesteps'
render_ylabel = 'Value'
render_title = 'step: {}, state observation min: {:.4f}, max: {:.4f}'
render_boxtext = dict(fontsize=12,
                      fontweight='bold',
                      color='w',
                      bbox={'facecolor': 'k', 'alpha': 0.3, 'pad': 3},
                      )

# Other:
port=5500,  # network port to use.
network_address='tcp://127.0.0.1:',  # using localhost.
verbose=0,  # verbosity mode: 0 - silent, 1 - info level, 2 - debugging level (a lot of traffic!).

Kwargs applying logic:

    if <engine> kwarg is given:
        do not use default engine and strategy parameters;
        ignore <strategy> kwarg and all strategy- and engine-related kwargs;

    else (no <engine>):
        use default engine parameters;
        if any engine-related kwarg is given:
            override corresponding default parameter;

        if <strategy> is given:
            do not use default strategy parameters;
            if any strategy related kwarg is given:
                override corresponding strategy parameter;

        else (no <strategy>):
            use default strategy parameters;
            if any strategy related kwarg is given:
                override corresponding strategy parameter;

    if <dataset> kwarg is given:
        do not use default dataset parameters;
        ignore dataset related kwargs;

    else (no <dataset>):
        use default dataset parameters;
        if any dataset-related kwarg is given:
            override corresponding dataset parameter;

    if any <other> kwarg is given:
        override corresponding default parameter.

3/4. 'State and Reward' with BTgymStrategy.

  • These are the parameters the BTgymStrategy class holds.
  • Note: these are strategy parameters, not environment ones (though the names are the same as above)!
    # NEW at v0.6: Note that btgym uses the new OpenAI Gym space `gym.spaces.Dict`, which is in fact
    # a [possibly nested] dictionary of base Gym spaces. You can use `gym.spaces.Dict` if you have the
    # latest Gym version from the repo, or use the equivalent `btgym.spaces.DictSpace` wrapper instead.
    # Thus, the `state_shape` param directly translates into a Dict space.
    #
    # Observation state shape is a dictionary of Gym spaces,
    # which should at least contain the `raw_state` field.
    # By convention the first dimension of every Gym Box space is the time-embedding one;
    # one can define any shape; it should match env.observation_space.shape.
    # Low/high are observation space state min/max values;
    # for `raw_state` the absolute min/max values from BTgymDataset will be used.
    state_shape=dict(
      raw_state=spaces.Box(
          shape=(10, 4),
          low=-100,
          high=100,
      )
    ),
    drawdown_call=90,  # episode maximum drawdown threshold, default is 90% of initial value.
    portfolio_actions=('hold', 'buy', 'sell', 'close'),
      # agent actions,
      # should be consistent with BTgymStrategy order execution logic;
      # defaults are (env.side): 0 - 'do nothing', 1 - 'buy', 2 - 'sell', 3 - 'close position'.
    skip_frame=1,
      # Number of environment steps to skip before returning the next response,
      # e.g. if set to 10 -- the agent will interact with the environment every 10th episode step;
      # on every other step the agent's action is assumed to be 'hold'.
      # Note: the INFO part of the environment response is a list of all skipped frames' infos,
      #       i.e. [info[-9], info[-8], ..., info[0]].
    
    When making your own subclass, it is your responsibility to keep those consistent.
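
For illustration, a minimal sketch (a hypothetical consistency check, not part of BTgym) of keeping the declared state_shape in line with what a get_state() method returns:

    import numpy as np
    from gym import spaces

    # Hypothetical check: the declared Box must match what the
    # corresponding get_[state_modality_name]_state() method actually returns.
    state_shape = dict(
        raw_state=spaces.Box(low=-100, high=100, shape=(10, 4)),
    )

    observation = np.zeros((10, 4))  # stand-in for a get_raw_state() result
    box = state_shape['raw_state']
    assert observation.shape == box.shape
    assert (observation >= box.low).all() and (observation <= box.high).all()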

In [ ]:
import sys
sys.path.insert(0,'..')

import IPython.display as Display
import PIL.Image as Image
import numpy as np
import random

from gym import spaces

from btgym import BTgymEnv, BTgymBaseStrategy, BTgymDataset

# Handy functions:

def show_rendered_image(rgb_array):
    """
    Convert numpy array to RGB image using PILLOW and
    show it inline using IPykernel.
    """
    Display.display(Image.fromarray(rgb_array))

def render_all_modes(env):
    """
    Retrieve and show environment renderings
    for all supported modes.
    """
    for mode in env.metadata['render.modes']:
        print('[{}] mode:'.format(mode))
        show_rendered_image(env.render(mode))

def take_some_steps(env, some_steps):
    """Just does it. Acting randomly."""
    for step in range(some_steps):
        rnd_action = env.action_space.sample()
        o, r, d, i = env.step(rnd_action)
        if d:
            print('Episode finished,')
            break
    print(step+1, 'actions made.\n')
    
def under_the_hood(env):
    """Shows environment internals."""
    for attr in ['dataset','strategy','engine','renderer','network_address']:
        print('\nEnv.{}: {}'.format(attr, getattr(env, attr)))

    for params_name, params_dict in env.params.items():
        print('\nParameters [{}]: '.format(params_name))
        for key, value in params_dict.items():
            print('{} : {}'.format(key,value))

Define a simple custom strategy:

Note the use of the inner strategy variable raw_state.


In [ ]:
class MyStrategy(BTgymBaseStrategy):
    """
    Example subclass of BTgym inner computation strategy,
    overrides default get_state() and get_reward() methods.
    """
    
    def get_price_gradients_state(self):
        """
        This method follows the naming convention: get_[state_modality_name]_state
        Returns normalized environment observation state
        by computing time-embedded vector
        of price gradients.
        """
        # Prepare:
        sigmoid = lambda x: 1/(1 + np.exp(-x))
        
        # T is 'gamma-like' signal hyperparameter
        # for our signal to be in about [-5,+5] range before passing it to sigmoid;
        # tweak it by hand to add/remove "peak suppression":
        T = 1.2e+4
        
        # Use default strategy observation variable to get
        # time-embedded state observation as [m,4] numpy matrix, where
        # 4 - number of signal features  == state_shape[-1],
        # m - time-embedding length  == state_shape[0] == <set by user>.
        X = self.raw_state
        
        # ...while iterating, the inner _get_raw_state() method is called just before this one,
        # so the variable `self.raw_state` is fresh and ready to use.

        # Compute gradients along the time-embedding (first) dimension:
        dX = np.gradient(X)[0]
        
        # Squash values in [0,1]:
        return sigmoid(dX * T)
    
    def get_reward(self):
        """
        Computes reward as log utility of current to initial portfolio value ratio.
        """
        return float(np.log(self.stats.broker.value[0] / self.env.broker.startingcash))

Configure the environment:

  • All strategy parameters shown above that are not meant to be left at their defaults should be passed to the environment as kwargs.
  • When verbose=1, pay attention to the log output to see which classes have been used (base or custom).

In [ ]:
# Define dataset:
MyDataset = BTgymDataset(
    filename='../examples/data/DAT_ASCII_EURUSD_M1_2016.csv',
    start_weekdays=[0, 1,],
    # leave all others at defaults,
)

env = BTgymEnv(
    dataset=MyDataset,
    strategy=MyStrategy,
    state_shape={
       'raw': spaces.Box(low=-10, high=10, shape=(4,4)),  # rendered under the 'human' mode name
       'price_gradients': spaces.Box(low=0, high=1, shape=(4,4))
    },
    drawdown_call=30,
    skip_frame=5,
    # use default agent actions,
    # use default engine,
    start_cash=100.0,
    # use default commission,
    # use default stake,
    # use default network port,
    render_modes=['episode', 'human', 'price_gradients'],
    render_state_as_image=False,
    render_ylabel='Price Gradient',
    # leave other rendering params at defaults,
    verbose=1,
)

Take a look...


In [ ]:
under_the_hood(env)

Time to run:

  • Play with the number of steps. Comment out env.reset() to avoid restarting the episode every time you run the cell.
  • Refer to the 'rendering howto' to get a sense of how renderings are updated.

In [ ]:
env.reset()
take_some_steps(env, 100)
render_all_modes(env)

Full Throttle setup:

  • Summon Backtrader power;
  • Which-is-what: pay attention to which arguments are used or ignored.

In [ ]:
# Clean up:
env.close()

# Now we need it:
import backtrader as bt

In [ ]:
# Define dataset:
MyDataset = BTgymDataset(
    filename='../examples/data/DAT_ASCII_EURUSD_M1_2016.csv',
    start_weekdays=[0, 1,],
    episode_duration={'days': 2, 'hours': 23, 'minutes': 55},  # episode duration set to about 3 days (2:23:55),
    # leave all others at defaults,
) 


# Configure backtesting engine:
MyCerebro = bt.Cerebro()

# Note (again): all kwargs here go straight to the strategy parameters dict,
# so it is our responsibility to keep the observation shape / bounds consistent with what our get_state() computes.
MyCerebro.addstrategy(
    MyStrategy,
    state_shape={
        'raw': spaces.Box(low=-10, high=10, shape=(4,4)),
        'price_gradients': spaces.Box(low=0, high=1, shape=(4,4))
    },
    drawdown_call=99,
    skip_frame=5,
)

# Then everything is very backtrader'esque:
MyCerebro.broker.setcash(100.0)
MyCerebro.broker.setcommission(commission=0.002)
MyCerebro.addsizer(bt.sizers.SizerFix, stake=20)
MyCerebro.addanalyzer(bt.analyzers.DrawDown)

# Finally:
env = BTgymEnv(
    dataset=MyDataset,
    episode_duration={'days': 0, 'hours': 5, 'minutes': 55}, # ignored!
    engine=MyCerebro,
    strategy='NotUsed',  # ignored!
    state_shape=(9, 99), # ignored!
    start_cash=1.0,  # ignored!
    render_modes=['episode', 'human', 'price_gradients'],
    render_state_as_image=True,
    render_ylabel='Price Gradient',   
    render_size_human=(10,4),
    render_size_state=(10,4),
    render_plotstyle='ggplot',
    verbose=0,
)

# Look again...
under_the_hood(env)

In [ ]:
env.reset()
take_some_steps(env, 100)
render_all_modes(env)

In [ ]:
# Clean up:
env.close()
