This example assumes close familiarity with Backtrader concepts and operation workflow.
You should at least run through the Quickstart tutorial: https://www.backtrader.com/docu/quickstart/quickstart.html
import backtrader as bt
import backtrader.feeds as btfeeds

# Set up the backtesting engine:
engine = bt.Cerebro()
engine.addstrategy(MyStrategy)
engine.broker.setcash(100000)
engine.broker.setcommission(commission=0.001)
engine.addobserver(bt.observers.Trades)
engine.addobserver(bt.observers.BuySell)
engine.addanalyzer(bt.analyzers.DrawDown, _name='drawdown')
engine.addsizer(bt.sizers.SizerFix, stake=1000)

# Attach the data feed, run and inspect:
MyData = btfeeds.GenericCSVData(dataname='CSVfilename.csv')
engine.adddata(MyData)
results = engine.run()
engine.plot()
my_disaster_drawdown = results[0].analyzers.drawdown.get_analysis()
There are several ways to configure the environment:
- subclass BTgymBaseStrategy and override its get_state(), get_done(), get_reward(), get_info()
and [maybe] next() methods to define your own state, reward definition, order execution logic,
actions etc.; pass it via the strategy kwarg along with other parameters;
- subclass the BTgymDataset class as your custom dataset and pass it via the dataset kwarg;
- assemble a complete backtrader engine and pass it via the engine kwarg.
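None of these is mandatory: if neither strategy, dataset nor engine is given, the environment falls back to its built-in defaults and only a source CSV file is required. A minimal sketch, reusing the example data file that appears further below:
In [ ]:
from btgym import BTgymEnv

# Everything defaults; only the data source is specified:
env = BTgymEnv(filename='../examples/data/DAT_ASCII_EURUSD_M1_2016.csv')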
In [ ]:
As of v0.0.4, the default parameters are:
# Dataset parameters:
filename=None,  # Source CSV data file;

# Episode data params:
start_weekdays=[0, 1, 2, ],  # Only weekdays from the list will be used for episode start.
start_00=True,  # Episode start time will be set to first record of the day (usually 00:00).
episode_duration={'days': 1, 'hours': 23, 'minutes': 55},  # Maximum episode time duration in d:h:m.
time_gap={'hours': 5},  # Maximum data time gap allowed within sample in d:h.
                        # If set < 1 day, samples containing weekend and holiday gaps will be rejected.

# Backtrader engine parameters:
start_cash=10.0,  # Initial trading capital.
broker_commission=0.001,  # Trade execution commission, default is 0.1% of operation value.
fixed_stake=10,  # Single trade stake; fixed-size by default.

# Strategy-related parameters:
# The observation state shape is a dictionary of Gym spaces
# and should contain at least the `raw_state` field.
# By convention, the first dimension of every Gym Box space is the time-embedding one;
# one can define any shape; it should match env.observation_space.shape.
# Observation space state min/max values:
# for `raw_state`, absolute min/max values from BTgymDataset will be used.
state_shape=dict(
    raw_state=spaces.Box(
        shape=(10, 4),
        low=-100,
        high=100,
    )
),
drawdown_call=90,  # Episode maximum drawdown threshold, default is 90% of initial value.
portfolio_actions=('hold', 'buy', 'sell', 'close'),
# Agent actions;
# should be consistent with BTgymStrategy order execution logic;
# defaults are (env side): 0 - 'do nothing', 1 - 'buy', 2 - 'sell', 3 - 'close position'.
skip_frame=1,
# Number of environment steps to skip before returning next response,
# e.g. if set to 10, the agent will interact with the environment every 10th episode step;
# every other step the agent's action is assumed to be 'hold'.
# Note: the INFO part of the environment response is a list of all skipped frames' infos,
# i.e. [info[-9], info[-8], ..., info[0]].

# Rendering controls:
render_state_as_image=True,
render_state_channel=0,
render_size_human=(6, 3.5),
render_size_state=(7, 3.5),
render_size_episode=(12, 8),
render_dpi=75,
render_plotstyle='seaborn',
render_cmap='PRGn',
render_xlabel='Relative timesteps',
render_ylabel='Value',
render_title='step: {}, state observation min: {:.4f}, max: {:.4f}',
render_boxtext=dict(
    fontsize=12,
    fontweight='bold',
    color='w',
    bbox={'facecolor': 'k', 'alpha': 0.3, 'pad': 3},
),

# Other:
port=5500,  # Network port to use.
network_address='tcp://127.0.0.1:',  # Using localhost.
verbose=0,  # Verbosity mode: 0 - silent, 1 - info level, 2 - debugging level (lots of traffic!).
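Any of these can be passed straight to the environment constructor to override the corresponding default. A small sketch (the filename and values are placeholders, purely illustrative):
In [ ]:
env = BTgymEnv(
    filename='some_data.csv',  # dataset-related kwarg
    start_cash=100.0,          # engine-related kwarg
    drawdown_call=50,          # strategy-related kwarg
    verbose=1,
)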
if <engine> kwarg is given:
    do not use default engine and strategy parameters;
    ignore <strategy> kwarg and all strategy- and engine-related kwargs;
else (no <engine>):
    use default engine parameters;
    if any engine-related kwarg is given:
        override corresponding default parameter;
    if <strategy> is given:
        do not use default strategy parameters;
        if any strategy-related kwarg is given:
            override corresponding strategy parameter;
    else (no <strategy>):
        use default strategy parameters;
        if any strategy-related kwarg is given:
            override corresponding strategy parameter;
if <dataset> kwarg is given:
    do not use default dataset parameters;
    ignore dataset-related kwargs;
else (no <dataset>):
    use default dataset parameters;
    if any dataset-related kwarg is given:
        override corresponding dataset parameter;
if any <other> kwarg is given:
    override corresponding default parameter.
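To make the precedence concrete, a sketch with hypothetical values (the same behaviour is demonstrated for real in the second example at the end of this notebook, where MyDataset and MyCerebro are actually defined):
In [ ]:
env = BTgymEnv(
    dataset=MyDataset,   # takes precedence: dataset-related kwargs are ignored
    engine=MyCerebro,    # takes precedence: strategy/engine-related kwargs are ignored
    start_cash=1.0,      # ignored!
    episode_duration={'days': 0, 'hours': 5, 'minutes': 55},  # ignored!
    verbose=1,           # <other> kwarg: overrides the default
)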
# NEW in v0.6: note that btgym uses the new OpenAI Gym space `gym.spaces.Dict`, which is in fact
# a [possibly nested] dictionary of base Gym spaces. You can use `gym.spaces.Dict` if you have
# the latest Gym version from the repo, or use the equivalent `btgym.spaces.DictSpace` wrapper instead.
# Thus, the `state_shape` param directly translates into a Dict space.
#
# The observation state shape is a dictionary of Gym spaces
# and should contain at least the `raw_state` field.
# By convention, the first dimension of every Gym Box space is the time-embedding one;
# one can define any shape; it should match env.observation_space.shape.
# Observation space state min/max values:
# for `raw_state`, absolute min/max values from BTgymDataset will be used.
state_shape=dict(
    raw_state=spaces.Box(
        shape=(10, 4),
        low=-100,
        high=100,
    )
),
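A minimal sketch of the equivalent explicit wrapper form; this assumes btgym.spaces.DictSpace mirrors the gym.spaces.Dict constructor, as the note above suggests:
In [ ]:
from gym import spaces
from btgym.spaces import DictSpace

# Same observation space, expressed as an explicit Dict space:
state_shape = DictSpace(
    {
        'raw_state': spaces.Box(low=-100, high=100, shape=(10, 4)),
    }
)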
In [ ]:
import sys
sys.path.insert(0,'..')
import IPython.display as Display
import PIL.Image as Image
import numpy as np
import random
from gym import spaces
from btgym import BTgymEnv, BTgymBaseStrategy, BTgymDataset
# Handy functions:
def show_rendered_image(rgb_array):
    """
    Convert numpy array to RGB image using PILLOW and
    show it inline using IPykernel.
    """
    Display.display(Image.fromarray(rgb_array))

def render_all_modes(env):
    """
    Retrieve and show environment renderings
    for all supported modes.
    """
    for mode in env.metadata['render.modes']:
        print('[{}] mode:'.format(mode))
        show_rendered_image(env.render(mode))

def take_some_steps(env, some_steps):
    """Just does it. Acting randomly."""
    for step in range(some_steps):
        rnd_action = env.action_space.sample()
        o, r, d, i = env.step(rnd_action)
        if d:
            print('Episode finished.')
            break
    print(step + 1, 'actions made.\n')

def under_the_hood(env):
    """Shows environment internals."""
    for attr in ['dataset', 'strategy', 'engine', 'renderer', 'network_address']:
        print('\nEnv.{}: {}'.format(attr, getattr(env, attr)))
    for params_name, params_dict in env.params.items():
        print('\nParameters [{}]:'.format(params_name))
        for key, value in params_dict.items():
            print('{} : {}'.format(key, value))
In [ ]:
class MyStrategy(BTgymBaseStrategy):
    """
    Example subclass of BTgym inner computation strategy,
    overrides default get_state() and get_reward() methods.
    """

    def get_price_gradients_state(self):
        """
        This method follows the naming convention: get_[state_modality_name]_state.
        Returns a normalized environment observation state
        by computing a time-embedded vector
        of price gradients.
        """
        # Prepare:
        sigmoid = lambda x: 1 / (1 + np.exp(-x))

        # T is a 'gamma-like' signal hyperparameter chosen so
        # our signal is in about [-5, +5] range before passing it to sigmoid;
        # tweak it by hand to add/remove "peak suppressing":
        T = 1.2e+4

        # Use the default strategy observation variable to get the
        # time-embedded state observation as an [m, 4] numpy matrix, where
        # 4 - number of signal features == state_shape[-1],
        # m - time-embedding length == state_shape[0] == <set by user>.
        X = self.raw_state
        # ...while iterating, the inner _get_raw_state() method is called just before this one,
        # so the variable `self.raw_state` is fresh and ready to use.

        # Compute gradients along the time-embedding (first) dimension:
        dX = np.gradient(X)[0]

        # Squash values into [0, 1]:
        return sigmoid(dX * T)

    def get_reward(self):
        """
        Computes reward as log utility of current to initial portfolio value ratio.
        """
        return float(np.log(self.stats.broker.value[0] / self.env.broker.startingcash))
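As a quick standalone sanity check of the squashing above, one can run the same computation on a made-up price matrix (the numbers are purely illustrative):
In [ ]:
import numpy as np

sigmoid = lambda x: 1 / (1 + np.exp(-x))
T = 1.2e+4

# Fake [10, 4] 'price' matrix with small per-step moves:
X = 1.10 + np.cumsum(np.random.randn(10, 4) * 1e-4, axis=0)
dX = np.gradient(X)[0]           # gradient along time-embedding axis
state = sigmoid(dX * T)

print(state.min(), state.max())  # everything squashed into (0, 1)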
In [ ]:
# Define dataset:
MyDataset = BTgymDataset(
    filename='../examples/data/DAT_ASCII_EURUSD_M1_2016.csv',
    start_weekdays=[0, 1, ],
    # leave all others to defaults,
)
env = BTgymEnv(
    dataset=MyDataset,
    strategy=MyStrategy,
    state_shape={
        'raw': spaces.Box(low=-10, high=10, shape=(4, 4)),  # rendered under the 'human' name
        'price_gradients': spaces.Box(low=0, high=1, shape=(4, 4))
    },
    drawdown_call=30,
    skip_frame=5,
    # use default agent actions,
    # use default engine,
    start_cash=100.0,
    # use default commission,
    # use default stake,
    # use default network port,
    render_modes=['episode', 'human', 'price_gradients'],
    render_state_as_image=False,
    render_ylabel='Price Gradient',
    # leave other rendering params to defaults,
    verbose=1,
)
In [ ]:
under_the_hood(env)
In [ ]:
env.reset()
take_some_steps(env, 100)
render_all_modes(env)
In [ ]:
# Clean up:
env.close()
# Now we need it:
import backtrader as bt
In [ ]:
# Define dataset:
MyDataset = BTgymDataset(
    filename='../examples/data/DAT_ASCII_EURUSD_M1_2016.csv',
    start_weekdays=[0, 1, ],
    episode_duration={'days': 2, 'hours': 23, 'minutes': 55},  # episode duration set to about 3 days (2:23:55),
    # leave all others to defaults,
)

# Configure backtesting engine:
MyCerebro = bt.Cerebro()

# Note (again): all kwargs here go straight to the strategy parameters dict,
# so it is our responsibility to keep the observation shape/bounds consistent with what our get_state() computes.
MyCerebro.addstrategy(
    MyStrategy,
    state_shape={
        'raw': spaces.Box(low=-10, high=10, shape=(4, 4)),
        'price_gradients': spaces.Box(low=0, high=1, shape=(4, 4))
    },
    drawdown_call=99,
    skip_frame=5,
)

# Then everything is very backtrader'esque:
MyCerebro.broker.setcash(100.0)
MyCerebro.broker.setcommission(commission=0.002)
MyCerebro.addsizer(bt.sizers.SizerFix, stake=20)
MyCerebro.addanalyzer(bt.analyzers.DrawDown)

# Finally:
env = BTgymEnv(
    dataset=MyDataset,
    episode_duration={'days': 0, 'hours': 5, 'minutes': 55},  # ignored!
    engine=MyCerebro,
    strategy='NotUsed',  # ignored!
    state_shape=(9, 99),  # ignored!
    start_cash=1.0,  # ignored!
    render_modes=['episode', 'human', 'price_gradients'],
    render_state_as_image=True,
    render_ylabel='Price Gradient',
    render_size_human=(10, 4),
    render_size_state=(10, 4),
    render_plotstyle='ggplot',
    verbose=0,
)
# Look again...
under_the_hood(env)
In [ ]:
env.reset()
take_some_steps(env, 100)
render_all_modes(env)
In [ ]:
# Clean up:
env.close()