In [ ]:
from logbook import INFO, WARNING, DEBUG
import warnings
warnings.filterwarnings("ignore") # suppress h5py deprecation warning
import numpy as np
import os
import backtrader as bt
from btgym.research.casual_conv.strategy import CasualConvStrategyMulti
from btgym.research.casual_conv.networks import conv_1d_casual_attention_encoder
from btgym.algorithms.launcher.base import Launcher
from btgym.algorithms.aac import A3C
from btgym.algorithms.policy import StackedLstmPolicy
from btgym import MultiDiscreteEnv
from btgym.datafeed.casual import BTgymCasualDataDomain
from btgym.datafeed.multi import BTgymMultiData
from collections import OrderedDict
Consider a setup with one riskless asset acting as broker account cash and K risky assets.
For every risky asset there exists a track of historic price records, referred to as a data line.
Apart from the asset data lines there may [optionally] exist a number of exogenous data lines holding
information and statistics considered relevant to decision-making, e.g. economic indexes, encoded news,
macroeconomic indicators, weather forecasts etc.
It is supposed for this setup that:
1. there are no interest rates for any asset;
2. broker actions are fixed-size market orders (`buy`, `sell`, `close`); short selling is permitted;
3. transaction costs are modelled via broker commission;
4. 'market liquidity' and 'capital impact' assumptions are met;
5. time indexes match for all data lines provided.
The problem is modelled as a discrete-time finite-horizon partially observable Markov decision process for equity/currency trading:
- *for every asset traded*, the agent action space is discrete: `(0: hold [do nothing], 1: buy, 2: sell, 3: close [position])`;
- the environment is episodic: maximum episode duration and episode termination conditions are set;
- at every timestep of the episode the agent receives an environment state observation as a tensor of the last
  `m` time-embedded preprocessed values for every data line included, and emits actions according to some stochastic policy;
- the agent's goal is to maximize expected cumulative capital by learning an optimal policy.
1. This environment expects the Dataset to be an instance of `btgym.datafeed.multi.BTgymMultiData`, which sets
the number, specifications and sampling synchronisation of historic data for all assets and data lines
one wants to consider jointly.
2. Internally, every episodic asset data is converted to a single bt.feed and added to the environment strategy
as a separate named data line (see the backtrader docs for an extensive explanation of the data-lines concept);
a minimal sketch follows below. The strategy is expected to properly handle all received data lines.
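As a rough sketch of what this conversion amounts to in plain backtrader (the `episode_feeds` mapping below is hypothetical; btgym performs this step internally):

    import backtrader as bt

    cerebro = bt.Cerebro()
    # episode_feeds: hypothetical {asset_name: bt.feed} mapping built from a sampled episode
    for name, feed in episode_feeds.items():
        # every asset becomes a separate named data line, addressable inside
        # the strategy via self.getdatabyname(name)
        cerebro.adddata(feed, name=name)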
3. btgym.spaces.ActionDictSpace and order execution. The strategy expects to receive a separate action
for every asset, in the form of a dictionary: `{data_line_name_1: action, ..., data_line_name_K: action}`
for the K assets added, and issues orders for all assets within a single strategy step.
Actions are supposed to be discrete [for this environment] and the same for every asset.
Base actions are set by `strategy.params.portfolio_actions`; the default is `('hold', 'buy', 'sell', 'close')`,
which corresponds to `gym.spaces.Discrete` with depth `N=4` (~number of actions: 0, 1, 2, 3).
That is, for `K` assets the environment action space will be a shallow dictionary
`(DictSpace)` of discrete spaces:
`{data_line_name_1: gym.spaces.Discrete(N), ..., data_line_name_K: gym.spaces.Discrete(N)}`
Example::
    if data lines added via BTgymMultiData are: ['eurchf', 'eurgbp', 'eurjpy', 'eurusd'],
    and base asset actions are ['hold', 'buy', 'sell', 'close'], then
    env.action_space will be:

    DictSpace(
        {
            'eurchf': gym.spaces.Discrete(4),
            'eurgbp': gym.spaces.Discrete(4),
            'eurjpy': gym.spaces.Discrete(4),
            'eurusd': gym.spaces.Discrete(4),
        }
    )

    a single environment action instance (as seen inside the strategy):

    {
        'eurchf': 'hold',
        'eurgbp': 'buy',
        'eurjpy': 'hold',
        'eurusd': 'close',
    }

    the corresponding integer encoding, as passed to the environment via .step():

    {
        'eurchf': 0,
        'eurgbp': 1,
        'eurjpy': 0,
        'eurusd': 3,
    }

    or, as a vector of integers (categorical):

    (0, 1, 0, 3)
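A runnable sketch of the same structure using plain gym.spaces (btgym wraps this in its own ActionDictSpace; names follow the example above):

    from gym import spaces

    base_actions = ('hold', 'buy', 'sell', 'close')
    data_lines = ['eurchf', 'eurgbp', 'eurjpy', 'eurusd']

    # one Discrete(N) sub-space per asset data line:
    action_space = spaces.Dict(
        {name: spaces.Discrete(len(base_actions)) for name in data_lines}
    )

    # integer-encoded action, as passed to env.step():
    action = {'eurchf': 0, 'eurgbp': 1, 'eurjpy': 0, 'eurusd': 3}

    # decoding back to string names, as the strategy sees it:
    decoded = {name: base_actions[a] for name, a in action.items()}
    # -> {'eurchf': 'hold', 'eurgbp': 'buy', 'eurjpy': 'hold', 'eurusd': 'close'}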
4. Environment actions cardinality and encoding. Note that the total set of environment actions for `K` assets
and `N` base actions is the `cartesian product of K sets of N elements each`, i.e. `N^K` combinations.
It can be encoded as a `vector of integers, a single scalar, binary or one_hot`.
As cardinality skyrockets with `K`, the `multi-discrete` action setup is only suited
to a small number of assets.
Example::
Setup with 4 assets and 4 base actions [hold, buy, sell, close] spawns a total of 4^4 = 256 possible
environment actions, expressed as a single integer in [0, 255] or in binary encoding:
vector str : vector int: int: binary:
('hold', 'hold', 'hold', 'hold') -> (0, 0, 0, 0) -> 0 -> 00000000
('hold', 'hold', 'hold', 'buy') -> (0, 0, 0, 1) -> 1 -> 00000001
... ... ...
('close', 'close', 'close', 'sell') -> (3, 3, 3, 2) -> 254 -> 11111110
('close', 'close', 'close', 'close') -> (3, 3, 3, 3) -> 255 -> 11111111
Internally there is some jumping back and forth between encodings: as a rule, the strategy operates with
the dictionary of string action names, the environment sees an action as a dictionary of integers,
while the policy estimator operates with either binary or one-hot encoding.
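A minimal sketch of these conversions (not btgym's internal implementation; it simply reads the categorical vector as a base-N number, matching the table above):

    import numpy as np

    N, K = 4, 4  # base actions per asset, number of assets

    def vec_to_int(vec):
        # read the categorical vector as a base-N number, most significant digit first:
        # (3, 3, 3, 3) -> 3*4**3 + 3*4**2 + 3*4 + 3 = 255
        out = 0
        for a in vec:
            out = out * N + a
        return out

    def int_to_vec(x):
        digits = []
        for _ in range(K):
            x, a = divmod(x, N)
            digits.append(a)
        return tuple(reversed(digits))

    def int_to_one_hot(x):
        # one-hot over the full cartesian product of N**K environment actions
        one_hot = np.zeros(N ** K, dtype=np.float32)
        one_hot[x] = 1.0
        return one_hot

    assert vec_to_int((0, 1, 0, 3)) == 19
    assert int_to_vec(255) == (3, 3, 3, 3)
    # the 'binary' column above is just this scalar written in base 2:
    assert format(vec_to_int((3, 3, 3, 2)), '08b') == '11111110'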
5. Observation space: a nested DictSpace, where the 'external' part of the space
should hold specifications for every data line added.
Example::
    if data lines added via BTgymMultiData are:
    'eurchf', 'eurgbp', 'eurjpy', 'eurusd';

    the environment observation space should be the DictSpace:

    {
        'raw': spaces.Box(low=-1000, high=1000, shape=(128, 4), dtype=np.float32),
        'external': DictSpace(
            {
                'eurusd': spaces.Box(low=-1000, high=1000, shape=(128, 1, num_features), dtype=np.float32),
                'eurgbp': spaces.Box(low=-1000, high=1000, shape=(128, 1, num_features), dtype=np.float32),
                'eurchf': spaces.Box(low=-1000, high=1000, shape=(128, 1, num_features), dtype=np.float32),
                'eurjpy': spaces.Box(low=-1000, high=1000, shape=(128, 1, num_features), dtype=np.float32),
            }
        ),
        'internal': spaces.Box(...),
        'datetime': spaces.Box(...),
        'metadata': DictSpace(...)
    }
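A sketch of assembling the 'raw' and 'external' parts with plain gym.spaces (btgym uses its own DictSpace wrapper; the 'internal', 'datetime' and 'metadata' sub-spaces are strategy-specific and omitted here):

    from gym import spaces
    import numpy as np

    num_features = 16
    data_lines = ['eurchf', 'eurgbp', 'eurjpy', 'eurusd']

    observation_space = spaces.Dict(
        {
            'raw': spaces.Box(low=-1000, high=1000, shape=(128, 4), dtype=np.float32),
            'external': spaces.Dict(
                {
                    name: spaces.Box(
                        low=-1000, high=1000, shape=(128, 1, num_features), dtype=np.float32
                    )
                    for name in data_lines
                }
            ),
        }
    )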
In [ ]:
engine = bt.Cerebro()
num_features = 16
engine.addstrategy(
CasualConvStrategyMulti,
    cash_name='EUR',  # just a name for the cash asset
start_cash=2000,
commission=0.0001,
leverage=10.0,
    asset_names={'USD', 'CHF'},  # meaning JPY and GBP are used as information data lines only
drawdown_call=10,
target_call=10,
skip_frame=10,
gamma=0.99,
    state_ext_scale = {  # strategy-specific CWT preprocessing params:
'USD': np.linspace(1, 2, num=num_features),
'GBP': np.linspace(1, 2, num=num_features),
'CHF': np.linspace(1, 2, num=num_features),
'JPY': np.linspace(5e-3, 1e-2, num=num_features),
},
cwt_signal_scale=4e3,
cwt_lower_bound=4.0,
cwt_upper_bound=90.0,
reward_scale=7,
order_size={
'CHF': 1000,
#'GBP': 1000,
#'JPY': 1000,
'USD': 1000,
},
)
data_config = [
('USD', {'filename': './data/DAT_ASCII_EURUSD_M1_2017.csv'}, ),
('GBP', {'filename': './data/DAT_ASCII_EURGBP_M1_2017.csv'}, ),
('JPY', {'filename': './data/DAT_ASCII_EURJPY_M1_2017.csv'}, ),
('CHF', {'filename': './data/DAT_ASCII_EURCHF_M1_2017.csv'}, ),
]
data_config = OrderedDict(data_config)
dataset = BTgymMultiData(
data_class_ref=BTgymCasualDataDomain,
data_config=data_config,
trial_params=dict(
start_weekdays={0, 1, 2, 3, 4, 5, 6},
sample_duration={'days': 30, 'hours': 0, 'minutes': 0},
start_00=False,
time_gap={'days': 15, 'hours': 0},
test_period={'days': 7, 'hours': 0, 'minutes': 0},
expanding=True,
),
episode_params=dict(
start_weekdays={0, 1, 2, 3, 4, 5, 6},
sample_duration={'days': 2, 'hours': 23, 'minutes': 55},
start_00=False,
time_gap={'days': 2, 'hours': 15},
),
frozen_time_split={'year': 2017, 'month': 3, 'day': 1},
)
#########################
env_config = dict(
class_ref=MultiDiscreteEnv,
kwargs=dict(
dataset=dataset,
engine=engine,
render_modes=['episode'],
render_state_as_image=True,
render_size_episode=(12,16),
render_size_human=(9, 4),
render_size_state=(11, 3),
render_dpi=75,
port=5000,
data_port=4999,
connect_timeout=90,
verbose=0,
)
)
cluster_config = dict(
host='127.0.0.1',
port=12230,
    num_workers=4,  # set according to the number of CPUs available
num_ps=1,
num_envs=1,
log_dir=os.path.expanduser('~/tmp/multi_discrete'),
)
policy_config = dict(
class_ref=StackedLstmPolicy,
kwargs={
'lstm_layers': (256, 256),
'dropout_keep_prob': 1.0, # 0.25 - 0.75 opt
'encode_internal_state': False,
'conv_1d_num_filters': 64,
'state_encoder_class_ref': conv_1d_casual_attention_encoder,
}
)
trainer_config = dict(
class_ref=A3C,
kwargs=dict(
opt_learn_rate=1e-4,
opt_end_learn_rate=1e-5,
opt_decay_steps=50*10**6,
model_gamma=0.99,
model_gae_lambda=1.0,
model_beta=0.001, # entropy reg: 0.001 for non_shared encoder_params; 0.05 if 'share_encoder_params'=True
rollout_length=20,
time_flat=True,
model_summary_freq=10,
episode_summary_freq=1,
env_render_freq=5,
)
)
In [ ]:
launcher = Launcher(
cluster_config=cluster_config,
env_config=env_config,
trainer_config=trainer_config,
policy_config=policy_config,
test_mode=False,
max_env_steps=100*10**6,
root_random_seed=0,
    purge_previous=1,  # ask before overriding previously saved model and logs
verbose=0
)
# Train it:
launcher.run()