Baseline DQN Gym_Trading Tutorial



In [1]:
import pandas as pd
import gym_trading
import gym
import sys
import itertools
import numpy as np
import tensorflow as tf
import tensorflow.contrib.layers as layers

import baselines.common.tf_util as U

from baselines import deepq
from baselines.deepq.replay_buffer import ReplayBuffer
from baselines.common.schedules import LinearSchedule

Trading Framework

This framework is based on Peter Henry's gym_trading (https://github.com/Henry-bee/gym_trading/), which is in turn based on Tito Ingargiola's gym-trading (https://github.com/hackthemarket/gym-trading).

First, define the path to the CSV data:


In [2]:
csv = "data/EURUSD60.csv"

Create a new OpenAI Gym environment using the customised trading environment.

.initialise_simulator() must be invoked after gym.make('trading-v0'). It takes these arguments:

  • csv_name: Path to the data file
  • trade_period: (int), Maximum duration of each trade. Default: 1000
  • train_split: (float), Fraction of the data set used for training. Default: 0.7
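
As a rough illustration of what train_split means, the sketch below splits a time-ordered series chronologically, keeping the first part for training and the rest for testing. It is not the gym_trading implementation: the helper name split_by_fraction and the toy prices are assumptions made for this example.

```python
def split_by_fraction(rows, train_split=0.7):
    """Hypothetical helper: first `train_split` share of rows for training, rest for testing."""
    # Round to the nearest row so that 0.7 of 10 rows gives exactly 7.
    train_end = int(round(len(rows) * train_split))
    return rows[:train_end], rows[train_end:]


# Toy hourly closing prices, oldest first (made-up values).
hourly_closes = [1.10, 1.11, 1.12, 1.13, 1.12, 1.14, 1.15, 1.16, 1.15, 1.17]

train_rows, test_rows = split_by_fraction(hourly_closes, train_split=0.7)
print(len(train_rows), len(test_rows))  # 7 3
```

The split is chronological rather than shuffled, so the test rows always come after the training rows in time.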

In [3]:
env = gym.make('trading-v0')
env.initialise_simulator(csv, trade_period=50, train_split=0.7)


[2017-07-09 18:39:23,128] Making new env: trading-v0
                       Return       ATR  Open Trade  Duration Trade
Date_Time                                                          
2013-12-02 02:00:00  0.421251  0.355142         0.0             0.0
/home/adrian/.local/lib/python3.5/site-packages/matplotlib/cbook.py:136: MatplotlibDeprecationWarning: The finance module has been deprecated in mpl 2.0 and will be removed in mpl 2.2. Please use the module mpl_finance instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)

States map

states_map is a discretized observation space, bounded by the extreme values of the features, with an interval of 0.5. The observation also includes the trade duration and a boolean flag indicating whether a trade is active. These observations are dynamic: they change while the algorithm runs.
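
How gym_trading builds its state map internally is not shown in this tutorial. The following is only a sketch of what discretizing one feature into 0.5-wide bins between its extreme values could look like, using NumPy. The toy feature values and all variable names are assumptions.

```python
import numpy as np

# Toy feature values (rounded, made up for illustration).
returns = np.array([0.42, 1.71, -0.59, -1.00, -0.20, 0.31])

step = 0.5
# Snap the extreme values outward to multiples of the step.
low = np.floor(returns.min() / step) * step
high = np.ceil(returns.max() / step) * step

# Bin edges: -1.0, -0.5, 0.0, ..., 2.0
edges = np.arange(low, high + step, step)

# Index of the bin each value falls in (0-based).
bins = np.digitize(returns, edges) - 1
print(bins)         # [2 5 0 0 1 2]
print(edges[bins])  # lower edge of each value's bin
```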


In [4]:
env.sim.states


Out[4]:
array([[ 0.42125144,  0.35514246,  0.        ,  0.        ],
       [ 1.7094119 ,  0.45689009,  0.        ,  0.        ],
       [-0.59190109,  0.45925152,  0.        ,  0.        ],
       ..., 
       [-1.00049976, -0.45976391,  0.        ,  0.        ],
       [-0.20124835, -0.51999441,  0.        ,  0.        ],
       [ 0.31455146, -0.61636501,  0.        ,  0.        ]])

The magic (Deep Q-Network)

OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms. Many reinforcement learning trading projects use their own implementations, which tends to introduce small bugs and makes them hard to maintain or improve.

There are many good resources for digging deeper into this topic. At a high level, the core of Q-learning and DQNs can be expressed with the following diagrams.
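
In code, the core Q-learning update can be sketched in a few lines. This is the tabular version: a DQN replaces the table with a neural network and fits it to the same kind of target. The function name q_update and the toy numbers are assumptions for illustration.

```python
import numpy as np


def q_update(q_table, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) towards r + gamma * max_a' Q(s', a')."""
    if done:
        target = reward  # no future value after a terminal step
    else:
        target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (target - q_table[state, action])
    return q_table


# Two states, three actions; state 1 already has some value estimates.
q = np.zeros((2, 3))
q[1] = [0.0, 2.0, 1.0]

q = q_update(q, state=0, action=1, reward=1.0, next_state=1, done=False, alpha=0.5, gamma=0.9)
print(q[0, 1])  # approximately 1.4 = 0.5 * (1.0 + 0.9 * 2.0)
```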

Learning resources:

http://karpathy.github.io/2016/05/31/rl/

http://minpy.readthedocs.io/en/latest/tutorial/rl_policy_gradient_tutorial/rl_policy_gradient.html

http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html

http://kvfrans.com/simple-algoritms-for-solving-cartpole/

https://medium.com/@awjuliani/super-simple-reinforcement-learning-tutorial-part-1-fd544fab149

https://dataorigami.net/blogs/napkin-folding/79031811-multi-armed-bandits

Set the model

So, let's get our hands dirty. First, define the network:


In [5]:
def model(inpt, num_actions, scope, reuse=False):
    """This model takes as input an observation and returns values of all actions."""
    with tf.variable_scope(scope, reuse=reuse):
        out = inpt
        out = layers.fully_connected(out, num_outputs=128, activation_fn=tf.nn.tanh)
        out = layers.fully_connected(out, num_outputs=64, activation_fn=tf.nn.tanh)
        out = layers.fully_connected(out, num_outputs=32, activation_fn=tf.nn.tanh)
        out = layers.fully_connected(out, num_outputs=num_actions, activation_fn=None)
        return out
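
To make the layer sizes concrete, here is a NumPy-only sketch of the same 128-64-32 stack with random weights. It does not use TensorFlow or any trained weights. The 4 input features match the four columns of env.sim.states shown above; the batch size of 5 and the 3 actions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, n_out, activation=None):
    """Fully connected layer with freshly drawn random weights (illustration only)."""
    w = rng.normal(scale=0.1, size=(x.shape[1], n_out))
    b = np.zeros(n_out)
    y = x @ w + b
    return activation(y) if activation is not None else y


obs_batch = rng.normal(size=(5, 4))  # 5 observations, 4 features each
out = dense(obs_batch, 128, np.tanh)
out = dense(out, 64, np.tanh)
out = dense(out, 32, np.tanh)
q_values = dense(out, 3)             # one value per action, no activation

greedy_actions = q_values.argmax(axis=1)
print(q_values.shape, greedy_actions.shape)  # (5, 3) (5,)
```

The last layer has no activation because Q-values are unbounded estimates of future reward, not probabilities.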

Next, define a run_test function to call at the end of every episode:


In [6]:
def run_test(env, act, episodes=1, final_test=False):
    """Run the agent on the held-out (test) part of the data."""
    obs = env._reset(train=False)
    # Index range of the test data, used only for the final report.
    start = env.sim.train_end_index + 1
    end = env.sim.count - 2

    for episode in range(episodes):
        done = False
        while not done:
            # act returns one action per observation in the batch; take the first.
            action = act(obs[None])[0]
            obs, reward, done, info = env.step(action)

        if not final_test:
            # During training, report after the first test episode and return.
            journal = pd.DataFrame(env.portfolio.journal)
            profit = journal["Profit"].sum()
            return env.portfolio.average_profit_per_trade, profit
        else:
            print("Test period  %s - %s" % (env.sim.date_time[start], env.sim.date_time[end]))
            print("Average Reward is %s" % (env.portfolio.average_profit_per_trade))

    if final_test:
        env._generate_summary_stats()

Running the environment!

At this point, we can start up the environment and run the episodes. The most important steps are:

  • Accumulate in episode_rewards the reward we want to optimise. For example, to maximise the reward of each trade: episode_rewards[-1] += rew
  • Define the solved condition. Training stops when it evaluates to True. For example: is_solved = np.mean(episode_rewards[-101:-1]) > 1000 or t == 100000
  • Instantiate deepq.build_train. This is the core of OpenAI Baselines.
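
The solved condition above relies on the slice episode_rewards[-101:-1], which takes up to 100 completed episodes and skips the last entry (the episode still in progress). The sketch below wraps that logic in two small functions to show its edge cases. The function names and thresholds are assumptions, and episode_rewards is expected to be a plain Python list.

```python
import numpy as np


def mean_completed_reward(episode_rewards, window=100):
    """Mean reward over the last `window` completed episodes (excludes the running one)."""
    completed = episode_rewards[-(window + 1):-1]
    # With only the running episode in the list there is nothing to average yet.
    return float(np.mean(completed)) if completed else float("nan")


def is_solved(episode_rewards, t, reward_target=1000, max_steps=100000):
    """True once the mean reward beats the target or the step budget is used up."""
    return mean_completed_reward(episode_rewards) > reward_target or t >= max_steps


print(is_solved([0.0], t=10))                     # False: no completed episode yet
print(is_solved([1500.0, 1200.0, 0.0], t=10))     # True: mean of 1500 and 1200 is 1350
print(is_solved([-50.0, -20.0, 0.0], t=100000))   # True: step budget reached
```

Comparing NaN with > always yields False, so the condition stays False during the first episode.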

build_train creates the train function:

Parameters

  • make_obs_ph: str -> tf.placeholder or TfInput -> a function that takes a name and creates a placeholder of input with that name

  • q_func: (tf.Variable, int, str, bool) -> tf.Variable -> the model, which takes the following inputs and returns a tensor of shape (batch_size, num_actions) with the values of every action:

    • observation_in: object -> the output of the observation placeholder
    • num_actions: int -> number of actions
    • scope: str
    • reuse: bool -> should be passed to the outer variable scope
  • num_actions: int -> number of actions

  • reuse: bool -> whether or not to reuse the graph variables

  • optimizer: tf.train.Optimizer -> optimizer to use for the Q-learning objective.

  • grad_norm_clipping: float or None -> clip gradient norms to this value. If None no clipping is performed.

  • gamma: float -> discount rate.

  • double_q: bool -> if true, Double Q-Learning (https://arxiv.org/abs/1509.06461) is used. In general it is a good idea to keep it enabled.

  • scope: str or VariableScope -> optional scope for variable_scope.

  • reuse: bool or None -> whether or not the variables should be reused. To be able to reuse the scope must be given.

Returns

  • act: (tf.Variable, bool, float) -> tf.Variable -> function to select an action given an observation.

  • train: (object, np.array, np.array, object, np.array, np.array) -> np.array -> optimize the error in Bellman's equation. See the top of the file for details.

  • update_target: () -> () -> copy the parameters from the optimized Q function to the target Q function.

  • debug: {str: function} -> a bunch of functions to print debug data like q_values.
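
To see what gamma and double_q change, the NumPy sketch below computes the training targets for a small batch both ways. It is a simplified illustration of the idea, not the TensorFlow graph that build_train constructs. The function name td_targets and the toy Q-values are assumptions.

```python
import numpy as np


def td_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99, double_q=True):
    """Targets r + gamma * (1 - done) * value(s') for a batch of transitions."""
    if double_q:
        # Double Q: the online network picks the action, the target network scores it.
        best = q_online_next.argmax(axis=1)
        next_value = q_target_next[np.arange(len(best)), best]
    else:
        # Plain DQN: the target network both picks and scores the action.
        next_value = q_target_next.max(axis=1)
    return rewards + gamma * (1.0 - dones) * next_value


rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])              # second transition ends its episode
q_online_next = np.array([[1.0, 3.0], [2.0, 0.0]])
q_target_next = np.array([[5.0, 2.0], [4.0, 7.0]])

print(td_targets(rewards, dones, q_online_next, q_target_next, gamma=0.5, double_q=True))   # [2. 0.]
print(td_targets(rewards, dones, q_online_next, q_target_next, gamma=0.5, double_q=False))  # [3.5 0.]
```

In the first transition the plain target uses the largest target-network value (5.0), while the double-Q target uses the value of the action preferred by the online network (2.0), which is how double Q-learning reduces over-estimation.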


In [7]:
with U.make_session(8):
    
    act, train, update_target, debug = deepq.build_train(
        make_obs_ph=lambda name: U.BatchInput(env.observation_space.shape, name=name),
        q_func=model,
        num_actions=env.action_space.n,
        optimizer=tf.train.AdamOptimizer(learning_rate=5e-4),
    )

    replay_buffer = ReplayBuffer(50000)
    # Create the schedule for exploration starting from 1 (every action is random) down to
    # 0.02 (98% of actions are selected according to values predicted by the model).
    exploration = LinearSchedule(schedule_timesteps=10000, initial_p=1.0, final_p=0.02)
    # Initialize the parameters and copy them to the target network.
    U.initialize()
    update_target()

    episode_rewards = [0.0]
    obs = env.reset()
    l_mean_episode_reward = []
    for t in itertools.count():
        # Take action and update exploration to the newest value
        action = act(obs[None], update_eps=exploration.value(t))[0]

        new_obs, rew, done, _ = env.step(action)

        # Store transition in the replay buffer.
        replay_buffer.add(obs, action, rew, new_obs, float(done))

        obs = new_obs

        episode_rewards[-1] += rew

        is_solved = np.mean(episode_rewards[-101:-1]) > 500 or t >= 300000
        is_solved = is_solved and len(env.portfolio.journal) > 2
        
        if done:

            journal = pd.DataFrame(env.portfolio.journal)
            profit = journal["Profit"].sum()

            try:
                print("-------------------------------------")
                print("steps                     | {:}".format(t))
                print("episodes                  | {}".format(len(episode_rewards)))
                print("% time spent exploring    | {}".format(int(100 * exploration.value(t))))

                print("--")
                l_mean_episode_reward.append(round(np.mean(episode_rewards[-101:-1]), 1))

                print("mean episode reward       | {:}".format(l_mean_episode_reward[-1]))
                print("Total operations          | {}".format(len(env.portfolio.journal)))
                print("Avg duration trades       | {}".format(round(journal["Trade Duration"].mean(), 2)))
                print("Total profit              | {}".format(round(profit, 1)))
                print("Avg profit per trade      | {}".format(round(env.portfolio.average_profit_per_trade, 3)))

                print("--")

                reward_test, profit = run_test(env=env, act=act)
                print("Total profit test:        > {}".format(round(profit, 2)))
                print("Avg profit per trade test > {}".format(round(reward_test, 3)))
                print("-------------------------------------")
            except Exception as e:
                print("Exception: ", e)

            obs = env.reset()
            episode_rewards.append(0)



        if is_solved:
            # Show off the result
            env._generate_summary_stats()
            run_test(env, act, final_test=True)
            break

        else:
            # Minimize the error in Bellman's equation on a batch sampled from replay buffer.
            if t > 500:
                obses_t, actions, rewards, obses_tp1, dones = replay_buffer.sample(32)
                train(obses_t, actions, rewards, obses_tp1, dones, np.ones_like(rewards))
            # Update target network periodically.
            if t % 500 == 0:
                update_target()


WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
[2017-07-09 18:39:23,452] VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
[2017-07-09 18:39:23,474] VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
/home/adrian/.local/lib/python3.5/site-packages/numpy/core/fromnumeric.py:2909: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/adrian/.local/lib/python3.5/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
-------------------------------------
steps                     | 2138
episodes                  | 1
% time spent exploring    | 79
--
mean episode reward       | nan
Total operations          | 391
Avg duration trades       | 3.07
Total profit              | -960.0
Avg profit per trade      | -2.872
--
Total profit test:        > -229.6
Avg profit per trade test > -2.031
-------------------------------------
-------------------------------------
steps                     | 4277
episodes                  | 2
% time spent exploring    | 58
--
mean episode reward       | -4001.2
Total operations          | 430
Avg duration trades       | 2.02
Total profit              | -1238.0
Avg profit per trade      | -2.16
--
Total profit test:        > -313.4
Avg profit per trade test > -2.581
-------------------------------------
-------------------------------------
steps                     | 6416
episodes                  | 3
% time spent exploring    | 37
--
mean episode reward       | -3412.6
Total operations          | 379
Avg duration trades       | 1.54
Total profit              | -1294.0
Avg profit per trade      | -3.681
--
Total profit test:        > -273.0
Avg profit per trade test > -2.696
-------------------------------------
-------------------------------------
steps                     | 8555
episodes                  | 4
% time spent exploring    | 16
--
mean episode reward       | -3157.4
Total operations          | 219
Avg duration trades       | 1.32
Total profit              | -998.0
Avg profit per trade      | -4.135
--
Total profit test:        > -229.3
Avg profit per trade test > -3.618
-------------------------------------
-------------------------------------
steps                     | 10694
episodes                  | 5
% time spent exploring    | 2
--
mean episode reward       | -2637.3
Total operations          | 72
Avg duration trades       | 1.08
Total profit              | -359.0
Avg profit per trade      | -4.692
--
Total profit test:        > -144.8
Avg profit per trade test > -7.225
-------------------------------------
-------------------------------------
steps                     | 12833
episodes                  | 6
% time spent exploring    | 2
--
mean episode reward       | -2166.0
Total operations          | 25
Avg duration trades       | 1.04
Total profit              | 2.0
Avg profit per trade      | -2.008
--
Total profit test:        > -41.7
Avg profit per trade test > -3.878
-------------------------------------
-------------------------------------
steps                     | 14972
episodes                  | 7
% time spent exploring    | 2
--
mean episode reward       | -1814.0
Total operations          | 26
Avg duration trades       | 1.04
Total profit              | -73.0
Avg profit per trade      | -2.515
--
Total profit test:        > -37.9
Avg profit per trade test > -1.188
-------------------------------------
-------------------------------------
steps                     | 17111
episodes                  | 8
% time spent exploring    | 2
--
mean episode reward       | -1558.6
Total operations          | 29
Avg duration trades       | 1.1
Total profit              | -165.0
Avg profit per trade      | -5.676
--
Total profit test:        > -5.5
Avg profit per trade test > -1.233
-------------------------------------
-------------------------------------
steps                     | 19250
episodes                  | 9
% time spent exploring    | 2
--
mean episode reward       | -1396.0
Total operations          | 19
Avg duration trades       | 1.0
Total profit              | -61.0
Avg profit per trade      | 0.189
--
Total profit test:        > -96.3
Avg profit per trade test > -4.577
-------------------------------------
-------------------------------------
steps                     | 21389
episodes                  | 10
% time spent exploring    | 2
--
mean episode reward       | -1240.5
Total operations          | 24
Avg duration trades       | 1.04
Total profit              | -30.0
Avg profit per trade      | -3.242
--
Total profit test:        > -173.1
Avg profit per trade test > -3.732
-------------------------------------
-------------------------------------
steps                     | 23528
episodes                  | 11
% time spent exploring    | 2
--
mean episode reward       | -1122.5
Total operations          | 41
Avg duration trades       | 1.05
Total profit              | 25.0
Avg profit per trade      | -0.298
--
Total profit test:        > -155.2
Avg profit per trade test > -1.946
-------------------------------------
-------------------------------------
steps                     | 25667
episodes                  | 12
% time spent exploring    | 2
--
mean episode reward       | -1019.3
Total operations          | 55
Avg duration trades       | 1.11
Total profit              | -259.0
Avg profit per trade      | -3.622
--
Total profit test:        > -315.8
Avg profit per trade test > -2.511
-------------------------------------
-------------------------------------
steps                     | 27806
episodes                  | 13
% time spent exploring    | 2
--
mean episode reward       | -951.7
Total operations          | 33
Avg duration trades       | 1.06
Total profit              | -28.0
Avg profit per trade      | -1.212
--
Total profit test:        > -64.3
Avg profit per trade test > -1.829
-------------------------------------
-------------------------------------
steps                     | 29945
episodes                  | 14
% time spent exploring    | 2
--
mean episode reward       | -881.3
Total operations          | 29
Avg duration trades       | 1.1
Total profit              | 159.0
Avg profit per trade      | 1.659
--
Total profit test:        > -72.9
Avg profit per trade test > -2.112
-------------------------------------
-------------------------------------
steps                     | 32084
episodes                  | 15
% time spent exploring    | 2
--
mean episode reward       | -816.4
Total operations          | 47
Avg duration trades       | 1.02
Total profit              | -27.0
Avg profit per trade      | -1.366
--
Total profit test:        > -48.8
Avg profit per trade test > -2.744
-------------------------------------
-------------------------------------
steps                     | 34223
episodes                  | 16
% time spent exploring    | 2
--
mean episode reward       | -765.1
Total operations          | 43
Avg duration trades       | 1.09
Total profit              | -10.0
Avg profit per trade      | -0.905
--
Total profit test:        > -245.7
Avg profit per trade test > -2.414
-------------------------------------
-------------------------------------
steps                     | 36362
episodes                  | 17
% time spent exploring    | 2
--
mean episode reward       | -720.5
Total operations          | 42
Avg duration trades       | 1.12
Total profit              | -100.0
Avg profit per trade      | -3.871
--
Total profit test:        > -72.0
Avg profit per trade test > -0.417
-------------------------------------
-------------------------------------
steps                     | 38501
episodes                  | 18
% time spent exploring    | 2
--
mean episode reward       | -689.3
Total operations          | 32
Avg duration trades       | 1.03
Total profit              | -3.0
Avg profit per trade      | -0.553
--
Total profit test:        > -183.9
Avg profit per trade test > -2.791
-------------------------------------
-------------------------------------
steps                     | 40640
episodes                  | 19
% time spent exploring    | 2
--
mean episode reward       | -651.0
Total operations          | 24
Avg duration trades       | 1.04
Total profit              | 79.0
Avg profit per trade      | 1.488
--
Total profit test:        > -99.8
Avg profit per trade test > -1.883
-------------------------------------
-------------------------------------
steps                     | 42779
episodes                  | 20
% time spent exploring    | 2
--
mean episode reward       | -613.9
Total operations          | 27
Avg duration trades       | 1.07
Total profit              | 65.0
Avg profit per trade      | -0.622
--
Total profit test:        > -125.3
Avg profit per trade test > -0.846
-------------------------------------
-------------------------------------
steps                     | 44918
episodes                  | 21
% time spent exploring    | 2
--
mean episode reward       | -583.0
Total operations          | 34
Avg duration trades       | 1.03
Total profit              | -70.0
Avg profit per trade      | -2.324
--
Total profit test:        > -50.9
Avg profit per trade test > -1.518
-------------------------------------
-------------------------------------
steps                     | 47057
episodes                  | 22
% time spent exploring    | 2
--
mean episode reward       | -558.2
Total operations          | 28
Avg duration trades       | 1.07
Total profit              | -57.0
Avg profit per trade      | -3.571
--
Total profit test:        > -145.2
Avg profit per trade test > -3.628
-------------------------------------
-------------------------------------
steps                     | 49196
episodes                  | 23
% time spent exploring    | 2
--
mean episode reward       | -538.3
Total operations          | 21
Avg duration trades       | 1.1
Total profit              | 29.0
Avg profit per trade      | -1.614
--
Total profit test:        > -111.5
Avg profit per trade test > -3.698
-------------------------------------
-------------------------------------
steps                     | 51335
episodes                  | 24
% time spent exploring    | 2
--
mean episode reward       | -515.7
Total operations          | 23
Avg duration trades       | 1.04
Total profit              | 23.0
Avg profit per trade      | 2.03
--
Total profit test:        > 57.3
Avg profit per trade test > -1.155
-------------------------------------
-------------------------------------
steps                     | 53474
episodes                  | 25
% time spent exploring    | 2
--
mean episode reward       | -491.6
Total operations          | 24
Avg duration trades       | 1.0
Total profit              | 66.0
Avg profit per trade      | 0.617
--
Total profit test:        > -32.7
Avg profit per trade test > -1.377
-------------------------------------
-------------------------------------
steps                     | 55613
episodes                  | 26
% time spent exploring    | 2
--
mean episode reward       | -471.3
Total operations          | 34
Avg duration trades       | 1.12
Total profit              | -7.0
Avg profit per trade      | -2.132
--
Total profit test:        > -70.0
Avg profit per trade test > -3.566
-------------------------------------
-------------------------------------
steps                     | 57752
episodes                  | 27
% time spent exploring    | 2
--
mean episode reward       | -456.0
Total operations          | 34
Avg duration trades       | 1.09
Total profit              | -31.0
Avg profit per trade      | -3.224
--
Total profit test:        > -197.5
Avg profit per trade test > -2.964
-------------------------------------
-------------------------------------
steps                     | 59891
episodes                  | 28
% time spent exploring    | 2
--
mean episode reward       | -442.5
Total operations          | 31
Avg duration trades       | 1.03
Total profit              | -29.0
Avg profit per trade      | -1.029
--
Total profit test:        > -221.1
Avg profit per trade test > -5.586
-------------------------------------
-------------------------------------
steps                     | 62030
episodes                  | 29
% time spent exploring    | 2
--
mean episode reward       | -427.2
Total operations          | 30
Avg duration trades       | 1.03
Total profit              | 22.0
Avg profit per trade      | 0.417
--
Total profit test:        > -81.5
Avg profit per trade test > -1.489
-------------------------------------
-------------------------------------
steps                     | 64169
episodes                  | 30
% time spent exploring    | 2
--
mean episode reward       | -411.4
Total operations          | 28
Avg duration trades       | 1.07
Total profit              | 91.0
Avg profit per trade      | 0.029
--
Total profit test:        > -14.9
Avg profit per trade test > -0.949
-------------------------------------
-------------------------------------
steps                     | 66308
episodes                  | 31
% time spent exploring    | 2
--
mean episode reward       | -397.3
Total operations          | 32
Avg duration trades       | 1.06
Total profit              | -250.0
Avg profit per trade      | -5.978
--
Total profit test:        > -109.1
Avg profit per trade test > -3.825
-------------------------------------
-------------------------------------
steps                     | 68447
episodes                  | 32
% time spent exploring    | 2
--
mean episode reward       | -390.6
Total operations          | 36
Avg duration trades       | 1.06
Total profit              | -14.0
Avg profit per trade      | -2.475
--
Total profit test:        > -174.8
Avg profit per trade test > -2.577
-------------------------------------
-------------------------------------
steps                     | 70586
episodes                  | 33
% time spent exploring    | 2
--
mean episode reward       | -380.6
Total operations          | 32
Avg duration trades       | 1.19
Total profit              | 48.0
Avg profit per trade      | -0.463
--
Total profit test:        > -139.3
Avg profit per trade test > -1.747
-------------------------------------
-------------------------------------
steps                     | 72725
episodes                  | 34
% time spent exploring    | 2
--
mean episode reward       | -368.9
Total operations          | 28
Avg duration trades       | 1.11
Total profit              | -1.0
Avg profit per trade      | -3.196
--
Total profit test:        > -153.0
Avg profit per trade test > -1.558
-------------------------------------
-------------------------------------
steps                     | 74864
episodes                  | 35
% time spent exploring    | 2
--
mean episode reward       | -360.2
Total operations          | 27
Avg duration trades       | 1.11
Total profit              | 26.0
Avg profit per trade      | 0.241
--
Total profit test:        > -172.7
Avg profit per trade test > -2.689
-------------------------------------
-------------------------------------
steps                     | 77003
episodes                  | 36
% time spent exploring    | 2
--
mean episode reward       | -348.7
Total operations          | 30
Avg duration trades       | 1.07
Total profit              | 23.0
Avg profit per trade      | -0.703
--
Total profit test:        > -122.0
Avg profit per trade test > -2.392
-------------------------------------
-------------------------------------
steps                     | 79142
episodes                  | 37
% time spent exploring    | 2
--
mean episode reward       | -338.6
Total operations          | 34
Avg duration trades       | 1.06
Total profit              | 11.0
Avg profit per trade      | 2.085
--
Total profit test:        > -135.7
Avg profit per trade test > -6.794
-------------------------------------
-------------------------------------
steps                     | 81281
episodes                  | 38
% time spent exploring    | 2
--
mean episode reward       | -326.8
Total operations          | 32
Avg duration trades       | 1.06
Total profit              | -72.0
Avg profit per trade      | -2.744
--
Total profit test:        > -122.9
Avg profit per trade test > -2.819
-------------------------------------
-------------------------------------
steps                     | 83420
episodes                  | 39
% time spent exploring    | 2
--
mean episode reward       | -319.6
Total operations          | 21
Avg duration trades       | 1.1
Total profit              | -9.0
Avg profit per trade      | -2.038
--
Total profit test:        > -122.6
Avg profit per trade test > -1.194
-------------------------------------
-------------------------------------
steps                     | 85559
episodes                  | 40
% time spent exploring    | 2
--
mean episode reward       | -311.7
Total operations          | 29
Avg duration trades       | 1.21
Total profit              | -62.0
Avg profit per trade      | -0.838
--
Total profit test:        > -145.5
Avg profit per trade test > -1.73
-------------------------------------
-------------------------------------
steps                     | 87698
episodes                  | 41
% time spent exploring    | 2
--
mean episode reward       | -303.0
Total operations          | 33
Avg duration trades       | 1.18
Total profit              | 32.0
Avg profit per trade      | -0.9
--
Total profit test:        > -173.6
Avg profit per trade test > -2.388
-------------------------------------
-------------------------------------
steps                     | 89837
episodes                  | 42
% time spent exploring    | 2
--
mean episode reward       | -294.7
Total operations          | 27
Avg duration trades       | 1.33
Total profit              | 142.0
Avg profit per trade      | 2.481
--
Total profit test:        > -157.2
Avg profit per trade test > -2.287
-------------------------------------
-------------------------------------
steps                     | 91976
episodes                  | 43
% time spent exploring    | 2
--
mean episode reward       | -283.4
Total operations          | 19
Avg duration trades       | 1.47
Total profit              | 24.0
Avg profit per trade      | 0.3
--
Total profit test:        > -212.0
Avg profit per trade test > -2.696
-------------------------------------
-------------------------------------
steps                     | 94115
episodes                  | 44
% time spent exploring    | 2
--
mean episode reward       | -274.5
Total operations          | 27
Avg duration trades       | 1.7
Total profit              | 135.0
Avg profit per trade      | 2.086
--
Total profit test:        > -134.9
Avg profit per trade test > -2.287
-------------------------------------
-------------------------------------
steps                     | 96254
episodes                  | 45
% time spent exploring    | 2
--
mean episode reward       | -263.6
Total operations          | 39
Avg duration trades       | 2.36
Total profit              | 105.0
Avg profit per trade      | 1.702
--
Total profit test:        > -221.5
Avg profit per trade test > -2.88
-------------------------------------
-------------------------------------
steps                     | 98393
episodes                  | 46
% time spent exploring    | 2
--
mean episode reward       | -264.1
Total operations          | 51
Avg duration trades       | 2.16
Total profit              | 127.0
Avg profit per trade      | 1.357
--
Total profit test:        > -124.3
Avg profit per trade test > -2.08
-------------------------------------
-------------------------------------
steps                     | 100532
episodes                  | 47
% time spent exploring    | 2
--
mean episode reward       | -256.6
Total operations          | 53
Avg duration trades       | 2.53
Total profit              | 132.0
Avg profit per trade      | 2.434
--
Total profit test:        > -175.4
Avg profit per trade test > -4.422
-------------------------------------
-------------------------------------
steps                     | 102671
episodes                  | 48
% time spent exploring    | 2
--
mean episode reward       | -232.8
Total operations          | 67
Avg duration trades       | 4.81
Total profit              | 53.0
Avg profit per trade      | -0.725
--
Total profit test:        > -371.3
Avg profit per trade test > -4.285
-------------------------------------
-------------------------------------
steps                     | 104810
episodes                  | 49
% time spent exploring    | 2
--
mean episode reward       | -253.5
Total operations          | 50
Avg duration trades       | 4.76
Total profit              | 171.0
Avg profit per trade      | 2.255
--
Total profit test:        > -358.2
Avg profit per trade test > -7.422
-------------------------------------
-------------------------------------
steps                     | 106949
episodes                  | 50
% time spent exploring    | 2
--
mean episode reward       | -200.4
Total operations          | 78
Avg duration trades       | 6.01
Total profit              | -184.0
Avg profit per trade      | -2.765
--
Total profit test:        > 141.1
Avg profit per trade test > 1.08
-------------------------------------
-------------------------------------
steps                     | 109088
episodes                  | 51
% time spent exploring    | 2
--
mean episode reward       | -261.9
Total operations          | 74
Avg duration trades       | 5.3
Total profit              | -209.0
Avg profit per trade      | -2.991
--
Total profit test:        > 299.2
Avg profit per trade test > 7.845
-------------------------------------
-------------------------------------
steps                     | 111227
episodes                  | 52
% time spent exploring    | 2
--
mean episode reward       | -293.0
Total operations          | 96
Avg duration trades       | 4.18
Total profit              | -474.0
Avg profit per trade      | -2.953
--
Total profit test:        > -222.5
Avg profit per trade test > -3.047
-------------------------------------
-------------------------------------
steps                     | 113366
episodes                  | 53
% time spent exploring    | 2
--
mean episode reward       | -310.3
Total operations          | 91
Avg duration trades       | 3.82
Total profit              | -401.0
Avg profit per trade      | -1.372
--
Total profit test:        > -424.0
Avg profit per trade test > -5.69
-------------------------------------
-------------------------------------
steps                     | 115505
episodes                  | 54
% time spent exploring    | 2
--
mean episode reward       | -303.5
Total operations          | 51
Avg duration trades       | 2.84
Total profit              | 34.0
Avg profit per trade      | -0.114
--
Total profit test:        > -273.3
Avg profit per trade test > -3.617
-------------------------------------
-------------------------------------
steps                     | 117644
episodes                  | 55
% time spent exploring    | 2
--
mean episode reward       | -293.7
Total operations          | 64
Avg duration trades       | 3.73
Total profit              | -248.0
Avg profit per trade      | -1.477
--
Total profit test:        > -203.2
Avg profit per trade test > -4.718
-------------------------------------
-------------------------------------
steps                     | 119783
episodes                  | 56
% time spent exploring    | 2
--
mean episode reward       | -282.2
Total operations          | 61
Avg duration trades       | 4.02
Total profit              | 45.0
Avg profit per trade      | -0.679
--
Total profit test:        > -367.9
Avg profit per trade test > -4.455
-------------------------------------
-------------------------------------
steps                     | 121922
episodes                  | 57
% time spent exploring    | 2
--
mean episode reward       | -258.2
Total operations          | 58
Avg duration trades       | 3.81
Total profit              | -229.0
Avg profit per trade      | -2.321
--
Total profit test:        > -365.9
Avg profit per trade test > -3.31
-------------------------------------
-------------------------------------
steps                     | 124061
episodes                  | 58
% time spent exploring    | 2
--
mean episode reward       | -234.9
Total operations          | 60
Avg duration trades       | 4.17
Total profit              | -481.0
Avg profit per trade      | -6.095
--
Total profit test:        > -67.2
Avg profit per trade test > -2.657
-------------------------------------
-------------------------------------
steps                     | 126200
episodes                  | 59
% time spent exploring    | 2
--
mean episode reward       | -241.9
Total operations          | 91
Avg duration trades       | 3.89
Total profit              | -559.0
Avg profit per trade      | -2.791
--
Total profit test:        > -524.3
Avg profit per trade test > -4.596
-------------------------------------
-------------------------------------
steps                     | 128339
episodes                  | 60
% time spent exploring    | 2
--
mean episode reward       | -247.4
Total operations          | 35
Avg duration trades       | 4.37
Total profit              | 76.0
Avg profit per trade      | 2.12
--
Total profit test:        > -328.1
Avg profit per trade test > -3.563
-------------------------------------
-------------------------------------
steps                     | 130478
episodes                  | 61
% time spent exploring    | 2
--
mean episode reward       | -229.6
Total operations          | 68
Avg duration trades       | 5.29
Total profit              | -471.0
Avg profit per trade      | -4.554
--
Total profit test:        > -294.4
Avg profit per trade test > -3.465
-------------------------------------
-------------------------------------
steps                     | 132617
episodes                  | 62
% time spent exploring    | 2
--
mean episode reward       | -220.3
Total operations          | 56
Avg duration trades       | 5.16
Total profit              | 114.0
Avg profit per trade      | 2.179
--
Total profit test:        > -372.8
Avg profit per trade test > -4.898
-------------------------------------
-------------------------------------
steps                     | 134756
episodes                  | 63
% time spent exploring    | 2
--
mean episode reward       | -178.8
Total operations          | 88
Avg duration trades       | 5.34
Total profit              | 30.0
Avg profit per trade      | -0.115
--
Total profit test:        > -114.8
Avg profit per trade test > -2.296
-------------------------------------
-------------------------------------
steps                     | 136895
episodes                  | 64
% time spent exploring    | 2
--
mean episode reward       | -176.3
Total operations          | 129
Avg duration trades       | 5.79
Total profit              | -895.0
Avg profit per trade      | -6.125
--
Total profit test:        > -396.8
Avg profit per trade test > -3.712
-------------------------------------
-------------------------------------
steps                     | 139034
episodes                  | 65
% time spent exploring    | 2
--
mean episode reward       | -220.5
Total operations          | 142
Avg duration trades       | 4.72
Total profit              | -976.0
Avg profit per trade      | -4.858
--
Total profit test:        > -512.4
Avg profit per trade test > -7.022
-------------------------------------
-------------------------------------
steps                     | 141173
episodes                  | 66
% time spent exploring    | 2
--
mean episode reward       | -271.9
Total operations          | 66
Avg duration trades       | 5.26
Total profit              | -277.0
Avg profit per trade      | -2.55
--
Total profit test:        > -554.5
Avg profit per trade test > -6.777
-------------------------------------
-------------------------------------
steps                     | 143312
episodes                  | 67
% time spent exploring    | 2
--
mean episode reward       | -286.9
Total operations          | 50
Avg duration trades       | 5.18
Total profit              | 92.0
Avg profit per trade      | 2.398
--
Total profit test:        > -352.5
Avg profit per trade test > -3.916
-------------------------------------
-------------------------------------
steps                     | 145451
episodes                  | 68
% time spent exploring    | 2
--
mean episode reward       | -258.0
Total operations          | 52
Avg duration trades       | 5.63
Total profit              | -100.0
Avg profit per trade      | -2.238
--
Total profit test:        > -414.7
Avg profit per trade test > -5.205
-------------------------------------
-------------------------------------
steps                     | 147590
episodes                  | 69
% time spent exploring    | 2
--
mean episode reward       | -266.3
Total operations          | 63
Avg duration trades       | 3.21
Total profit              | -30.0
Avg profit per trade      | -0.91
--
Total profit test:        > -406.4
Avg profit per trade test > -2.836
-------------------------------------
-------------------------------------
steps                     | 149729
episodes                  | 70
% time spent exploring    | 2
--
mean episode reward       | -256.6
Total operations          | 50
Avg duration trades       | 5.96
Total profit              | 34.0
Avg profit per trade      | 0.657
--
Total profit test:        > -519.4
Avg profit per trade test > -5.599
-------------------------------------
-------------------------------------
steps                     | 151868
episodes                  | 71
% time spent exploring    | 2
--
mean episode reward       | -219.0
Total operations          | 80
Avg duration trades       | 5.54
Total profit              | -7.0
Avg profit per trade      | 2.668
--
Total profit test:        > -289.8
Avg profit per trade test > -3.255
-------------------------------------
-------------------------------------
steps                     | 154007
episodes                  | 72
% time spent exploring    | 2
--
mean episode reward       | -204.6
Total operations          | 75
Avg duration trades       | 6.12
Total profit              | -411.0
Avg profit per trade      | -2.493
--
Total profit test:        > -506.0
Avg profit per trade test > -5.549
-------------------------------------
-------------------------------------
steps                     | 156146
episodes                  | 73
% time spent exploring    | 2
--
mean episode reward       | -212.1
Total operations          | 65
Avg duration trades       | 4.82
Total profit              | -141.0
Avg profit per trade      | -1.011
--
Total profit test:        > -354.4
Avg profit per trade test > -2.282
-------------------------------------
-------------------------------------
steps                     | 158285
episodes                  | 74
% time spent exploring    | 2
--
mean episode reward       | -181.1
Total operations          | 60
Avg duration trades       | 4.85
Total profit              | 170.0
Avg profit per trade      | 1.239
--
Total profit test:        > -225.0
Avg profit per trade test > -1.956
-------------------------------------
-------------------------------------
steps                     | 160424
episodes                  | 75
% time spent exploring    | 2
--
mean episode reward       | -146.1
Total operations          | 60
Avg duration trades       | 5.6
Total profit              | -279.0
Avg profit per trade      | -1.56
--
Total profit test:        > -463.5
Avg profit per trade test > -6.091
-------------------------------------
-------------------------------------
steps                     | 162563
episodes                  | 76
% time spent exploring    | 2
--
mean episode reward       | -156.0
Total operations          | 68
Avg duration trades       | 7.79
Total profit              | -278.0
Avg profit per trade      | -2.712
--
Total profit test:        > -432.1
Avg profit per trade test > -4.569
-------------------------------------
-------------------------------------
steps                     | 164702
episodes                  | 77
% time spent exploring    | 2
--
mean episode reward       | -167.8
Total operations          | 73
Avg duration trades       | 6.4
Total profit              | -585.0
Avg profit per trade      | -7.718
--
Total profit test:        > -354.5
Avg profit per trade test > -4.427
-------------------------------------
-------------------------------------
steps                     | 166841
episodes                  | 78
% time spent exploring    | 2
--
mean episode reward       | -183.0
Total operations          | 56
Avg duration trades       | 4.71
Total profit              | -92.0
Avg profit per trade      | -1.081
--
Total profit test:        > -397.4
Avg profit per trade test > -6.212
-------------------------------------
-------------------------------------
steps                     | 168980
episodes                  | 79
% time spent exploring    | 2
--
mean episode reward       | -152.8
Total operations          | 38
Avg duration trades       | 13.32
Total profit              | 170.0
Avg profit per trade      | 5.454
--
Total profit test:        > -204.6
Avg profit per trade test > -9.535
-------------------------------------
-------------------------------------
steps                     | 171119
episodes                  | 80
% time spent exploring    | 2
--
mean episode reward       | -96.2
Total operations          | 56
Avg duration trades       | 8.29
Total profit              | -33.0
Avg profit per trade      | -1.126
--
Total profit test:        > -145.2
Avg profit per trade test > -2.958
-------------------------------------
-------------------------------------
steps                     | 173258
episodes                  | 81
% time spent exploring    | 2
--
mean episode reward       | -118.4
Total operations          | 66
Avg duration trades       | 7.18
Total profit              | -228.0
Avg profit per trade      | -1.469
--
Total profit test:        > -611.2
Avg profit per trade test > -10.065
-------------------------------------
-------------------------------------
steps                     | 175397
episodes                  | 82
% time spent exploring    | 2
--
mean episode reward       | -104.4
Total operations          | 54
Avg duration trades       | 8.48
Total profit              | -159.0
Avg profit per trade      | 1.967
--
Total profit test:        > -282.9
Avg profit per trade test > -6.31
-------------------------------------
-------------------------------------
steps                     | 177536
episodes                  | 83
% time spent exploring    | 2
--
mean episode reward       | -142.8
Total operations          | 60
Avg duration trades       | 7.53
Total profit              | -157.0
Avg profit per trade      | -0.392
--
Total profit test:        > -361.0
Avg profit per trade test > -8.871
-------------------------------------
-------------------------------------
steps                     | 179675
episodes                  | 84
% time spent exploring    | 2
--
mean episode reward       | -99.9
Total operations          | 77
Avg duration trades       | 5.45
Total profit              | -487.0
Avg profit per trade      | -2.562
--
Total profit test:        > -245.2
Avg profit per trade test > -2.668
-------------------------------------
-------------------------------------
steps                     | 181814
episodes                  | 85
% time spent exploring    | 2
--
mean episode reward       | -92.5
Total operations          | 77
Avg duration trades       | 4.95
Total profit              | -156.0
Avg profit per trade      | -4.25
--
Total profit test:        > -476.2
Avg profit per trade test > -3.188
-------------------------------------
-------------------------------------
steps                     | 183953
episodes                  | 86
% time spent exploring    | 2
--
mean episode reward       | -150.9
Total operations          | 64
Avg duration trades       | 5.09
Total profit              | 111.0
Avg profit per trade      | -1.071
--
Total profit test:        > -493.7
Avg profit per trade test > -4.503
-------------------------------------
-------------------------------------
steps                     | 186092
episodes                  | 87
% time spent exploring    | 2
--
mean episode reward       | -94.2
Total operations          | 62
Avg duration trades       | 5.32
Total profit              | 27.0
Avg profit per trade      | -1.087
--
Total profit test:        > -505.7
Avg profit per trade test > -4.711
-------------------------------------
-------------------------------------
steps                     | 188231
episodes                  | 88
% time spent exploring    | 2
--
mean episode reward       | -109.1
Total operations          | 68
Avg duration trades       | 5.43
Total profit              | 204.0
Avg profit per trade      | -1.159
--
Total profit test:        > -631.5
Avg profit per trade test > -7.235
-------------------------------------
-------------------------------------
steps                     | 190370
episodes                  | 89
% time spent exploring    | 2
--
mean episode reward       | -114.1
Total operations          | 72
Avg duration trades       | 5.89
Total profit              | -377.0
Avg profit per trade      | -3.36
--
Total profit test:        > -323.7
Avg profit per trade test > -5.596
-------------------------------------
-------------------------------------
steps                     | 192509
episodes                  | 90
% time spent exploring    | 2
--
mean episode reward       | -146.4
Total operations          | 83
Avg duration trades       | 4.89
Total profit              | 58.0
Avg profit per trade      | -0.707
--
Total profit test:        > -499.6
Avg profit per trade test > -4.705
-------------------------------------
-------------------------------------
steps                     | 194648
episodes                  | 91
% time spent exploring    | 2
--
mean episode reward       | -121.9
Total operations          | 73
Avg duration trades       | 6.47
Total profit              | 176.0
Avg profit per trade      | 1.499
--
Total profit test:        > -574.2
Avg profit per trade test > -5.747
-------------------------------------
-------------------------------------
steps                     | 196787
episodes                  | 92
% time spent exploring    | 2
--
mean episode reward       | -93.4
Total operations          | 65
Avg duration trades       | 6.28
Total profit              | 494.0
Avg profit per trade      | 5.395
--
Total profit test:        > -615.5
Avg profit per trade test > -6.767
-------------------------------------
-------------------------------------
steps                     | 198926
episodes                  | 93
% time spent exploring    | 2
--
mean episode reward       | 8.8
Total operations          | 69
Avg duration trades       | 3.91
Total profit              | 147.0
Avg profit per trade      | -0.51
--
Total profit test:        > -411.7
Avg profit per trade test > -4.868
-------------------------------------
-------------------------------------
steps                     | 201065
episodes                  | 94
% time spent exploring    | 2
--
mean episode reward       | 13.2
Total operations          | 68
Avg duration trades       | 4.18
Total profit              | -183.0
Avg profit per trade      | -3.897
--
Total profit test:        > -307.2
Avg profit per trade test > -3.095
-------------------------------------
-------------------------------------
steps                     | 203204
episodes                  | 95
% time spent exploring    | 2
--
mean episode reward       | -18.2
Total operations          | 70
Avg duration trades       | 3.19
Total profit              | -136.0
Avg profit per trade      | -3.918
--
Total profit test:        > -502.9
Avg profit per trade test > -4.275
-------------------------------------
-------------------------------------
steps                     | 205343
episodes                  | 96
% time spent exploring    | 2
--
mean episode reward       | -8.7
Total operations          | 50
Avg duration trades       | 4.4
Total profit              | 113.0
Avg profit per trade      | 2.527
--
Total profit test:        > -523.1
Avg profit per trade test > -5.099
-------------------------------------
-------------------------------------
steps                     | 207482
episodes                  | 97
% time spent exploring    | 2
--
mean episode reward       | -16.3
Total operations          | 68
Avg duration trades       | 4.57
Total profit              | -88.0
Avg profit per trade      | -1.681
--
Total profit test:        > -506.0
Avg profit per trade test > -7.191
-------------------------------------
-------------------------------------
steps                     | 209621
episodes                  | 98
% time spent exploring    | 2
--
mean episode reward       | -16.4
Total operations          | 88
Avg duration trades       | 4.59
Total profit              | -145.0
Avg profit per trade      | -1.27
--
Total profit test:        > -281.2
Avg profit per trade test > -6.603
-------------------------------------
-------------------------------------
steps                     | 211760
episodes                  | 99
% time spent exploring    | 2
--
mean episode reward       | -43.4
Total operations          | 86
Avg duration trades       | 5.03
Total profit              | -219.0
Avg profit per trade      | -2.174
--
Total profit test:        > 26.1
Avg profit per trade test > -0.3
-------------------------------------
-------------------------------------
steps                     | 213899
episodes                  | 100
% time spent exploring    | 2
--
mean episode reward       | -65.7
Total operations          | 128
Avg duration trades       | 4.28
Total profit              | -363.0
Avg profit per trade      | -3.468
--
Total profit test:        > -459.2
Avg profit per trade test > -11.926
-------------------------------------
-------------------------------------
steps                     | 216038
episodes                  | 101
% time spent exploring    | 2
--
mean episode reward       | -105.6
Total operations          | 74
Avg duration trades       | 5.36
Total profit              | -57.0
Avg profit per trade      | 0.227
--
Total profit test:        > -238.1
Avg profit per trade test > -4.857
-------------------------------------
-------------------------------------
steps                     | 218177
episodes                  | 102
% time spent exploring    | 2
--
mean episode reward       | -71.5
Total operations          | 58
Avg duration trades       | 5.26
Total profit              | 62.0
Avg profit per trade      | -0.003
--
Total profit test:        > -249.0
Avg profit per trade test > -4.298
-------------------------------------
-------------------------------------
steps                     | 220316
episodes                  | 103
% time spent exploring    | 2
--
mean episode reward       | -61.2
Total operations          | 41
Avg duration trades       | 5.78
Total profit              | -286.0
Avg profit per trade      | -1.055
--
Total profit test:        > -505.5
Avg profit per trade test > -9.154
-------------------------------------
-------------------------------------
steps                     | 222455
episodes                  | 104
% time spent exploring    | 2
--
mean episode reward       | -23.8
Total operations          | 44
Avg duration trades       | 5.09
Total profit              | 24.0
Avg profit per trade      | -0.862
--
Total profit test:        > -495.1
Avg profit per trade test > -5.536
-------------------------------------
-------------------------------------
steps                     | 224594
episodes                  | 105
% time spent exploring    | 2
--
mean episode reward       | -6.8
Total operations          | 43
Avg duration trades       | 4.79
Total profit              | -88.0
Avg profit per trade      | -3.695
--
Total profit test:        > -75.2
Avg profit per trade test > -3.815
-------------------------------------
-------------------------------------
steps                     | 226733
episodes                  | 106
% time spent exploring    | 2
--
mean episode reward       | -32.1
Total operations          | 59
Avg duration trades       | 1.86
Total profit              | -92.0
Avg profit per trade      | -1.058
--
Total profit test:        > -426.2
Avg profit per trade test > -5.631
-------------------------------------
-------------------------------------
steps                     | 228872
episodes                  | 107
% time spent exploring    | 2
--
mean episode reward       | -28.8
Total operations          | 45
Avg duration trades       | 1.93
Total profit              | 66.0
Avg profit per trade      | -2.98
--
Total profit test:        > -526.1
Avg profit per trade test > -6.055
-------------------------------------
-------------------------------------
steps                     | 231011
episodes                  | 108
% time spent exploring    | 2
--
mean episode reward       | -27.6
Total operations          | 63
Avg duration trades       | 1.65
Total profit              | 111.0
Avg profit per trade      | 0.381
--
Total profit test:        > -486.5
Avg profit per trade test > -4.157
-------------------------------------
-------------------------------------
steps                     | 233150
episodes                  | 109
% time spent exploring    | 2
--
mean episode reward       | -22.3
Total operations          | 61
Avg duration trades       | 2.03
Total profit              | 234.0
Avg profit per trade      | 0.556
--
Total profit test:        > -341.6
Avg profit per trade test > -3.423
-------------------------------------
-------------------------------------
steps                     | 235289
episodes                  | 110
% time spent exploring    | 2
--
mean episode reward       | -12.9
Total operations          | 71
Avg duration trades       | 2.27
Total profit              | -70.0
Avg profit per trade      | -1.342
--
Total profit test:        > -501.7
Avg profit per trade test > -4.464
-------------------------------------
-------------------------------------
steps                     | 237428
episodes                  | 111
% time spent exploring    | 2
--
mean episode reward       | -35.1
Total operations          | 59
Avg duration trades       | 2.03
Total profit              | 321.0
Avg profit per trade      | 1.241
--
Total profit test:        > -490.6
Avg profit per trade test > -4.865
-------------------------------------
-------------------------------------
steps                     | 239567
episodes                  | 112
% time spent exploring    | 2
--
mean episode reward       | -31.6
Total operations          | 59
Avg duration trades       | 1.86
Total profit              | 123.0
Avg profit per trade      | -0.507
--
Total profit test:        > -377.9
Avg profit per trade test > -4.192
-------------------------------------
-------------------------------------
steps                     | 241706
episodes                  | 113
% time spent exploring    | 2
--
mean episode reward       | -29.4
Total operations          | 51
Avg duration trades       | 1.47
Total profit              | 51.0
Avg profit per trade      | 0.443
--
Total profit test:        > -407.6
Avg profit per trade test > -4.718
-------------------------------------
-------------------------------------
steps                     | 243845
episodes                  | 114
% time spent exploring    | 2
--
mean episode reward       | -28.0
Total operations          | 50
Avg duration trades       | 1.88
Total profit              | 127.0
Avg profit per trade      | -0.14
--
Total profit test:        > -455.9
Avg profit per trade test > -4.828
-------------------------------------
-------------------------------------
steps                     | 245984
episodes                  | 115
% time spent exploring    | 2
--
mean episode reward       | -24.1
Total operations          | 63
Avg duration trades       | 1.75
Total profit              | 111.0
Avg profit per trade      | 0.305
--
Total profit test:        > -509.0
Avg profit per trade test > -4.954
-------------------------------------
-------------------------------------
steps                     | 248123
episodes                  | 116
% time spent exploring    | 2
--
mean episode reward       | -18.5
Total operations          | 68
Avg duration trades       | 1.84
Total profit              | -5.0
Avg profit per trade      | -0.29
--
Total profit test:        > -388.5
Avg profit per trade test > -4.913
-------------------------------------
-------------------------------------
steps                     | 250262
episodes                  | 117
% time spent exploring    | 2
--
mean episode reward       | -15.0
Total operations          | 53
Avg duration trades       | 2.04
Total profit              | 234.0
Avg profit per trade      | 0.96
--
Total profit test:        > -397.4
Avg profit per trade test > -4.829
-------------------------------------
-------------------------------------
steps                     | 252401
episodes                  | 118
% time spent exploring    | 2
--
mean episode reward       | -7.4
Total operations          | 92
Avg duration trades       | 1.64
Total profit              | 10.0
Avg profit per trade      | -0.49
--
Total profit test:        > -297.7
Avg profit per trade test > -3.096
-------------------------------------
-------------------------------------
steps                     | 254540
episodes                  | 119
% time spent exploring    | 2
--
mean episode reward       | -4.4
Total operations          | 61
Avg duration trades       | 2.33
Total profit              | 95.0
Avg profit per trade      | -0.405
--
Total profit test:        > -311.8
Avg profit per trade test > -3.635
-------------------------------------
-------------------------------------
steps                     | 256679
episodes                  | 120
% time spent exploring    | 2
--
mean episode reward       | -1.9
Total operations          | 54
Avg duration trades       | 2.02
Total profit              | 29.0
Avg profit per trade      | 0.624
--
Total profit test:        > -533.3
Avg profit per trade test > -4.602
-------------------------------------
-------------------------------------
steps                     | 258818
episodes                  | 121
% time spent exploring    | 2
--
mean episode reward       | 0.8
Total operations          | 57
Avg duration trades       | 2.11
Total profit              | 129.0
Avg profit per trade      | 0.526
--
Total profit test:        > -484.1
Avg profit per trade test > -4.915
-------------------------------------
-------------------------------------
steps                     | 260957
episodes                  | 122
% time spent exploring    | 2
--
mean episode reward       | 6.0
Total operations          | 46
Avg duration trades       | 3.7
Total profit              | 84.0
Avg profit per trade      | -0.413
--
Total profit test:        > -552.4
Avg profit per trade test > -5.417
-------------------------------------
-------------------------------------
steps                     | 263096
episodes                  | 123
% time spent exploring    | 2
--
mean episode reward       | 11.1
Total operations          | 45
Avg duration trades       | 5.47
Total profit              | 4.0
Avg profit per trade      | 1.591
--
Total profit test:        > -184.0
Avg profit per trade test > -2.802
-------------------------------------
-------------------------------------
steps                     | 265235
episodes                  | 124
% time spent exploring    | 2
--
mean episode reward       | 21.3
Total operations          | 95
Avg duration trades       | 3.56
Total profit              | 99.0
Avg profit per trade      | 2.021
--
Total profit test:        > -210.6
Avg profit per trade test > -5.125
-------------------------------------
-------------------------------------
steps                     | 267374
episodes                  | 125
% time spent exploring    | 2
--
mean episode reward       | 59.2
Total operations          | 56
Avg duration trades       | 7.07
Total profit              | -25.0
Avg profit per trade      | 1.204
--
Total profit test:        > -181.1
Avg profit per trade test > -5.492
-------------------------------------
-------------------------------------
steps                     | 269513
episodes                  | 126
% time spent exploring    | 2
--
mean episode reward       | 72.8
Total operations          | 105
Avg duration trades       | 7.2
Total profit              | -71.0
Avg profit per trade      | 0.523
--
Total profit test:        > -308.8
Avg profit per trade test > -13.9
-------------------------------------
-------------------------------------
steps                     | 271652
episodes                  | 127
% time spent exploring    | 2
--
mean episode reward       | 128.3
Total operations          | 153
Avg duration trades       | 7.65
Total profit              | -313.0
Avg profit per trade      | -0.681
--
Total profit test:        > -132.7
Avg profit per trade test > -4.605
-------------------------------------
-------------------------------------
steps                     | 273791
episodes                  | 128
% time spent exploring    | 2
--
mean episode reward       | 134.5
Total operations          | 84
Avg duration trades       | 12.81
Total profit              | -38.0
Avg profit per trade      | 0.785
--
Total profit test:        > -279.7
Avg profit per trade test > -9.3
-------------------------------------
-------------------------------------
steps                     | 275930
episodes                  | 129
% time spent exploring    | 2
--
mean episode reward       | 191.9
Total operations          | 93
Avg duration trades       | 11.56
Total profit              | 96.0
Avg profit per trade      | -1.676
--
Total profit test:        > -207.7
Avg profit per trade test > -8.28
-------------------------------------
-------------------------------------
steps                     | 278069
episodes                  | 130
% time spent exploring    | 2
--
mean episode reward       | 212.9
Total operations          | 90
Avg duration trades       | 15.38
Total profit              | -115.0
Avg profit per trade      | 0.568
--
Total profit test:        > -177.2
Avg profit per trade test > -5.861
-------------------------------------
-------------------------------------
steps                     | 280208
episodes                  | 131
% time spent exploring    | 2
--
mean episode reward       | 197.6
Total operations          | 114
Avg duration trades       | 14.01
Total profit              | -202.0
Avg profit per trade      | -0.016
--
Total profit test:        > -68.7
Avg profit per trade test > -2.85
-------------------------------------
-------------------------------------
steps                     | 282347
episodes                  | 132
% time spent exploring    | 2
--
mean episode reward       | 277.4
Total operations          | 111
Avg duration trades       | 15.96
Total profit              | 237.0
Avg profit per trade      | 2.396
--
Total profit test:        > -207.3
Avg profit per trade test > -9.785
-------------------------------------
-------------------------------------
steps                     | 284486
episodes                  | 133
% time spent exploring    | 2
--
mean episode reward       | 266.1
Total operations          | 96
Avg duration trades       | 19.39
Total profit              | -218.0
Avg profit per trade      | -0.936
--
Total profit test:        > -237.8
Avg profit per trade test > -9.595
-------------------------------------
-------------------------------------
steps                     | 286625
episodes                  | 134
% time spent exploring    | 2
--
mean episode reward       | 232.5
Total operations          | 72
Avg duration trades       | 26.15
Total profit              | -124.0
Avg profit per trade      | 0.777
--
Total profit test:        > -247.2
Avg profit per trade test > -9.252
-------------------------------------
-------------------------------------
steps                     | 288764
episodes                  | 135
% time spent exploring    | 2
--
mean episode reward       | 118.4
Total operations          | 81
Avg duration trades       | 22.62
Total profit              | -388.0
Avg profit per trade      | -2.056
--
Total profit test:        > -263.7
Avg profit per trade test > -9.942
-------------------------------------
-------------------------------------
steps                     | 290903
episodes                  | 136
% time spent exploring    | 2
--
mean episode reward       | 96.8
Total operations          | 61
Avg duration trades       | 32.85
Total profit              | -570.0
Avg profit per trade      | -7.089
--
Total profit test:        > -249.3
Avg profit per trade test > -12.182
-------------------------------------
-------------------------------------
steps                     | 293042
episodes                  | 137
% time spent exploring    | 2
--
mean episode reward       | 21.4
Total operations          | 55
Avg duration trades       | 36.8
Total profit              | 491.0
Avg profit per trade      | 9.164
--
Total profit test:        > -261.1
Avg profit per trade test > -11.267
-------------------------------------
-------------------------------------
steps                     | 295181
episodes                  | 138
% time spent exploring    | 2
--
mean episode reward       | 168.8
Total operations          | 59
Avg duration trades       | 34.03
Total profit              | -98.0
Avg profit per trade      | -0.388
--
Total profit test:        > -250.0
Avg profit per trade test > -13.862
-------------------------------------
-------------------------------------
steps                     | 297320
episodes                  | 139
% time spent exploring    | 2
--
mean episode reward       | 130.1
Total operations          | 74
Avg duration trades       | 27.39
Total profit              | 112.0
Avg profit per trade      | 2.003
--
Total profit test:        > -249.9
Avg profit per trade test > -9.914
-------------------------------------
-------------------------------------
steps                     | 299459
episodes                  | 140
% time spent exploring    | 2
--
mean episode reward       | 197.2
Total operations          | 93
Avg duration trades       | 21.28
Total profit              | 99.0
Avg profit per trade      | 1.421
--
Total profit test:        > -260.8
Avg profit per trade test > -11.521
-------------------------------------
SUMMARY STATISTICS
Total Trades Taken:  14
Total Reward:  347.3
Average Reward per Trade:  24.8071428571
Win Ratio: 71.4285714286 %
[ { 'Entry Price': 1.3588100000000001,
    'Entry Time': Timestamp('2013-12-02 03:00:00'),
    'Exit Price': 1.35856,
    'Exit Time': Timestamp('2013-12-04 05:00:00'),
    'Profit': -5.5000000000008349,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': -2.1000000000007546},
  { 'Entry Price': 1.35856,
    'Entry Time': Timestamp('2013-12-04 06:00:00'),
    'Exit Price': 1.3654899999999999,
    'Exit Time': Timestamp('2013-12-06 08:00:00'),
    'Profit': 66.299999999998803,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 70.600000000000364},
  { 'Entry Price': 1.3654999999999999,
    'Entry Time': Timestamp('2013-12-06 09:00:00'),
    'Exit Price': 1.37453,
    'Exit Time': Timestamp('2013-12-10 11:00:00'),
    'Profit': 87.300000000000935,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 94.400000000000801},
  { 'Entry Price': 1.3745100000000001,
    'Entry Time': Timestamp('2013-12-10 12:00:00'),
    'Exit Price': 1.3781600000000001,
    'Exit Time': Timestamp('2013-12-12 14:00:00'),
    'Profit': 33.499999999999311,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 38.499999999998749},
  { 'Entry Price': 1.37819,
    'Entry Time': Timestamp('2013-12-12 15:00:00'),
    'Exit Price': 1.3752899999999999,
    'Exit Time': Timestamp('2013-12-16 17:00:00'),
    'Profit': -32.000000000001251,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': -25.600000000001497},
  { 'Entry Price': 1.3752799999999998,
    'Entry Time': Timestamp('2013-12-16 18:00:00'),
    'Exit Price': 1.3746,
    'Exit Time': Timestamp('2013-12-16 19:00:00'),
    'Profit': -9.7999999999979188,
    'Trade Duration': 1,
    'Type': 'BUY',
    'reward': -0.099999999996544631},
  { 'Entry Price': 1.3745799999999999,
    'Entry Time': Timestamp('2013-12-16 20:00:00'),
    'Exit Price': 1.3693200000000001,
    'Exit Time': Timestamp('2013-12-18 22:00:00'),
    'Profit': -55.599999999998204,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 7.4000000000015191},
  { 'Entry Price': 1.3693,
    'Entry Time': Timestamp('2013-12-18 23:00:00'),
    'Exit Price': 1.3670599999999999,
    'Exit Time': Timestamp('2013-12-19 15:00:00'),
    'Profit': 19.400000000000198,
    'Trade Duration': 16,
    'Type': 'SELL',
    'reward': 28.200000000000109},
  { 'Entry Price': 1.36707,
    'Entry Time': Timestamp('2013-12-19 16:00:00'),
    'Exit Price': 1.36646,
    'Exit Time': Timestamp('2013-12-19 18:00:00'),
    'Profit': 3.0999999999999943,
    'Trade Duration': 2,
    'Type': 'SELL',
    'reward': 0.29999999999941451},
  { 'Entry Price': 1.3659299999999999,
    'Entry Time': Timestamp('2013-12-19 21:00:00'),
    'Exit Price': 1.3672,
    'Exit Time': Timestamp('2013-12-20 18:00:00'),
    'Profit': 9.7000000000010438,
    'Trade Duration': 21,
    'Type': 'BUY',
    'reward': 8.3000000000007574},
  { 'Entry Price': 1.3670799999999999,
    'Entry Time': Timestamp('2013-12-20 21:00:00'),
    'Exit Price': 1.36842,
    'Exit Time': Timestamp('2013-12-26 09:00:00'),
    'Profit': 10.400000000001189,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 10.700000000000934},
  { 'Entry Price': 1.3684100000000001,
    'Entry Time': Timestamp('2013-12-26 10:00:00'),
    'Exit Price': 1.3762700000000001,
    'Exit Time': Timestamp('2013-12-30 12:00:00'),
    'Profit': 75.599999999999781,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 64.599999999998801},
  { 'Entry Price': 1.3762799999999999,
    'Entry Time': Timestamp('2013-12-30 13:00:00'),
    'Exit Price': 1.3668799999999999,
    'Exit Time': Timestamp('2014-01-02 18:00:00'),
    'Profit': 91.000000000000753,
    'Trade Duration': 50,
    'Type': 'SELL',
    'reward': 102.59999999999904},
  { 'Entry Price': 1.3668799999999999,
    'Entry Time': Timestamp('2014-01-02 19:00:00'),
    'Exit Price': 1.3611899999999999,
    'Exit Time': Timestamp('2014-01-03 17:00:00'),
    'Profit': 53.899999999999729,
    'Trade Duration': 22,
    'Type': 'SELL',
    'reward': 57.199999999999143}]
Training period  2014-04-08 20:00:00 - 2014-05-30 22:00:00
Average Reward is 3.775
SUMMARY STATISTICS
Total Trades Taken:  19
Total Reward:  33.1
Average Reward per Trade:  1.74210526316
Win Ratio: 42.1052631579 %
[ { 'Entry Price': 1.3794999999999999,
    'Entry Time': Timestamp('2014-04-08 20:00:00'),
    'Exit Price': 1.3884100000000001,
    'Exit Time': Timestamp('2014-04-10 22:00:00'),
    'Profit': 86.100000000001955,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 91.400000000001143},
  { 'Entry Price': 1.38839,
    'Entry Time': Timestamp('2014-04-10 23:00:00'),
    'Exit Price': 1.3815600000000001,
    'Exit Time': Timestamp('2014-04-15 01:00:00'),
    'Profit': -71.299999999998917,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': -69.500000000000455},
  { 'Entry Price': 1.3815600000000001,
    'Entry Time': Timestamp('2014-04-15 02:00:00'),
    'Exit Price': 1.3835999999999999,
    'Exit Time': Timestamp('2014-04-17 04:00:00'),
    'Profit': 17.399999999998197,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 18.599999999999397},
  { 'Entry Price': 1.3835999999999999,
    'Entry Time': Timestamp('2014-04-17 05:00:00'),
    'Exit Price': 1.3815299999999999,
    'Exit Time': Timestamp('2014-04-21 07:00:00'),
    'Profit': -23.700000000000163,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': -29.200000000000667},
  { 'Entry Price': 1.38168,
    'Entry Time': Timestamp('2014-04-21 09:00:00'),
    'Exit Price': 1.38462,
    'Exit Time': Timestamp('2014-04-23 11:00:00'),
    'Profit': 26.399999999999427,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 0.99999999999955946},
  { 'Entry Price': 1.3847499999999999,
    'Entry Time': Timestamp('2014-04-23 12:00:00'),
    'Exit Price': 1.38154,
    'Exit Time': Timestamp('2014-04-24 00:00:00'),
    'Profit': -35.099999999999355,
    'Trade Duration': 12,
    'Type': 'BUY',
    'reward': -35.20000000000001},
  { 'Entry Price': 1.3815200000000001,
    'Entry Time': Timestamp('2014-04-24 01:00:00'),
    'Exit Price': 1.38195,
    'Exit Time': Timestamp('2014-04-28 03:00:00'),
    'Profit': 1.2999999999993044,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 19.899999999999032},
  { 'Entry Price': 1.3819399999999999,
    'Entry Time': Timestamp('2014-04-28 04:00:00'),
    'Exit Price': 1.38052,
    'Exit Time': Timestamp('2014-04-30 06:00:00'),
    'Profit': -17.199999999999768,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': -16.799999999999347},
  { 'Entry Price': 1.3805399999999999,
    'Entry Time': Timestamp('2014-04-30 07:00:00'),
    'Exit Price': 1.3856299999999999,
    'Exit Time': Timestamp('2014-05-02 09:00:00'),
    'Profit': 47.900000000000389,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 50.200000000002127},
  { 'Entry Price': 1.3856299999999999,
    'Entry Time': Timestamp('2014-05-02 10:00:00'),
    'Exit Price': 1.3924399999999999,
    'Exit Time': Timestamp('2014-05-06 12:00:00'),
    'Profit': 65.099999999999824,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 65.099999999999838},
  { 'Entry Price': 1.39246,
    'Entry Time': Timestamp('2014-05-06 13:00:00'),
    'Exit Price': 1.3909,
    'Exit Time': Timestamp('2014-05-08 15:00:00'),
    'Profit': -18.600000000000058,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 18.999999999999797},
  { 'Entry Price': 1.3862299999999999,
    'Entry Time': Timestamp('2014-05-08 18:00:00'),
    'Exit Price': 1.37575,
    'Exit Time': Timestamp('2014-05-12 20:00:00'),
    'Profit': 101.79999999999822,
    'Trade Duration': 50,
    'Type': 'SELL',
    'reward': 103.19999999999852},
  { 'Entry Price': 1.37574,
    'Entry Time': Timestamp('2014-05-12 21:00:00'),
    'Exit Price': 1.37137,
    'Exit Time': Timestamp('2014-05-14 12:00:00'),
    'Profit': -46.699999999999847,
    'Trade Duration': 39,
    'Type': 'BUY',
    'reward': -36.199999999999896},
  { 'Entry Price': 1.37137,
    'Entry Time': Timestamp('2014-05-14 13:00:00'),
    'Exit Price': 1.3704100000000001,
    'Exit Time': Timestamp('2014-05-16 15:00:00'),
    'Profit': -12.599999999998499,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': -17.799999999999265},
  { 'Entry Price': 1.3704499999999999,
    'Entry Time': Timestamp('2014-05-16 16:00:00'),
    'Exit Price': 1.36999,
    'Exit Time': Timestamp('2014-05-20 18:00:00'),
    'Profit': -7.5999999999990493,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': -12.899999999998244},
  { 'Entry Price': 1.37002,
    'Entry Time': Timestamp('2014-05-20 19:00:00'),
    'Exit Price': 1.3649799999999999,
    'Exit Time': Timestamp('2014-05-22 21:00:00'),
    'Profit': -53.400000000001555,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': -57.600000000000207},
  { 'Entry Price': 1.36496,
    'Entry Time': Timestamp('2014-05-22 22:00:00'),
    'Exit Price': 1.3638999999999999,
    'Exit Time': Timestamp('2014-05-26 15:00:00'),
    'Profit': -13.600000000000609,
    'Trade Duration': 41,
    'Type': 'BUY',
    'reward': -13.200000000000212},
  { 'Entry Price': 1.3638999999999999,
    'Entry Time': Timestamp('2014-05-26 16:00:00'),
    'Exit Price': 1.35934,
    'Exit Time': Timestamp('2014-05-28 18:00:00'),
    'Profit': -48.599999999998971,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': -47.399999999997775},
  { 'Entry Price': 1.3593600000000001,
    'Entry Time': Timestamp('2014-05-28 19:00:00'),
    'Exit Price': 1.36321,
    'Exit Time': Timestamp('2014-05-30 21:00:00'),
    'Profit': 35.499999999999091,
    'Trade Duration': 50,
    'Type': 'BUY',
    'reward': 44.999999999999162}]
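
The summary figures can be cross-checked against the journal itself. In both journals above, "Total Reward" equals the sum of the `Profit` fields (not the `reward` fields), and "Win Ratio" equals the share of trades with a positive `Profit`. The sketch below recomputes the first summary under that reading; `summarise` is a hypothetical helper written for illustration, not part of gym_trading, and the profits are rounded to one decimal from the first journal.

```python
def summarise(journal):
    """Hypothetical helper: recompute summary statistics from a trade journal.

    Assumes each trade is a dict with a 'Profit' key, as in the journal
    printed above.
    """
    profits = [trade["Profit"] for trade in journal]
    n_trades = len(profits)
    if n_trades == 0:
        return {"trades": 0, "total": 0.0, "average": 0.0, "win_ratio": 0.0}
    total = sum(profits)
    wins = sum(1 for p in profits if p > 0)
    return {
        "trades": n_trades,
        "total": total,
        "average": total / n_trades,
        "win_ratio": 100.0 * wins / n_trades,
    }


# 'Profit' values of the 14 trades in the first journal, rounded to 0.1
first_journal = [{"Profit": p} for p in [
    -5.5, 66.3, 87.3, 33.5, -32.0, -9.8, -55.6,
    19.4, 3.1, 9.7, 10.4, 75.6, 91.0, 53.9,
]]

stats = summarise(first_journal)
print("Total Trades Taken: ", stats["trades"])
print("Total Reward: ", round(stats["total"], 1))
print("Average Reward per Trade: ", stats["average"])
print("Win Ratio:", stats["win_ratio"], "%")
```

Summing the `reward` fields of the same 14 trades gives a different total, so the two columns should not be used interchangeably when comparing runs.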

In [8]:
%matplotlib inline
import matplotlib.pyplot as plt

# l_mean_episode_reward holds the mean episode reward logged during training
plt.plot(l_mean_episode_reward)
plt.xlabel('Episode')
plt.ylabel('Mean episode reward')
plt.show()


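In the plot the mean episode reward appears to converge, although the log above shows it still fluctuating between 21.3 and 277.4 over the last 17 episodes.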
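
A noisy reward curve is easier to judge after smoothing. The following is a minimal sketch using a simple moving average; the `moving_average` helper and the window of 5 are illustrative choices, not part of the notebook. The sample values are the mean episode rewards of episodes 124 to 140 from the log above.

```python
import numpy as np


def moving_average(values, window):
    """Hypothetical helper: simple moving average over `window` points."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fully overlaps
    return np.convolve(values, kernel, mode="valid")


# Mean episode reward for episodes 124-140, copied from the training log
rewards = [21.3, 59.2, 72.8, 128.3, 134.5, 191.9, 212.9, 197.6, 277.4,
           266.1, 232.5, 118.4, 96.8, 21.4, 168.8, 130.1, 197.2]

smoothed = moving_average(rewards, window=5)
print(np.round(smoothed, 2))
```

In the notebook the same helper could be applied to `l_mean_episode_reward` before plotting, with a larger window if the curve is still too jagged to read.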

Time to implement your own strategy :)

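And remember to watch out for overfitting: in the log above, the test profit stays negative in every episode shown, even when the training profit is positive.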
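
A quick way to look for overfitting is to put the training and test figures side by side. The sketch below does this for episodes 124 to 140, using the "Total profit" and "Total profit test" values copied from the log above; the variable names are illustrative and the comparison is only a rough diagnostic, not a formal test.

```python
# "Total profit" (training) per episode, episodes 124-140, from the log
train_profit = [99.0, -25.0, -71.0, -313.0, -38.0, 96.0, -115.0, -202.0,
                237.0, -218.0, -124.0, -388.0, -570.0, 491.0, -98.0,
                112.0, 99.0]

# "Total profit test" for the same episodes, from the log
test_profit = [-210.6, -181.1, -308.8, -132.7, -279.7, -207.7, -177.2,
               -68.7, -207.3, -237.8, -247.2, -263.7, -249.3, -261.1,
               -250.0, -249.9, -260.8]

mean_train = sum(train_profit) / len(train_profit)
mean_test = sum(test_profit) / len(test_profit)

# Count the episodes that ended with a positive total profit
profitable_train = sum(1 for p in train_profit if p > 0)
profitable_test = sum(1 for p in test_profit if p > 0)

print("Mean total profit (train):", round(mean_train, 2))
print("Mean total profit (test): ", round(mean_test, 2))
print("Profitable episodes (train):", profitable_train, "of", len(train_profit))
print("Profitable episodes (test): ", profitable_test, "of", len(test_profit))
```

A strategy that is sometimes profitable on the training split but never on the test split has probably fit patterns specific to the training data.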