Deep Direct Reinforcement Learning for Financial Signal Representation and Trading

Abstract

Can a computer be trained well enough to beat experienced traders at asset trading? In this paper, a recurrent deep neural network (RDNN) is used for real-time financial signal representation and trading.

Trading in this model consists of two sequential decisions: summarizing the market condition and choosing the optimal action.
Compared with conventional approaches, this dynamic decision-making model is more challenging because no information from expert traders is available.
... the agent has to explore the market condition and decide the optimal action at each period.

Self-learning and reinforcement learning (RL); research on stochastic optimal control and RL. In several cases, computers equipped with RL have even surpassed human ability. ... Could RL also beat humans at trading?
but: RL for trading poses problems that the tasks conventionally addressed by RL do not have.

① Problems arising from the difficulty of perceiving and representing the financial environment.
Financial data contain a huge amount of noise, jumps, and movements that make the time series nonstationary.
Moving averages and stochastic technical indicators are commonly used to evaluate the market state,
but the drawback of technical analysis is its poor generalization ability.

② Problems arising from the dynamic decision making of trading.
Placing orders is a systematic task that must take various factors into account.
Frequently changing trading positions not only fails to generate profit but also produces heavy losses from transaction costs and slippage.

To address these two problems, an RDNN structure is introduced, handling ① the environment perception/representation problem and ② the dynamic decision-making problem.
RDNN $\approx$ DNN for feature learning + RNN for market summarization
※ To further improve the robustness of market summarization, fuzzy learning concepts are introduced to reduce the uncertainty of the input data.

3 Direct Deep Reinforcement Learning

A Direct Reinforcement Trading

= the trading policy is a discrete choice over positions, learned directly rather than via a value function solved by dynamic programming (see below).
A typical DRL model is essentially a one-layer RNN.
$p_1, p_2, ..., p_t, ...$ : price sequences released from the exchange center.
return : $ z_t = p_t - p_{t-1} $
real-time trading decision : $ \delta_t \in \{long, neutral, short\} = \{1, 0, -1\} $
profit : $ R_t = \delta_{t-1} z_t - c|\delta_t - \delta_{t-1}| $

The profit serves as the value (reward) at each time point.
The accumulated value throughout the whole training period can be defined as
$$ \max_\Theta U_T \{R_1, \dots, R_T \mid \Theta\} $$ For simplicity, the overall value function is treated here as the sum of the per-period value functions.
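As a sanity check of the reward definition above, a minimal numpy sketch (assuming, as in the note, that U_T is the plain sum of per-period rewards; the function and argument names are illustrative):

In [ ]:
import numpy as np

def total_reward(z, delta, c=0.0):
    """U_T = sum_t R_t with R_t = delta_{t-1}*z_t - c*|delta_t - delta_{t-1}|.
    z[t] is the price change at time t, delta[t] the position in {-1, 0, 1}."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=float)
    delta_prev = np.concatenate(([0.0], delta[:-1]))  # start from a neutral position
    R = delta_prev * z - c * np.abs(delta - delta_prev)
    return R.sum()

# Example: a short long/short sequence with transaction cost c = 1
# print(total_reward(z=[1.0, -0.5, 2.0], delta=[1, -1, -1], c=1.0))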

The primary problem: how to solve this efficiently.
Conventional RL places a value function on a discretized state space and solves it iteratively with dynamic programming (DP).
However, it is difficult to solve the dynamic trading problem directly by learning a value function over a limited number of discrete states.
Therefore, instead of a value function, the strategy is to learn the policy function directly (the policy that would otherwise be obtained by choosing actions that maximize a given value function).
=== DRL (direct reinforcement learning)
A nonlinear function is adopted in DRL to approximate the trading action at each time point: $$ \delta_t = \tanh[\langle w, f_t\rangle + b + u\delta_{t-1}] $$ where $f_t$ is the feature vector of the current market condition at time $t$
and $(w, b)$ are the coefficients of the feature regression.
In other words, the decision is a penalized regression wrapped in tanh; the bias $b$ simply corresponds to regressing on a constant 1.

In DRL, the most recent m return values are directly adopted as the feature vector
$$ f_t = [z_{t-m+1}, \dots, z_t] \in \mathbb{R}^m. $$ In addition to the features, another term $u\delta_{t-1}$ is added to the regression to take the latest trading decision into consideration.
This term is used to discourage the agent from frequently changing trading positions and, hence, to avoid heavy TCs.
After the linear transformation in the brackets, tanh(·) maps the result into the range (−1, 1) to approximate the final trading decision.
The optimization of DRL aims to learn a parameter set $\Theta = \{w, u, b\}$ that maximizes the global reward function
$$ \max_\Theta U_T \{R_1, \dots, R_T \mid \Theta\} $$
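A minimal sketch of one DRL decision step under these definitions (the parameters `w`, `b`, `u` follow the equation above; in practice they are learned, and the commented usage is only illustrative):

In [ ]:
import numpy as np

def drl_decision(z_history, delta_prev, w, b, u, m=50):
    """One DRL step: delta_t = tanh(<w, f_t> + b + u * delta_{t-1}),
    where f_t is the vector of the most recent m returns."""
    f_t = np.asarray(z_history[-m:], dtype=float)   # feature vector in R^m
    return np.tanh(np.dot(w, f_t) + b + u * delta_prev)

# Illustrative usage (parameters would come from training):
# w = np.zeros(50); b = 0.0; u = 1.0
# delta_t = drl_decision(np.random.randn(200), delta_prev=0.0, w=w, b=b, u=u)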

B Deep Recurrent Neural Network for DDR

The bias is omitted by appending a constant 1 to the feature vector and regressing on it.
The DRL network recurrently reuses its own decision $\delta_t$.
Using an RNN introduces long-term memory, so past decisions are carried forward.
However, the raw features alone still fail to summarize the market condition well; this is a general limitation of regression.

To implement feature learning, in this paper, we introduce the prevalent DL into DRL for simultaneous feature learning and dynamic trading.
DL is a very powerful feature learning framework whose potential has been extensively demonstrated in a number of machine learning problems.
In detail, DL constructs a DNN to hierarchically transform the information from layer to layer.
Such deep representation encourages more informative feature representations for a specific learning task.
The hierarchical representation can describe the market condition more richly (i.e., a nonlinear, nonparametric approach).

The blue panel above (in Fig. 2) is the hierarchical DL representation.
By extending DL into DRL, the feature learning part (blue panel) is added to the RNN.
Viewing DL as approximation by a nonlinear function, $$ F_t = g_d(f_t) $$ and $$ \delta_t = \tanh[\langle w, F_t\rangle + b + u\delta_{t-1}]. $$ Each node in layer $l+1$ is connected to every node in layer $l$.
Let $a^l_i$ be the $i$-th input of layer $l$ and $o^l_i$ the corresponding output:
$$ a^l_i = \langle w^l_i, o^{(l-1)}\rangle + b^l_i $$ $$ o^l_i = \frac{1}{1+e^{-a^l_i}} $$ the standard sigmoid function.
$o^{(l-1)}$ is the vector of outputs from layer $l-1$; the outputs of all its nodes are weighted and fed in as input.
Four deep transformation layers are used (128, 128, 128, and 20 hidden nodes per layer).
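A small numpy sketch of the deep transformation $g_d(\cdot)$ implied by the layer equations above (layer sizes follow the notes; the weight initialization in the commented usage is purely illustrative):

In [ ]:
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def deep_transform(f_t, weights, biases):
    """Feedforward pass of g_d: each layer computes a^l = W^l o^{l-1} + b^l
    followed by the standard sigmoid."""
    o = np.asarray(f_t, dtype=float)
    for W, b in zip(weights, biases):
        o = sigmoid(W.dot(o) + b)
    return o

# Hypothetical layer sizes (input 50 -> 128 -> 128 -> 128 -> 20):
# sizes = [50, 128, 128, 128, 20]
# weights = [np.random.randn(n_out, n_in) * 0.01 for n_in, n_out in zip(sizes[:-1], sizes[1:])]
# biases  = [np.zeros(n_out) for n_out in sizes[1:]]
# F_t = deep_transform(np.random.randn(50), weights, biases)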

C Fuzzy Extensions to Reduce Uncertainties

The deep configuration well addresses the feature learning task in the RNN.
↔︎ However, another important issue, i.e., the uncertainty in financial data, should also be carefully considered.

Financial sequences contain a high amount of unpredictable uncertainty due to the random gambling behavior behind trading.
Besides, a number of other factors, e.g., global economic atmosphere and some company rumors, may also affect the direction of the financial signal in real time.
Therefore, reducing the uncertainties in the raw data is an important approach to increase the robustness for financial signal mining.

In the artificial intelligence community, fuzzy learning is an ideal paradigm to reduce the uncertainty in the original data.
Rather than adopting precise descriptions of some phenomena, fuzzy systems prefer to assign fuzzy linguistic values to the input data.

Such fuzzified representations can be easily obtained by comparing the real-world data with a number of fuzzy rough sets and then deriving the corresponding fuzzy membership degrees.

Consequently, the learning system only works with these fuzzy representations to make robust control decisions.

Rough sets (Wikipedia)

A rough set is a pair consisting of an upper approximation set and a lower approximation set, which describes non-numerical objects coarsely ("roughly"). Using rough sets, knowledge can be acquired from non-numerical or inconsistent data, which is hard to obtain with other data mining methods. The theory is abbreviated RST (rough sets theory) or RSA (rough sets approach). As an application, fuzzy-rough sets theory extends the target sets to fuzzy sets.

I see. So what exactly are fuzzy rough sets?
For the financial problem discussed here, the fuzzy rough sets can be naturally defined according to the basic movements of the stock price.
In detail, the fuzzy sets are defined on the increasing, decreasing, and the no trend groups.
The parameters in the fuzzy membership function can then be predefined according to the context of the discussed problem.
Alternatively, they could be learned in a fully data-driven manner.
The financial problem is highly complicated, and it is hard to set up the fuzzy membership functions manually based on experience.

Therefore, we prefer to directly learn the membership functions and this idea will be detailed in Section IV.

So we would also like to estimate the membership functions directly.
In fuzzy neural networks, the fuzzy representation part is conventionally connected to the input vector ft (green nodes) with different membership functions [35].
To note, in our setting, we follow a pioneering work [35] to assign k different fuzzy degrees to each dimension of the input vector.
Each dimension of the feature vector is connected to k fuzzy nodes.
In the cartoon of Fig. 2, only two fuzzy nodes (k = 2) are connected to each input variable due to the space limitation.

In our practical implementation, k is fixed as 3 to describe the increasing, decreasing, and no-trend conditions. Mathematically, the $i$-th fuzzy membership function $v_i(\cdot): \mathbb{R} \to [0, 1]$ maps the $i$-th input to a fuzzy degree $$ o_i^{(l)} = v_i\left(a_i^{(l)}\right) = e^{-\left(a_i^{(l)} - m_i\right)^2/\sigma_i^2} \quad \forall i. \quad (7) $$ The Gaussian membership function with mean $m_i$ and variance $\sigma_i^2$ is utilized in our system following the suggestions of [37] and [38]. After getting the fuzzy representations, they are directly connected to the deep transformation layers to seek the deep transformations. In conclusion, the fuzzy DRNN (FDRNN) is composed of three major parts: fuzzy representation, deep transformation, and DRT. When viewing the FDRNN as a unified system, these three parts respectively play the roles of data preprocessing (uncertainty reduction), feature learning (deep transformation), and trading policy making (RL). The whole optimization framework is given as follows: $$ \max_{\{\Theta,\, g_d(\cdot),\, v(\cdot)\}} U_T(R_1, \dots, R_T) $$ $$ \text{s.t.}\quad R_t = \delta_{t-1} z_t - c|\delta_t - \delta_{t-1}| $$ $$ \delta_t = \tanh(\langle w, F_t\rangle + b + u\delta_{t-1}) $$ $$ F_t = g_d(v(f_t)) \quad (8) $$ where there are three groups of parameters to be learned, i.e., the trading parameters $\Theta = (w, b, u)$, the fuzzy representations $v(\cdot)$, and the deep transformations $g_d(\cdot)$. In the above optimization, $U_T$ is the ultimate reward of the RL function, $\delta_t$ is the policy approximated by the FRDNN, and $F_t$ is the high-level feature representation.
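Putting the three parts together, a rough sketch of one FDRNN forward step as described by (8); shapes and names are assumptions, and in the real system all of these parameters are learned:

In [ ]:
import numpy as np

def fuzzy_layer(f_t, centers, sigmas):
    """Gaussian fuzzy membership degrees (Eq. 7): each input dimension is mapped
    to k degrees exp(-(f_i - m_ij)^2 / sigma_ij^2); centers/sigmas have shape (dim, k)."""
    f = np.asarray(f_t, dtype=float)[:, None]
    return np.exp(-((f - centers) ** 2) / sigmas ** 2).ravel()

def frdnn_step(f_t, delta_prev, centers, sigmas, weights, biases, w, b, u):
    """One FDRNN step: fuzzy representation -> deep transformation -> DRT decision."""
    o = fuzzy_layer(f_t, centers, sigmas)               # uncertainty reduction, v(f_t)
    for W_l, b_l in zip(weights, biases):                # deep transformation g_d(.)
        o = 1.0 / (1.0 + np.exp(-(W_l.dot(o) + b_l)))
    return np.tanh(np.dot(w, o) + b + u * delta_prev)    # trading decision delta_t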

4 DRNN Learning

$$ \max_{\{\Theta,\, g_d(\cdot),\, v(\cdot)\}} U_T(R_1, \dots, R_T) $$ $$ \text{s.t.}\quad R_t = \delta_{t-1} z_t - c|\delta_t - \delta_{t-1}| $$ $$ \delta_t = \tanh(\langle w, F_t\rangle + b + u\delta_{t-1}) $$ $$ F_t = g_d(v(f_t)) $$

This formulation is conceptually elegant.
Unfortunately, however, the optimization is comparatively difficult,
because the constructed DNN has thousands of hidden parameters that must be inferred.
In this section, we present a practical learning strategy and train the DNN in two steps: (A) system initialization and (B) fine tuning.

A System Initializations

We present a parameter-initialization strategy for each of the three parts of the model.

The fuzzy representation part [Fig. 2 (purple panel)]

The only parameters to be specified are the fuzzy centers ($m_i$) and widths ($σ^2_i$) of the fuzzy nodes, where i means the i th node of the fuzzy membership layer.
We directly apply k-means to divide the training samples into k classes.
The parameter k is fixed as 3, because each input node is connected with three membership functions.
Then, in each cluster, the mean and variance of each dimension on the input vector ($f_t$) are sequentially calculated to initialize the corresponding $m_i$ and $σ^2_i$ .
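One plausible reading of this initialization as code, using sklearn's KMeans (which the notebook cells below also use); whether clustering is done jointly over all dimensions is an assumption here:

In [ ]:
import numpy as np
from sklearn.cluster import KMeans

def init_fuzzy_params(F, k=3):
    """Initialize fuzzy centers m_i and widths sigma_i by clustering the training
    feature vectors into k groups and taking per-dimension mean / std inside each
    cluster.  F has shape (n_samples, dim); returns arrays of shape (dim, k)."""
    labels = KMeans(n_clusters=k).fit_predict(F)
    centers = np.stack([F[labels == j].mean(axis=0) for j in range(k)], axis=1)
    sigmas  = np.stack([F[labels == j].std(axis=0) + 1e-8 for j in range(k)], axis=1)
    return centers, sigmas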

The deep transformation part [Fig. 2 (blue panel)]

The AE is adopted to initialize the deep transformation part in Fig. 2 (blue panel). In a nutshell, the AE aims at optimally reconstructing the input information on a virtual layer placed after the hidden representation. For ease of explanation, three layers are specified here: the $l$-th input layer, the $(l+1)$-th hidden layer, and the $(l+2)$-th reconstruction layer. These three layers are fully connected. We define $h_\theta(\cdot)$ [respectively, $h_\gamma(\cdot)$] as the feedforward transformation from the $l$-th to the $(l+1)$-th layer [respectively, $(l+1)$-th to $(l+2)$-th layer] with parameter set $\theta$ (respectively, $\gamma$). The AE optimization minimizes the following loss: $$ \sum_t \left\| x_t^{(l)} - h_\gamma\!\left(h_\theta\!\left(x_t^{(l)}\right)\right) \right\|_2^2 + \eta \left\|w^{(l+1)}\right\|_2^2. \quad (9) $$ Here, $x_t^{(l)}$ denotes the nodes' statuses of the $l$-th layer with the $t$-th training sample as input. In (9), a quadratic term is added to avoid overfitting. After solving the AE optimization, the parameter set $\theta = \{w^{(l+1)}, b^{(l+1)}\}$ is recorded in the network as the initialization of the $(l+1)$-th layer. The reconstruction layer and its corresponding parameters $\gamma$ are not used, because the reconstruction layer is just a virtual layer assisting parameter learning of the hidden layer [28], [39]. The AE optimizations are implemented on each hidden layer sequentially until all the parameters in the deep transformation part have been set up.
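A sketch of the layerwise AE loss (9) for one hidden layer; the sigmoid at the reconstruction layer is an assumption, and only θ = (W, b) would be kept after minimizing this loss:

In [ ]:
import numpy as np

def ae_loss(X, W, b, W_rec, b_rec, eta=1e-3):
    """Layerwise autoencoder loss (Eq. 9): squared reconstruction error through a
    virtual reconstruction layer, plus an L2 penalty on the hidden-layer weights.
    X: (n_samples, dim) activations of the l-th layer."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    H = sig(X.dot(W.T) + b)              # hidden representation h_theta(x)
    X_rec = sig(H.dot(W_rec.T) + b_rec)  # virtual reconstruction h_gamma(h)
    return np.sum((X - X_rec) ** 2) + eta * np.sum(W ** 2)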

The DRL part,

The parameters can be initialized by using the final deep representation $F_t$ as the input to the DRL model. This process is equivalent to solving the shallow RNN in Fig. 1(a), which has been discussed in [17]. Note that all the learning strategies presented in this section concern parameter initialization only. In order to make the whole DL system perform robustly on difficult tasks, a fine-tuning step is required to precisely adjust the parameters of each layer. This fine-tuning step can be considered as task-dependent feature learning.

B Task-Aware BPTT

In the conventional approach, the error BP method is applied in the DNN fine-tuning step. However, the FRDNN is a bit more complicated, exhibiting both recurrent and deep structures. We denote by $\theta$ a generic parameter of the FRDNN; its gradient is calculated by the chain rule $$ \frac{\partial U_T}{\partial \theta} = \sum_t \frac{dU_T}{dR_t}\left( \frac{dR_t}{d\delta_t}\frac{d\delta_t}{d\theta} + \frac{dR_t}{d\delta_{t-1}}\frac{d\delta_{t-1}}{d\theta}\right), \qquad \frac{d\delta_t}{d\theta} = \frac{\partial \delta_t}{\partial \theta} + \frac{\partial \delta_t}{\partial \delta_{t-1}}\frac{d\delta_{t-1}}{d\theta}. \quad (10) $$ From (10), it is apparent that when deriving the gradient $d\delta_t/d\theta$, one should recursively calculate the gradients $d\delta_{t-\tau}/d\theta$ for all $\tau = 1, \dots, T$. Such recursive calculation inevitably imposes great difficulties on the gradient derivation. To simplify the problem, we introduce the well-known BPTT method [40] to cope with the recurrent structure of the NN. Analyzing the FRDNN structure in Fig. 2, the recurrent link goes from the output side to the input side, i.e., $\delta_{t-1}$ is used as an input of the neuron that computes $\delta_t$. Fig. 3 shows the first two-step unfolding of the FRDNN. We call each block with a different value of $\tau$ a time stack; Fig. 3 shows two time stacks ($\tau = 0$ and $\tau = 1$). After the BPTT unfolding, the system no longer involves any recurrent structure and the typical BP method is easily applied. The parameter gradients obtained at each separate time stack are averaged together to form the final gradient of each parameter.

According to Fig. 3, the original DNN becomes even deeper due to the time-based unfolding: the expansion leads to a deep structure along different time delays, and every time stack (with a different value of $\tau$) contains its own deep feature learning part. When directly applying BPTT, gradient vanishing on the deep layers is not avoided in the fine-tuning step [41]. This problem becomes even worse on the high-order time stacks and the front layers. To solve this problem, we propose a more practical solution that brings the gradient information directly from the learning task to each time stack and each layer of the DL part. In the time-unfolding part, the red dotted lines are connected from the task $U_T$ to the output node of each time stack. With this setting, the back-propagated gradient information of each time stack comes from two respective parts: 1) the previous time stack (lower-order time delay) and 2) the reward function (learning task). Similarly, the gradient of the output node in each time stack is brought back to the DL layers by the green dotted lines. Such a BPTT method, with virtual lines connecting to the objective function, is termed task-aware BPTT.

The detailed process to train the FRDNN is summarized in Algorithm 1. In the algorithm, we denote by $\Theta$ the general symbol representing the whole family of latent parameters involved in the FRDNN. Before the gradient-descent update in line 10, the calculated gradient vector is further normalized to avoid extremely large updates.
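For intuition about the recursion in (10), a sketch of the exact recursive gradient for the shallow one-layer DRL, taking U_T as the plain sum of rewards; this is not the task-aware BPTT itself, and the names are illustrative:

In [ ]:
import numpy as np

def drl_gradient_w(features, z, w, b, u, c=0.0):
    """Recursive gradient of U_T w.r.t. w for the shallow DRL layer:
    d delta_t / dw = (1 - delta_t^2) * (f_t + u * d delta_{t-1} / dw)."""
    delta_prev, ddelta_prev = 0.0, np.zeros_like(w)
    grad = np.zeros_like(w)
    for f_t, z_t in zip(features, z):
        delta_t = np.tanh(np.dot(w, f_t) + b + u * delta_prev)
        ddelta_t = (1.0 - delta_t ** 2) * (f_t + u * ddelta_prev)
        # dR_t/dw = z_t * d(delta_{t-1})/dw - c * sign(delta_t - delta_{t-1}) * (d(delta_t)/dw - d(delta_{t-1})/dw)
        grad += z_t * ddelta_prev - c * np.sign(delta_t - delta_prev) * (ddelta_t - ddelta_prev)
        delta_prev, ddelta_prev = delta_t, ddelta_t
    return grad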

5 Experimental Verifications

A Experimental Setup

We test the DDR trading model on real-world financial data. Both a stock index and commodity future contracts are tested in this section. For the stock-index data, we select the stock-IF contract, the first index-based future contract traded in China. The IF data are calculated from the prices of the top 300 stocks on the Shanghai and Shenzhen exchanges. The IF future is the most liquid contract and carries the heaviest trading volume among all future contracts in China. On the commodity market, the silver (AG) and sugar (SU) contracts are used, because both exhibit very high liquidity, allowing trading actions to be executed in almost real time. All these contracts allow both short and long operations; a long (respectively, short) position makes a profit when the subsequent market price goes higher (respectively, lower).

[Fig. 4: Prices (minute resolution) of the three tested future contracts. Red parts: RDNN initialization. Blue parts: out-of-sample tests.]
[Table I: Summary of some practical properties of the traded contracts.]

The financial data are captured by our own trading system on each trading day, and the historic data are maintained in a database. In our experiment, minute-level close prices are used, implying a 1-min interval between the prices $p_t$ and $p_{t+1}$. The historic data of the three contracts at minute resolution are shown in Fig. 4. In this one-year period, the IF contract accumulates more ticks than the commodity data because the daily trading period of IF is much longer than that of the commodity contracts. From Fig. 4, it is also interesting to note that the three contracts exhibit quite different market patterns: the IF data show very large upward and downward movements in the tested period, the AG contract generally shows a downward trend, and SU has no obvious direction in the testing period.

For practical usage, some other trading-related issues should also be considered; detailed information about these contracts is summarized in Table I. The inherent values of the three contracts are quoted in Chinese Yuan per point (CNY/pnt). For instance, in the IF data, an increase (decrease) of one point yields a reward of 300 CNY for a long (respectively, short) position, and vice versa. The TCs charged by the brokerage company are also provided. To account for other risk factors, a much higher c is set in (1): five times the real TCs. The raw price changes of the last 45 min and the momentum changes relative to the previous 3 h, 5 h, 1 day, 3 days, and 10 days are directly used as the input of the trading system ($f_t \in \mathbb{R}^{50}$). In the fuzzy learning part, each of the 50 input nodes is connected with three fuzzy membership functions, giving a first-level fuzzy representation in $\mathbb{R}^{150}$. The fuzzy layer is then passed through four deep transformation layers with 128, 128, 128, and 20 hidden nodes per layer. The feature representation of the final deep layer ($F_t \in \mathbb{R}^{20}$) is connected with the DRL part for trading policy making.

B Details on Deep Training

In this section, we discuss some details related to deep training. In practice, the system is trained by two sequential
steps of initialization and online updating. In the initialization step, the first 15 000 time points of each time series in Fig. 4 (red parts) are employed for system warm-up; these initialization data are not used in the out-of-sample tests. After initialization, the parameters of the RDNN are iteratively updated in an online manner with the most recently released data. The online updating strategy lets the model track the latest market condition and revise its parameters accordingly. In practice, the first 15 000 time points are used to set up the RDNN, and the well-trained system is exploited to trade time points 15 001 to 20 000. Then, the sliding window of training data is moved 5000 ticks forward, covering a new training set from 5000 to 20 000 (see the code sketch below). As indicated in Section IV, the training phase of the RDNN consists of two main steps: layerwise parameter initialization and fine tuning. The parameter-initialization step is performed only in the first round of training, i.e., on the first 15 000 ticks; as the sliding window of the training set moves ahead, the optimal parameters obtained from the previous training round are directly used as the initialized values.

FDDR is a highly nonconvex system, and only a local minimum can be expected after convergence. Besides, overfitting is a known drawback of most DNNs. To mitigate overfitting, we adopt two convenient strategies that have proved powerful in practice. The first is the widely used early-stopping method for DNN training: the system is trained for only 100 epochs with a gradient-decrease parameter $\eta_c = 0.97$ in Algorithm 1. Second, we conduct model selection to pick a good model for the out-of-sample data. To this end, the 15 000 training points are divided into an RDNN training set (first 12 000) and a validation set (last 3000). On the first 12 000 time points, FDDR is trained 5 times, and the best model is selected on the next 3000 blind points. This validation helps exclude highly overfitted NNs.

Another challenge in training the RDNN comes from the gradient vanishing issue, for which we introduced the task-aware BPTT method. To demonstrate its effectiveness, we compare the training performance with the typical BPTT method. The trading model is trained on the first 12 000 points of the IF data in Fig. 4(a). The objective function values (accumulated rewards) of the two methods along the training epochs are shown in Fig. 5. From the comparison, it is apparent that task-aware BPTT outperforms BPTT in producing more reward (accumulated trading profit). Moreover, task-aware BPTT requires fewer iterative steps to converge.

[Fig. 5: Training epochs and the corresponding rewards when training the DRNN by (a) task-aware BPTT and (b) normal BPTT. The training data are the first 12 000 data points in Fig. 4(a).]

C General Evaluations

In this section, we evaluate the DDR trading system on practical data. The system is compared with other RL systems for online trading. The first competitor is the DRL system [17]. The sparse coding-inspired optimal training (SCOT) system [19], which uses a shallow sparse-coding feature learning part, is also evaluated. Finally, the results of DDR and FDDR are reported.
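As a recap of the online-updating schedule described in the deep-training details above, a small sketch (window sizes taken from the text; the generator name is illustrative):

In [ ]:
def sliding_windows(n_total, init=15000, step=5000):
    """Yield (train_start, train_end, test_end): train on the last 15 000 points,
    trade the next 5 000, then slide the window 5 000 ticks forward."""
    start = 0
    while start + init + step <= n_total:
        yield start, start + init, start + init + step
        start += step

# for tr0, tr1, te1 in sliding_windows(30000):
#     print(tr0, tr1, te1)   # (0, 15000, 20000), (5000, 20000, 25000), ...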
In the previous discussion, the reward function for RL was the total profit (TP) gained over the training period. Compared with total TP, in modern portfolio theory risk-adjusted profits are more widely used to evaluate a trading system's performance. In this paper, we therefore also consider an alternative reward function for the RL part, the Sharpe ratio (SR), which has been widely used in many trading-related works [42], [43]. The SR is defined as the ratio of the average return to the standard deviation of the returns over period $1, \dots, T$, i.e., $U_T^{SR} = \mathrm{mean}(R_t)/\mathrm{std}(R_t)$. To simplify the expression, we follow DRL [17] and use the moving SR instead. In general, the moving SR takes the first-order Taylor expansion of the typical SR and then updates the value incrementally; see [17, Sec. 2.4] for the detailed derivation.

Different trading systems are trained with both TP and SR as the RL objective. The profit and loss (P&L) curves are shown in Fig. 6, and the quantitative evaluations are summarized in Table II, where the testing performance is reported in terms of both TP and SR. The performance of buy and hold (B&H) is also reported in Table II as a baseline. From the experimental results, three observations can be made.

First, all methods achieve much larger profits in trending markets. Since short operations are allowed, the trader can also make money in a downward market. This is demonstrated on the IF data, where the Chinese market first exhibits an increasing trend followed by a sudden drop; FDDR makes profits in both regimes. It is also observed that the P&L curve suffers a drawdown during the transition from the increasing trend to the decreasing trend, possibly due to the significant differences between the training and testing data. In general, the RL model is particularly well suited to trending market conditions.

Second, FDDR and DDR generally outperform the other two competitors on all markets, and SCOT also performs better than DRL in most conditions. This verifies that feature learning indeed contributes to improving trading performance. Besides, the DL methods (FDDR and DDR) make more profits with higher SR on all tests than the shallow learning approach (SCOT). Between the two DL methods, adding an extra layer for fuzzy representation appears to be a good way to further improve the results: as seen in Table II and Fig. 6, FDDR outperforms DDR on all tests except the one in Fig. 6(g). As noted above, DRL is a trend-following system that may suffer losses in markets with small volatility. However, the results show that even in non-trending periods, e.g., on the SU data or in the early period of the IF data, FDDR can still accumulate positive returns from swinging market patterns. This finding verifies another important property of fuzzy learning: reducing market uncertainty.

Third, using SR as the RL objective always leads to more reliable performance. This reliability can be observed both in the SR values in Table II and in the shapes of the P&L curves in Fig. 6. Table II shows that the highest profit on the IF data was obtained by optimizing TP in DDR; however, the SR in that setting is worse than the alternatives. In portfolio management, rather than chasing the highest profits at high risk, it is wiser to make good profits within an acceptable risk level. Therefore, for practical usage, it is still recommended to use SR as the reward function for RL.

In conclusion, the RL framework is perhaps best viewed as a trend-based trading strategy and can make reliable profits in markets with large price movements (in either direction). The DL-based trading systems generally outperform the other DRL models, with or without shallow feature learning. By incorporating the fuzzy learning concept, FDDR can generate good results even in non-trending periods. When training a deep trading model, it is suggested to use SR as the RL objective, which balances profit and risk well.
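For reference, the plain SR objective mentioned above (the paper itself uses the moving, first-order-Taylor SR of [17], which is not reproduced here):

In [ ]:
import numpy as np

def sharpe_ratio(R):
    """Plain Sharpe-ratio objective: mean(R_t) / std(R_t) over a period of rewards."""
    R = np.asarray(R, dtype=float)
    return R.mean() / (R.std() + 1e-12)   # small epsilon guards against zero variance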


6 Conclusion

This paper introduces contemporary DL into a typical DRL framework for financial signal processing and online trading. The contributions of the system are twofold. First, it is a technical-indicator-free trading system that largely frees humans from selecting features out of a large number of candidates; this advantage is due to the automatic feature learning mechanism of DL. In addition, considering the nature of financial signals, we have extended fuzzy learning into the DL model to reduce the uncertainty in the original time series. The results on both the stock-index and commodity future contracts demonstrate the effectiveness of the learning system in simultaneous market condition summarization and optimal action learning. To the best of our knowledge, this is the first attempt to apply DL to real-time financial trading.

While the power of the DDR system has been verified in this paper, there are some promising future directions. First, all the methods proposed in this paper handle only one share of the asset. In large hedge funds, trading systems are usually required to manage a number of assets simultaneously. In the future, the DL framework will be extended to extract features from multiple assets and to learn portfolio management strategies. Second, the financial market is not stationary and may change in real time. The knowledge learned from past training data may not sufficiently reflect the information of the subsequent testing period. How to intelligently select the right training period remains an open problem in the field.

In [19]:
%matplotlib inline

In [87]:
import numpy as np
import pandas as pd

# statistical tools
import statsmodels.api as sm
import statsmodels.tsa.api as tsa
from patsy import dmatrices

# plotting
import matplotlib.pyplot as plt
from pandas.tools.plotting import autocorrelation_plot

# stock prices
import pandas as pd
import pandas.io.data as web

# deep learning
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L


/Users/NIGG/anaconda/lib/python3.5/site-packages/pandas/io/data.py:33: FutureWarning: 
The pandas.io.data module is moved to a separate package (pandas-datareader) and will be removed from pandas in a future version.
After installing the pandas-datareader package (https://github.com/pydata/pandas-datareader), you can change the import ``from pandas.io import data, wb`` to ``from pandas_datareader import data, wb``.
  FutureWarning)

In [98]:
start = '2000-10-01'
end = '2016-10-01'

p = web.DataReader('^N225', 'yahoo', start, end)
p.head()


Out[98]:
Open High Low Close Volume Adj Close
Date
2000-10-02 15735.709961 15902.509766 15514.040039 15902.509766 0 15902.509766
2000-10-03 15905.250000 15956.450195 15779.540039 15912.089844 0 15912.089844
2000-10-04 15906.179688 16153.669922 15808.660156 16149.080078 0 16149.080078
2000-10-05 16157.200195 16192.780273 16052.120117 16099.259766 0 16099.259766
2000-10-06 16084.049805 16084.049805 15884.650391 15994.240234 0 15994.240234

In [107]:
price = p["Adj Close"]

In [112]:
price = price.pct_change()[1:] + 1

In [114]:
price = list(price)

In [129]:
f = np.matrix([price[i:i+50] for i in range(len(price) - 50)])

In [ ]:


In [38]:
rho = 1
c_0 = 1

In [44]:
np.tanh(100)


Out[44]:
1.0

In [ ]:
# R_t = delta_{t-1} * z_t - c * |delta_t - delta_{t-1}|  (previous position earns the return)
R = delta_prev*z - c*np.abs(delta - delta_prev)

In [45]:
np.abs(1-9)


Out[45]:
8

In [51]:
from sklearn.cluster import KMeans

In [135]:
kms = KMeans(n_clusters=3).fit_predict(f[0].T)

In [144]:
np.array(f[0])[0]


Out[144]:
array([ 1.01428269,  0.98228509,  0.99655113,  0.99608644,  0.99046394,
        1.02268784,  0.98348144,  1.02641543,  0.97730243,  0.9803858 ,
        1.02719169,  1.03042707,  0.96803347,  1.01002362,  0.9764374 ,
        1.02197054,  0.98022297,  1.01072269,  1.01336307,  1.01760932,
        0.97534862,  1.03838272,  0.96332872,  1.0059021 ,  0.97418176,
        1.01769367,  0.98305799,  1.02176659,  1.00980508,  0.976403  ,
        1.01156901,  1.00206568,  0.99238571,  1.00104966,  1.00848143,
        1.02728559,  0.96842021,  0.9938368 ,  1.02023548,  1.0030142 ,
        0.9953541 ,  0.97479007,  1.0311285 ,  0.97574616,  1.00984251,
        1.02337691,  0.98519189,  0.997006  ,  0.98057386,  0.99065623])

In [149]:
np.array(f[0])[0][kms == 0].mean()


Out[149]:
1.0013786862889182

In [121]:
kmeans = KMeans(n_clusters=3, random_state=0).fit(f[0].T)


/Users/NIGG/anaconda/lib/python3.5/site-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-121-83cf78db1e1e> in <module>()
----> 1 kmeans = KMeans(n_clusters=3, random_state=0).fit(f[0].T)

/Users/NIGG/anaconda/lib/python3.5/site-packages/sklearn/cluster/k_means_.py in fit(self, X, y)
    879         """
    880         random_state = check_random_state(self.random_state)
--> 881         X = self._check_fit_data(X)
    882 
    883         self.cluster_centers_, self.labels_, self.inertia_, self.n_iter_ = \

/Users/NIGG/anaconda/lib/python3.5/site-packages/sklearn/cluster/k_means_.py in _check_fit_data(self, X)
    857         if X.shape[0] < self.n_clusters:
    858             raise ValueError("n_samples=%d should be >= n_clusters=%d" % (
--> 859                 X.shape[0], self.n_clusters))
    860         return X
    861 

ValueError: n_samples=1 should be >= n_clusters=3

In [80]:
kmeans.labels_


Out[80]:
array([2, 2, 2, 0, 0, 0, 2, 2, 2, 2, 0, 0, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2, 2,
       2, 2, 2, 0, 0, 2, 2, 2, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 0, 2, 0, 2, 2,
       2, 2, 2, 2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2,
       2, 0, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2, 2, 0, 2, 0, 2, 2, 0, 2, 2, 2, 2,
       2, 0, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 2, 0,
       2, 0, 0, 0, 0, 0, 2, 2, 0, 2, 2, 0, 0, 0, 2, 0, 0, 2, 2, 2, 2, 2, 0,
       2, 0, 2, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2,
       2, 0, 2, 2, 0, 0, 2, 2, 0, 2, 0, 2, 0, 0, 0, 2, 2, 0, 0, 2, 0, 0, 0,
       0, 2, 0, 2, 0, 0, 2, 0, 2, 0, 2, 0, 0, 0, 0, 2, 2, 0, 0, 0, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 2, 2, 0, 0, 2, 0, 0,
       1, 2, 0, 0, 2, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 2,
       0, 2, 0, 2, 0, 0, 2, 0, 0, 0, 0, 2, 1, 2, 0, 2, 0, 2, 0, 2, 2, 2, 2,
       0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 2, 0, 0, 2, 2, 0, 2, 0, 0, 2, 2, 0, 0,
       2, 0, 0, 2, 0, 2, 2, 0, 0, 0, 0, 2, 1, 2, 2, 2, 2, 2, 0, 0, 2, 0, 0,
       0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 2, 2, 1, 2, 2, 0, 0, 0, 2, 0, 2, 2,
       0, 0, 0, 0, 0, 1, 2, 2, 1, 0, 2, 1, 0, 2, 0, 0, 2, 0, 1, 0, 0, 1, 0,
       2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 1, 1, 2, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 2, 1, 1, 0, 0, 2, 1, 1, 0, 2,
       0, 0, 2, 2, 1, 0, 1, 0, 2, 0, 2, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1,
       1, 0, 1, 0, 2, 1, 2, 0, 0, 1, 2, 2, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0,
       1, 1, 1, 0, 1, 0, 2, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 1,
       1, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 2, 1, 0, 0, 1, 0, 2, 0, 0, 0, 1, 0, 0, 2, 1, 1, 0, 1, 0, 1, 2,
       0, 0, 1, 1, 1, 0, 1, 2, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 2, 1, 1,
       0, 0, 1, 0, 2, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 2, 1, 1,
       1, 2, 2, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 2, 1, 1,
       1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 2,
       0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 2, 1, 0, 1, 0, 1, 1, 0, 0,
       0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1,
       0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0,
       0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1,
       1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       1, 1, 0, 1, 0, 2, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 2, 1, 2,
       0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1,
       1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1,
       0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
       1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1,
       0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0,
       1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
       0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1], dtype=int32)

In [ ]:
# Usage pattern of sklearn's KMeans (from the scikit-learn docs; X is an (n_samples, n_features) array)
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
kmeans.labels_                      # e.g. array([0, 0, 0, 1, 1, 1], dtype=int32)
kmeans.predict([[0, 0], [4, 4]])    # e.g. array([0, 1], dtype=int32)
kmeans.cluster_centers_             # e.g. array([[ 1.,  2.], [ 4.,  2.]])

In [ ]:
def o(a):
    # standard sigmoid output of a node, o = 1 / (1 + exp(-a))
    return 1/(1+np.exp(-a))

In [ ]:
def fuzzy(a, m, sigma):
    # Gaussian fuzzy membership degree (Eq. 7)
    return np.exp(-1*(a - m)**2/sigma**2)

In [150]:
class FRDNN(object):
    def __init__(self, data, rho, c_0):
        self.data = data
        self.rho = rho
        self.c_0 = c_0

    def _fuzzy_(self, a, m, sigma):
        # Gaussian fuzzy membership degree of input a (Eq. 7); stub to be extended
        return np.exp(-1*(a - m)**2/sigma**2)

In [84]:
np.exp(-1*(100 - 99)**2/0.1**2)


Out[84]:
3.7200759760208889e-44

In [49]:
class CAR(Chain):
    def __init__(self, unit1, unit2, unit3, col_num):
        self.unit1 = unit1
        self.unit2 = unit2
        self.unit3 = unit3
        super(CAR, self).__init__(
            l1 = L.Linear(col_num, unit1),
            l2 = L.Linear(self.unit1, self.unit1),
            l3 = L.Linear(self.unit1, self.unit2),
            l4 = L.Linear(self.unit2, self.unit3),
            l5 = L.Linear(self.unit3, self.unit3),
            l6 = L.Linear(self.unit3, 1),
        )
    
    def __call__(self, x, y):
        fv = self.fwd(x, y)
        loss = F.mean_squared_error(fv, y)
        return loss
    
    def fwd(self, x, y):
        h1 = F.sigmoid(self.l1(x))
        h2 = F.sigmoid(self.l2(h1))
        h3 = F.sigmoid(self.l3(h2))
        h4 = F.sigmoid(self.l4(h3))
        h5 = F.sigmoid(self.l5(h4))
        h6 = self.l6(h5)
        return h6

In [50]:
from sklearn import mixture  # GaussianMixture is used below; this import was missing

def DL(df, n, bs = 200):
    dum1 = pd.DataFrame((df['pay'] < 100000)*1)
    dum1.columns = ['low']
    dum2 = pd.DataFrame((df['pay'] > 150000)*1)
    dum2.columns = ['high']
    dum = pd.concat((dum1, dum2), axis=1)
    
    df_with_dummy = pd.concat((df, dum), axis=1)
    
    cluster_array = np.array([df['square'], df['fX']*1000, df['fY']*1000])
    gmm = mixture.GaussianMixture(n_components=n, covariance_type='full').fit(cluster_array.T)
    dum = pd.get_dummies(gmm.predict(cluster_array.T))
    dum_nam = ['d%s'%i for i in range(n)] 
    dum.columns = dum_nam
    
    df_with_dummy = pd.concat((df_with_dummy, dum), axis=1)

    vars = ['pay', 'square', 'k', 'lk', 'dk', 'sdk', 'sldk', 'south_direction_dummy', 'building_year', 
            'new_dummy', 'mansyon_dumy', 'teiki_syakuya_dummy', 'walk_minute_dummy', 'r', 'rc_dummy', 
            'room_nums', 'low', 'high']
    vars = vars + dum_nam[:-1]

    eq = fml_build(vars)  # fml_build: user-defined helper (defined elsewhere) building the patsy formula string

    y, X = dmatrices(eq, data=df_with_dummy, return_type='dataframe')
    
    y_in = y[1:1000]
    X_in = X[1:1000]
    
    y_ex = y[1000:]
    X_ex = X[1000:]

    logy_in = np.log(y_in)
    
    logy_in = np.array(logy_in, dtype='float32')
    X_in = np.array(X_in, dtype='float32')
    
    y = Variable(logy_in)
    x = Variable(X_in)
    
    num, col_num = X_in.shape

    model1 = CAR(10, 10, 3, col_num)
    optimizer = optimizers.SGD()
    optimizer.setup(model1)
    
    for j in range(1000):
        sffindx = np.random.permutation(num)
        for i in range(0, num, bs):
            x = Variable(X_in[sffindx[i:(i+bs) if (i+bs) < num else num]])
            y = Variable(logy_in[sffindx[i:(i+bs) if (i+bs) < num else num]])
            model1.zerograds()
            loss = model1(x, y)
            loss.backward()
            optimizer.update()
        if j % 1000 == 0:
            loss_val = loss.data
            print('epoch:', j)
            print('train mean loss={}'.format(loss_val))
            print(' - - - - - - - - - ')
    
    return model1, np.array(y_ex, dtype='float32').reshape(len(y_ex)), np.array(X_ex, dtype='float32')

In [ ]:
results, y_ex, X_ex = DL(df, 20)