In [1]:
""" From:  http://danielhnyk.cz/predicting-sequences-vectors-keras-using-rnn-lstm/ """
"""
Example setting from the original post: time series for 20 stocks (e.g., IBM, AT&T, AAPL),
each with 10,000 data points (the length of the time series).
The code is written for the multivariate time-series setting;
reducing it to the univariate case is straightforward.

To run the code, make sure the data matrix is a numpy array of size
T x N, where T is the length of the time series and N is the number of series.

In principle you could feed an RNN the entire history of each series and let it capture
patterns over long horizons. In practice, however, the series usually do not have the same length.
One solution is to pad them with zeros and add a Masking layer (supported in Keras),
but that can be inefficient when the lengths vary a lot. An alternative is to break long
sequences into smaller windows. The window size bounds the longest temporal pattern the RNN
can capture, so choose it according to the expected length of the patterns of interest.
"""


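The padding-plus-Masking alternative mentioned above is not used in this notebook, but a minimal
sketch of it looks roughly like the following (assumptions: `series` is a hypothetical list of 2-D
arrays of shape (length_i, n_features), and the helper `pad_to_max` is introduced here purely for
illustration):

import numpy as np
from keras.models import Sequential
from keras.layers.core import Masking, Dense, Activation
from keras.layers.wrappers import TimeDistributed
from keras.layers.recurrent import GRU

def pad_to_max(series, value=0.0):
    # Zero-pad every series to the length of the longest one.
    max_len = max(s.shape[0] for s in series)
    out = np.full((len(series), max_len, series[0].shape[1]), value)
    for i, s in enumerate(series):
        out[i, :s.shape[0], :] = s
    return out

padded = pad_to_max(series)                       # (n_series, max_len, n_features)
model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=padded.shape[1:]))  # padded steps are skipped
model.add(GRU(64, return_sequences=True))
model.add(TimeDistributed(Dense(padded.shape[2])))
model.add(Activation("linear"))
model.compile(loss="mean_squared_error", optimizer="rmsprop")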

In [2]:
import numpy as np
import pandas as pd  # needed for pd.read_csv below

In [3]:
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import glob

dpath = '/media/db/energy'
fs = glob.glob(dpath + "/CON*")  # list the CON* consumption files in the data directory

In [4]:
dpath = '/media/db/energy/CON_consolidated.csv'
df = pd.read_csv(dpath, delimiter=';', decimal=',')
df['Date'] = pd.to_datetime(df['Date'])
df.head()


Out[4]:
Date CON BLT Normal MK01 MWh/h H N CON EE Normal MK01 MWh/h H N CON LV Normal MK01 MWh/h H N CON LT Normal MK01 MWh/h H N CON CEE Normal MK01 MWh/h H N CON PL Normal MK01 MWh/h H N CON CZ Normal MK01 MWh/h H N CON SK Normal MK01 MWh/h H N CON HU Normal MK01 MWh/h H N ... CON SEE Normal MK01 MWh/h H N CON RO Normal MK01 MWh/h H N CON BA Normal MK01 MWh/h H N CON HR Normal MK01 MWh/h H N CON RS Normal MK01 MWh/h H N CON SI Normal MK01 MWh/h H N CON BG Normal MK01 MWh/h H N CON GR Normal MK01 MWh/h H N CON MK Normal MK01 MWh/h H N CON UK Normal MK01 MWh/h H N
0 2014-01-01 00:00:00 2558.1 887.3 664.6 1006.2 27035.5 14311.7 6102.9 2863.3 3757.6 ... 25727.4 5305.6 1302.0 1855.3 5763.3 1201.0 3665.2 5421.2 1213.7 33299.5
1 2014-01-01 01:00:00 2463.1 858.6 636.3 968.2 26205.2 13801.4 6153.2 2749.4 3501.2 ... 24449.4 5174.5 1194.0 1704.3 5288.0 1115.7 3510.0 5368.3 1094.5 31570.5
2 2014-01-01 02:00:00 2409.5 834.6 624.9 950.0 25623.1 13449.6 6152.5 2667.3 3353.6 ... 23183.1 5074.7 1136.8 1621.8 4913.0 966.4 3391.3 5061.7 1017.4 31150.0
3 2014-01-01 03:00:00 2403.7 820.7 632.5 950.5 25268.6 13341.2 5946.3 2638.5 3342.6 ... 22480.7 5118.7 1107.7 1574.4 4635.4 781.4 3453.8 4854.1 955.1 29841.4
4 2014-01-01 04:00:00 2427.6 813.2 656.0 958.5 24778.8 13188.3 5561.0 2641.2 3388.2 ... 22277.5 5211.0 1107.4 1543.1 4421.6 782.0 3552.2 4734.8 925.5 28280.9

5 rows × 53 columns


In [5]:
# X: hourly energy-consumption series, one column per market (the Date column is dropped)
#X = df.ix[:, 1:].as_matrix().transpose()
X = df.ix[:, 1:].as_matrix()
X[0, :5]
X.shape


Out[5]:
array([  2558.1,    887.3,    664.6,   1006.2,  27035.5])
Out[5]:
(43800, 52)

In [6]:
scaler = preprocessing.StandardScaler()
X_std = scaler.fit_transform(X)
X_std[0, :5]


Out[6]:
array([-0.83404449, -0.34068709, -0.97042644, -1.11341722, -1.11325434])

In [7]:
def _load_data(data, steps=40):
    # Chop the T x N matrix into non-overlapping windows of length `steps`.
    # X holds the windows; Y holds the same windows shifted forward by one
    # time step, so the model predicts the next value of every series.
    docX, docY = [], []
    for i in range(0, data.shape[0]//steps - 1):
        docX.append(data[i*steps:(i+1)*steps, :])
        docY.append(data[(i*steps+1):((i+1)*steps+1), :])
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY
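
As a quick sanity check of the windowing (not part of the original run), on a hypothetical toy
array with 100 time steps and 2 series, X holds non-overlapping windows of length `steps` and Y
holds the same windows shifted forward by one step:

toy = np.arange(200).reshape(100, 2)   # column 0: 0, 2, 4, ...; column 1: 1, 3, 5, ...
tx, ty = _load_data(toy, steps=10)
print(tx.shape, ty.shape)              # (9, 10, 2) (9, 10, 2)
print(tx[0, :3, 0], ty[0, :3, 0])      # [0 2 4] [2 4 6]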

In [8]:
def train_test_split(data, test_size=0.15, steps=40):
    # Split the windows into training and test parts.
    # Note: this shadows sklearn's train_test_split imported above.
    X, Y = _load_data(data, steps=steps)
    ntrn = round(X.shape[0] * (1 - test_size))
    perms = np.random.permutation(X.shape[0])  # shuffle the windows before splitting
    X_train, Y_train = X.take(perms[0:ntrn], axis=0), Y.take(perms[0:ntrn], axis=0)
    X_test, Y_test = X.take(perms[ntrn:], axis=0), Y.take(perms[ntrn:], axis=0)
    return (X_train, Y_train), (X_test, Y_test)

In [9]:
steps = 80
np.random.seed(0)  # for reproducibility
#data = np.genfromtxt('closingAdjLog.csv', delimiter=',')
(X_train, y_train), (X_test, y_test) = train_test_split(np.flipud(X_std), steps=steps)  # np.flipud reverses the time order of the rows
print("Data loaded.")


Data loaded.

In [10]:
X_train.shape, y_train.shape
X_train[0,:5,0]
y_train[0,:5,0]


Out[10]:
((464, 80, 52), (464, 80, 52))
Out[10]:
array([ 1.04307295,  1.17331704,  1.24764482,  1.29759445,  1.30424304])
Out[10]:
array([ 1.17331704,  1.24764482,  1.29759445,  1.30424304,  1.43414618])

In [11]:
import keras
from keras.models import Sequential  
from keras.layers.core import Dense, Activation, Dropout  
from keras.layers.wrappers import TimeDistributed
from keras.layers.recurrent import GRU
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Convolution1D
from keras.layers.convolutional import MaxPooling1D

in_out_neurons = X.shape[1]
hidden_neurons = 300
epochs = 100
batch = 64


Using TensorFlow backend.

In [12]:
def get_model1():
    # Single GRU layer over windows of length `steps`, followed by a
    # time-distributed linear readout that predicts all series at every step.
    model = Sequential()
    model.add(GRU(hidden_neurons, input_dim=in_out_neurons, input_length=steps, return_sequences=True))
    #model.add(BatchNormalization())
    #model.add(Dropout(0.2))
    model.add(TimeDistributed(Dense(in_out_neurons)))
    #model.add(Dense(in_out_neurons))
    model.add(Activation("linear"))
    model.compile(loss="mean_squared_error", optimizer="rmsprop")
    print("Model compiled.")
    return model

h5Name = 'data/tw_stock_TF_m1'
model = get_model1()
model.summary()


Model compiled.
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
gru_1 (GRU)                      (None, 80, 300)       317700      gru_input_1[0][0]                
____________________________________________________________________________________________________
timedistributed_1 (TimeDistribute(None, 80, 52)        15652       gru_1[0][0]                      
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 80, 52)        0           timedistributed_1[0][0]          
====================================================================================================
Total params: 333352
____________________________________________________________________________________________________
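As a check on the summary above (a back-of-the-envelope calculation, not from the original
notebook): a GRU has three gates, each with an input weight matrix, a recurrent weight matrix
and a bias, and the time-distributed Dense layer adds one weight matrix and one bias vector.

n_in, n_hid = in_out_neurons, hidden_neurons                  # 52, 300
gru_params = 3 * (n_in * n_hid + n_hid * n_hid + n_hid)       # 317700
dense_params = n_hid * n_in + n_in                            # 15652
print(gru_params, dense_params, gru_params + dense_params)    # 317700 15652 333352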

In [13]:
def get_model2():
    # Two stacked GRU layers plus an extra time-distributed hidden Dense layer.
    model = Sequential()
    model.add(GRU(hidden_neurons, input_dim=in_out_neurons, input_length=steps, return_sequences=True))
    # input_dim/input_length are redundant below; the shape is taken from the previous layer.
    model.add(GRU(hidden_neurons, input_dim=in_out_neurons, input_length=steps, return_sequences=True))
    #model.add(Dropout(0.2))
    model.add(TimeDistributed(Dense(hidden_neurons)))
    model.add(TimeDistributed(Dense(in_out_neurons)))
    model.add(Activation("linear"))
    model.compile(loss="mean_squared_error", optimizer="rmsprop")
    print("Model compiled.")
    return model

h5Name = 'data/tw_stock_TF_m2'
model = get_model2()
model.summary()


Model compiled.
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
gru_2 (GRU)                      (None, 80, 300)       317700      gru_input_2[0][0]                
____________________________________________________________________________________________________
gru_3 (GRU)                      (None, 80, 300)       540900      gru_2[0][0]                      
____________________________________________________________________________________________________
timedistributed_2 (TimeDistribute(None, 80, 300)       90300       gru_3[0][0]                      
____________________________________________________________________________________________________
timedistributed_3 (TimeDistribute(None, 80, 52)        15652       timedistributed_2[0][0]          
____________________________________________________________________________________________________
activation_2 (Activation)        (None, 80, 52)        0           timedistributed_3[0][0]          
====================================================================================================
Total params: 964552
____________________________________________________________________________________________________

In [14]:
def get_model3():
    # 1-D convolution over time ('same' padding keeps the 80-step length),
    # then a GRU and a time-distributed linear readout.
    model = Sequential()
    model.add(Convolution1D(input_shape=(steps, in_out_neurons), nb_filter=256, filter_length=3, border_mode='same', activation='relu'))
    #model.add(MaxPooling1D(pool_length=2))
    model.add(Dropout(0.2))
    model.add(GRU(hidden_neurons, return_sequences=True))
    #model.add(BatchNormalization())
    #model.add(Dropout(0.2))
    model.add(TimeDistributed(Dense(in_out_neurons)))
    #model.add(Dense(in_out_neurons))
    model.add(Activation("linear"))
    model.compile(loss="mean_squared_error", optimizer="rmsprop")
    print("Model compiled.")
    return model

h5Name = 'data/tw_stock_TF_m3'
model = get_model3()
model.summary()


Model compiled.
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution1d_1 (Convolution1D)  (None, 80, 256)       40192       convolution1d_input_1[0][0]      
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 80, 256)       0           convolution1d_1[0][0]            
____________________________________________________________________________________________________
gru_4 (GRU)                      (None, 80, 300)       501300      dropout_1[0][0]                  
____________________________________________________________________________________________________
timedistributed_4 (TimeDistribute(None, 80, 52)        15652       gru_4[0][0]                      
____________________________________________________________________________________________________
activation_3 (Activation)        (None, 80, 52)        0           timedistributed_4[0][0]          
====================================================================================================
Total params: 557144
____________________________________________________________________________________________________

In [15]:
# Train the model (note: `model` is the last one built above, i.e. model3).
import tensorflow as tf
keras.backend.get_session().run(tf.global_variables_initializer())
model.fit(X_train, y_train, batch_size=batch, nb_epoch=epochs, validation_split=0.2)


Train on 371 samples, validate on 93 samples
Epoch 1/100
371/371 [==============================] - 2s - loss: 0.4832 - val_loss: 0.2895
Epoch 2/100
371/371 [==============================] - 0s - loss: 0.1980 - val_loss: 0.1117
Epoch 3/100
371/371 [==============================] - 0s - loss: 0.1154 - val_loss: 0.1151
Epoch 4/100
371/371 [==============================] - 0s - loss: 0.1351 - val_loss: 0.1452
Epoch 5/100
371/371 [==============================] - 0s - loss: 0.1078 - val_loss: 0.0913
Epoch 6/100
371/371 [==============================] - 1s - loss: 0.0938 - val_loss: 0.1322
Epoch 7/100
371/371 [==============================] - 0s - loss: 0.1160 - val_loss: 0.0933
Epoch 8/100
371/371 [==============================] - 1s - loss: 0.0925 - val_loss: 0.0840
Epoch 9/100
371/371 [==============================] - 0s - loss: 0.0875 - val_loss: 0.1116
Epoch 10/100
371/371 [==============================] - 1s - loss: 0.0946 - val_loss: 0.0660
Epoch 11/100
371/371 [==============================] - 0s - loss: 0.0720 - val_loss: 0.0967
Epoch 12/100
371/371 [==============================] - 0s - loss: 0.0907 - val_loss: 0.0528
Epoch 13/100
371/371 [==============================] - 0s - loss: 0.0675 - val_loss: 0.0676
Epoch 14/100
371/371 [==============================] - 0s - loss: 0.0801 - val_loss: 0.0626
Epoch 15/100
371/371 [==============================] - 0s - loss: 0.0718 - val_loss: 0.0675
Epoch 16/100
371/371 [==============================] - 0s - loss: 0.0826 - val_loss: 0.0979
Epoch 17/100
371/371 [==============================] - 0s - loss: 0.0626 - val_loss: 0.0440
Epoch 18/100
371/371 [==============================] - 0s - loss: 0.0559 - val_loss: 0.0654
Epoch 19/100
371/371 [==============================] - 0s - loss: 0.0806 - val_loss: 0.0656
Epoch 20/100
371/371 [==============================] - 0s - loss: 0.0583 - val_loss: 0.0462
Epoch 21/100
371/371 [==============================] - 0s - loss: 0.0549 - val_loss: 0.0524
Epoch 22/100
371/371 [==============================] - 0s - loss: 0.0704 - val_loss: 0.0569
Epoch 23/100
371/371 [==============================] - 0s - loss: 0.0576 - val_loss: 0.0482
Epoch 24/100
371/371 [==============================] - 0s - loss: 0.0517 - val_loss: 0.0511
Epoch 25/100
371/371 [==============================] - 0s - loss: 0.0627 - val_loss: 0.0611
Epoch 26/100
371/371 [==============================] - 0s - loss: 0.0514 - val_loss: 0.0425
Epoch 27/100
371/371 [==============================] - 0s - loss: 0.0570 - val_loss: 0.0602
Epoch 28/100
371/371 [==============================] - 0s - loss: 0.0557 - val_loss: 0.0456
Epoch 29/100
371/371 [==============================] - 0s - loss: 0.0586 - val_loss: 0.0511
Epoch 30/100
371/371 [==============================] - 0s - loss: 0.0433 - val_loss: 0.0384
Epoch 31/100
371/371 [==============================] - 0s - loss: 0.0541 - val_loss: 0.0505
Epoch 32/100
371/371 [==============================] - 0s - loss: 0.0486 - val_loss: 0.0428
Epoch 33/100
371/371 [==============================] - 0s - loss: 0.0493 - val_loss: 0.0601
Epoch 34/100
371/371 [==============================] - 0s - loss: 0.0519 - val_loss: 0.0357
Epoch 35/100
371/371 [==============================] - 0s - loss: 0.0401 - val_loss: 0.0381
Epoch 36/100
371/371 [==============================] - 0s - loss: 0.0596 - val_loss: 0.0779
Epoch 37/100
371/371 [==============================] - 0s - loss: 0.0450 - val_loss: 0.0339
Epoch 38/100
371/371 [==============================] - 0s - loss: 0.0402 - val_loss: 0.0321
Epoch 39/100
371/371 [==============================] - 0s - loss: 0.0387 - val_loss: 0.0471
Epoch 40/100
371/371 [==============================] - 0s - loss: 0.0600 - val_loss: 0.0657
Epoch 41/100
371/371 [==============================] - 0s - loss: 0.0421 - val_loss: 0.0305
Epoch 42/100
371/371 [==============================] - 0s - loss: 0.0363 - val_loss: 0.0320
Epoch 43/100
371/371 [==============================] - 0s - loss: 0.0502 - val_loss: 0.0399
Epoch 44/100
371/371 [==============================] - 0s - loss: 0.0390 - val_loss: 0.0335
Epoch 45/100
371/371 [==============================] - 0s - loss: 0.0392 - val_loss: 0.0571
Epoch 46/100
371/371 [==============================] - 0s - loss: 0.0499 - val_loss: 0.0359
Epoch 47/100
371/371 [==============================] - 0s - loss: 0.0386 - val_loss: 0.0463
Epoch 48/100
371/371 [==============================] - 0s - loss: 0.0450 - val_loss: 0.0351
Epoch 49/100
371/371 [==============================] - 0s - loss: 0.0323 - val_loss: 0.0330
Epoch 50/100
371/371 [==============================] - 0s - loss: 0.0418 - val_loss: 0.0436
Epoch 51/100
371/371 [==============================] - 0s - loss: 0.0406 - val_loss: 0.0393
Epoch 52/100
371/371 [==============================] - 0s - loss: 0.0420 - val_loss: 0.0253
Epoch 53/100
371/371 [==============================] - 0s - loss: 0.0343 - val_loss: 0.0408
Epoch 54/100
371/371 [==============================] - 0s - loss: 0.0428 - val_loss: 0.0350
Epoch 55/100
371/371 [==============================] - 0s - loss: 0.0377 - val_loss: 0.0355
Epoch 56/100
371/371 [==============================] - 0s - loss: 0.0378 - val_loss: 0.0314
Epoch 57/100
371/371 [==============================] - 1s - loss: 0.0356 - val_loss: 0.0355
Epoch 58/100
371/371 [==============================] - 0s - loss: 0.0341 - val_loss: 0.0396
Epoch 59/100
371/371 [==============================] - 0s - loss: 0.0397 - val_loss: 0.0243
Epoch 60/100
371/371 [==============================] - 0s - loss: 0.0370 - val_loss: 0.0364
Epoch 61/100
371/371 [==============================] - 0s - loss: 0.0331 - val_loss: 0.0326
Epoch 62/100
371/371 [==============================] - 1s - loss: 0.0351 - val_loss: 0.0249
Epoch 63/100
371/371 [==============================] - 0s - loss: 0.0374 - val_loss: 0.0423
Epoch 64/100
371/371 [==============================] - 1s - loss: 0.0346 - val_loss: 0.0275
Epoch 65/100
371/371 [==============================] - 0s - loss: 0.0305 - val_loss: 0.0262
Epoch 66/100
371/371 [==============================] - 0s - loss: 0.0385 - val_loss: 0.0407
Epoch 67/100
371/371 [==============================] - 0s - loss: 0.0321 - val_loss: 0.0276
Epoch 68/100
371/371 [==============================] - 0s - loss: 0.0333 - val_loss: 0.0232
Epoch 69/100
371/371 [==============================] - 0s - loss: 0.0346 - val_loss: 0.0338
Epoch 70/100
371/371 [==============================] - 0s - loss: 0.0306 - val_loss: 0.0257
Epoch 71/100
371/371 [==============================] - 0s - loss: 0.0312 - val_loss: 0.0325
Epoch 72/100
371/371 [==============================] - 0s - loss: 0.0352 - val_loss: 0.0353
Epoch 73/100
371/371 [==============================] - 0s - loss: 0.0331 - val_loss: 0.0200
Epoch 74/100
371/371 [==============================] - 0s - loss: 0.0281 - val_loss: 0.0374
Epoch 75/100
371/371 [==============================] - 0s - loss: 0.0386 - val_loss: 0.0183
Epoch 76/100
371/371 [==============================] - 0s - loss: 0.0285 - val_loss: 0.0288
Epoch 77/100
371/371 [==============================] - 0s - loss: 0.0285 - val_loss: 0.0188
Epoch 78/100
371/371 [==============================] - 1s - loss: 0.0261 - val_loss: 0.0286
Epoch 79/100
371/371 [==============================] - 1s - loss: 0.0400 - val_loss: 0.0242
Epoch 80/100
371/371 [==============================] - 0s - loss: 0.0288 - val_loss: 0.0219
Epoch 81/100
371/371 [==============================] - 0s - loss: 0.0283 - val_loss: 0.0248
Epoch 82/100
371/371 [==============================] - 0s - loss: 0.0297 - val_loss: 0.0304
Epoch 83/100
371/371 [==============================] - 0s - loss: 0.0295 - val_loss: 0.0246
Epoch 84/100
371/371 [==============================] - 0s - loss: 0.0272 - val_loss: 0.0216
Epoch 85/100
371/371 [==============================] - 0s - loss: 0.0384 - val_loss: 0.0345
Epoch 86/100
371/371 [==============================] - 0s - loss: 0.0277 - val_loss: 0.0214
Epoch 87/100
371/371 [==============================] - 0s - loss: 0.0248 - val_loss: 0.0284
Epoch 88/100
371/371 [==============================] - 0s - loss: 0.0307 - val_loss: 0.0235
Epoch 89/100
371/371 [==============================] - 0s - loss: 0.0269 - val_loss: 0.0243
Epoch 90/100
371/371 [==============================] - 0s - loss: 0.0280 - val_loss: 0.0246
Epoch 91/100
371/371 [==============================] - 0s - loss: 0.0275 - val_loss: 0.0295
Epoch 92/100
371/371 [==============================] - 0s - loss: 0.0297 - val_loss: 0.0354
Epoch 93/100
371/371 [==============================] - 0s - loss: 0.0277 - val_loss: 0.0226
Epoch 94/100
371/371 [==============================] - 0s - loss: 0.0284 - val_loss: 0.0206
Epoch 95/100
371/371 [==============================] - 0s - loss: 0.0239 - val_loss: 0.0225
Epoch 96/100
371/371 [==============================] - 0s - loss: 0.0319 - val_loss: 0.0232
Epoch 97/100
371/371 [==============================] - 0s - loss: 0.0227 - val_loss: 0.0216
Epoch 98/100
371/371 [==============================] - 0s - loss: 0.0262 - val_loss: 0.0230
Epoch 99/100
371/371 [==============================] - 0s - loss: 0.0303 - val_loss: 0.0186
Epoch 100/100
371/371 [==============================] - 0s - loss: 0.0222 - val_loss: 0.0265
Out[15]:
<keras.callbacks.History at 0x7f4490146be0>

In [16]:
model.save_weights(h5Name)
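
To reuse the saved weights later, the same architecture has to be rebuilt before loading them
back in (a sketch, assuming the weights file written above is still available):

model = get_model3()          # same architecture as the trained model
model.load_weights(h5Name)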

In [17]:
predicted = model.predict(X_test)
print(np.sqrt(((predicted - y_test) ** 2).mean(axis=0)).mean())  # RMSE on the test windows, in standardized units


0.161573820608
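
The RMSE above is in standardized units. A sketch of mapping predictions back to the original
MWh/h scale with the fitted scaler (reshaping to 2-D because StandardScaler expects
(samples, features)):

pred_mwh = scaler.inverse_transform(predicted.reshape(-1, predicted.shape[-1])).reshape(predicted.shape)
true_mwh = scaler.inverse_transform(y_test.reshape(-1, y_test.shape[-1])).reshape(y_test.shape)
print(np.sqrt(((pred_mwh - true_mwh) ** 2).mean()))  # overall RMSE in MWh/h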

In [18]:
predicted.shape, y_test.shape
period = np.random.randint(y_test.shape[0])  # pick a random test window
market = np.random.randint(y_test.shape[2])  # ... and a random market/series to plot
period, market


Out[18]:
((82, 80, 52), (82, 80, 52))
Out[18]:
(46, 43)

In [19]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(predicted[period, :, market], label="predicted")
x=plt.plot(y_test[period, :, market], label="truth")
plt.legend(loc='best')


Out[19]:
[<matplotlib.lines.Line2D at 0x7f441be65eb8>]
Out[19]:
<matplotlib.legend.Legend at 0x7f441be6f9b0>

In [ ]: