Question 1: Prediction on an airline passenger time series.

We want to predict the number of passengers (in thousands) on international airline flights, given passenger records across the years. The data were collected over a 12-year span (144 months).

a) Creating the base training and test sets

The series is split chronologically: the first 96 months form the training set and the remaining 48 the test set. The MinMax scaler is fitted on the training portion only, so no test information leaks into the scaling.


In [56]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from timeit import default_timer as timer

url = 'http://www.inf.utfsm.cl/~cvalle/international-airline-passengers.csv'
dataframe = pd.read_csv(url, sep=',', usecols=[1], engine='python', skipfooter=3)
dataframe[:] = dataframe[:].astype('float32')

df_train, df_test = dataframe[0:96].values, dataframe[96:].values

scaler = MinMaxScaler(feature_range=(0, 1)).fit(df_train)
stream_train_scaled = scaler.transform(df_train)
stream_test_scaled = scaler.transform(df_test)
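
Because the scaler is fitted only on the first 96 months and the series trends upward, scaled test values can fall outside [0, 1]. A quick check (added here for illustration):


In [ ]:
print(stream_train_scaled.min())  # 0.0 by construction
print(stream_train_scaled.max())  # 1.0 by construction
print(stream_test_scaled.min())   # > 0: the test minimum lies above the train minimum (104)
print(stream_test_scaled.max())   # > 1: the test maximum (622) exceeds the train range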

In [2]:
dataframe


Out[2]:
International airline passengers: monthly totals in thousands. Jan 49 – Dec 60
0 112
1 118
2 132
3 129
4 121
5 135
6 148
7 148
8 136
9 119
10 104
11 118
12 115
13 126
14 141
15 135
16 125
17 149
18 170
19 170
20 158
21 133
22 114
23 140
24 145
25 150
26 178
27 163
28 172
29 178
... ...
114 491
115 505
116 404
117 359
118 310
119 337
120 360
121 342
122 406
123 396
124 420
125 472
126 548
127 559
128 463
129 407
130 362
131 405
132 417
133 391
134 419
135 461
136 472
137 535
138 622
139 606
140 508
141 461
142 390
143 432

144 rows × 1 columns

b) Function that builds the dataset

The dataset is built as follows: given a data vector $X_{initial} = \{x_1, \ldots, x_I\}$, we generate $L$ vectors $X_l$ of length $I-L$ that serve as the inputs (together they form an $(I-L) \times L$ matrix $X$) and a target vector $Y$ of length $I-L$. Inputs and targets are paired as $$[\,X = (X_1, \ldots, X_L),\; Y\,],$$ so that row $i$ of $X$ contains the $L$ values immediately preceding the target $y_i$.


In [78]:
def create_dataset(dataset, lag=1):
    dataX = np.zeros((dataset.shape[0]-lag, lag), dtype=np.float32)
    for i in range(lag):
        dataX[:,i] = dataset[i:-lag+i][:,0]
    dataY = dataset[lag:]
    return dataX, dataY
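
As a quick sanity check (illustrative, not part of the original run): with lag = 2 on a toy series, row i of dataX holds the two values that precede dataY[i].


In [ ]:
toy = np.arange(6, dtype=np.float32).reshape(-1, 1)  # column vector [0, 1, 2, 3, 4, 5]
X_toy, Y_toy = create_dataset(toy, lag=2)
print(X_toy)  # rows: [0, 1], [1, 2], [2, 3], [3, 4]
print(Y_toy)  # rows: [2], [3], [4], [5]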

c) Building the dataset with lag = 3


In [84]:
lag = 3
trainX, TrainY = create_dataset(stream_train_scaled, lag)
testX, TestY = create_dataset(stream_test_scaled, lag)

d) Reshaping the data to feed them to the LSTM


In [86]:
TrainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
TestX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
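
Keras LSTMs consume inputs of shape (samples, timesteps, features); here each lag-3 window is treated as a single timestep with 3 features. A quick shape check (added for illustration):


In [ ]:
print(TrainX.shape)  # (93, 1, 3): 96 - lag samples, one timestep, lag features
print(TestX.shape)   # (45, 1, 3)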

e) Training the LSTM with lag = 3


In [52]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

In [72]:
model = Sequential()
model.add(LSTM(output_dim=4, input_dim=lag, activation='tanh', inner_activation='sigmoid'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
start = timer()
model.fit(TrainX, TrainY, nb_epoch=100, batch_size=1, verbose=2)
end = timer()
model.save("LSTM_lag3.h5")
print "Elapsed training time: %s sec"%(end-start)


Epoch 1/100
0s - loss: 0.1456
Epoch 2/100
0s - loss: 0.0776
Epoch 3/100
0s - loss: 0.0444
Epoch 4/100
0s - loss: 0.0313
Epoch 5/100
0s - loss: 0.0259
Epoch 6/100
0s - loss: 0.0221
Epoch 7/100
0s - loss: 0.0190
Epoch 8/100
0s - loss: 0.0163
Epoch 9/100
0s - loss: 0.0140
Epoch 10/100
0s - loss: 0.0121
Epoch 11/100
0s - loss: 0.0107
Epoch 12/100
0s - loss: 0.0096
Epoch 13/100
0s - loss: 0.0089
Epoch 14/100
0s - loss: 0.0083
Epoch 15/100
0s - loss: 0.0080
Epoch 16/100
0s - loss: 0.0077
Epoch 17/100
0s - loss: 0.0074
Epoch 18/100
0s - loss: 0.0072
Epoch 19/100
0s - loss: 0.0071
Epoch 20/100
0s - loss: 0.0070
Epoch 21/100
0s - loss: 0.0070
Epoch 22/100
0s - loss: 0.0069
Epoch 23/100
0s - loss: 0.0068
Epoch 24/100
0s - loss: 0.0067
Epoch 25/100
0s - loss: 0.0068
Epoch 26/100
0s - loss: 0.0066
Epoch 27/100
0s - loss: 0.0065
Epoch 28/100
0s - loss: 0.0066
Epoch 29/100
0s - loss: 0.0066
Epoch 30/100
0s - loss: 0.0062
Epoch 31/100
0s - loss: 0.0063
Epoch 32/100
0s - loss: 0.0066
Epoch 33/100
0s - loss: 0.0062
Epoch 34/100
0s - loss: 0.0062
Epoch 35/100
0s - loss: 0.0062
Epoch 36/100
0s - loss: 0.0063
Epoch 37/100
0s - loss: 0.0062
Epoch 38/100
0s - loss: 0.0061
Epoch 39/100
0s - loss: 0.0063
Epoch 40/100
0s - loss: 0.0061
Epoch 41/100
0s - loss: 0.0060
Epoch 42/100
0s - loss: 0.0060
Epoch 43/100
0s - loss: 0.0060
Epoch 44/100
0s - loss: 0.0061
Epoch 45/100
0s - loss: 0.0060
Epoch 46/100
0s - loss: 0.0059
Epoch 47/100
0s - loss: 0.0059
Epoch 48/100
0s - loss: 0.0058
Epoch 49/100
0s - loss: 0.0058
Epoch 50/100
0s - loss: 0.0058
Epoch 51/100
0s - loss: 0.0058
Epoch 52/100
0s - loss: 0.0060
Epoch 53/100
0s - loss: 0.0058
Epoch 54/100
0s - loss: 0.0057
Epoch 55/100
0s - loss: 0.0058
Epoch 56/100
0s - loss: 0.0057
Epoch 57/100
0s - loss: 0.0055
Epoch 58/100
0s - loss: 0.0061
Epoch 59/100
0s - loss: 0.0057
Epoch 60/100
0s - loss: 0.0057
Epoch 61/100
0s - loss: 0.0058
Epoch 62/100
0s - loss: 0.0056
Epoch 63/100
0s - loss: 0.0056
Epoch 64/100
0s - loss: 0.0058
Epoch 65/100
0s - loss: 0.0056
Epoch 66/100
0s - loss: 0.0057
Epoch 67/100
0s - loss: 0.0057
Epoch 68/100
0s - loss: 0.0055
Epoch 69/100
0s - loss: 0.0055
Epoch 70/100
0s - loss: 0.0057
Epoch 71/100
0s - loss: 0.0057
Epoch 72/100
0s - loss: 0.0054
Epoch 73/100
0s - loss: 0.0055
Epoch 74/100
0s - loss: 0.0054
Epoch 75/100
0s - loss: 0.0056
Epoch 76/100
0s - loss: 0.0054
Epoch 77/100
0s - loss: 0.0054
Epoch 78/100
0s - loss: 0.0055
Epoch 79/100
0s - loss: 0.0054
Epoch 80/100
0s - loss: 0.0054
Epoch 81/100
0s - loss: 0.0055
Epoch 82/100
0s - loss: 0.0053
Epoch 83/100
0s - loss: 0.0052
Epoch 84/100
0s - loss: 0.0055
Epoch 85/100
0s - loss: 0.0053
Epoch 86/100
0s - loss: 0.0054
Epoch 87/100
0s - loss: 0.0053
Epoch 88/100
0s - loss: 0.0054
Epoch 89/100
0s - loss: 0.0052
Epoch 90/100
0s - loss: 0.0052
Epoch 91/100
0s - loss: 0.0054
Epoch 92/100
0s - loss: 0.0053
Epoch 93/100
0s - loss: 0.0052
Epoch 94/100
0s - loss: 0.0054
Epoch 95/100
0s - loss: 0.0052
Epoch 96/100
0s - loss: 0.0051
Epoch 97/100
0s - loss: 0.0053
Epoch 98/100
0s - loss: 0.0052
Epoch 99/100
0s - loss: 0.0055
Epoch 100/100
0s - loss: 0.0051
Elapsed training time: 25.5540499687 sec

f) Predictions on the training and test sets

It is important to invert the normalization so that the errors can be computed against the original data.


In [73]:
from keras.models import load_model

model = load_model("LSTM_lag3.h5")
trainPredict = model.predict(TrainX)
trainPredict = scaler.inverse_transform(trainPredict)
trainY_noscale = scaler.inverse_transform(TrainY)

testPredict = model.predict(TestX)
testPredict = scaler.inverse_transform(testPredict)
testY_noscale = scaler.inverse_transform(TestY)

g) Error computation on the training and test sets

This error can be read as the number of passengers (in thousands) by which the model under- or overestimates.
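
For reference, the score computed below is the root mean squared error,

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},$$

expressed in the original units (thousands of passengers).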


In [74]:
import math
from sklearn.metrics import mean_squared_error
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY_noscale[:,0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY_noscale[:,0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))


Train Score: 22.01 RMSE
Test Score: 62.48 RMSE

h) Prediction plots


In [10]:
%matplotlib inline
import matplotlib.pyplot as plt

def plot(data_init, data_predicted, lag, title, train_or_test):
    """
        Plot original targets and predicted targets
    """
    months = np.arange(0,144,1)
    plt.figure(figsize=(15,5))
    if train_or_test == "train":
        ticks = np.arange(0, 102, 6)
        plt.xlim(0, 96)
    elif train_or_test == "test":
        ticks = np.arange(96, 150, 6)
        plt.xlim(96, 144)
    plt.title(title, fontsize=16)
    plt.plot(months, data_init, "b-", lw=1.5, label="Targets")
    plt.plot(months, data_predicted, "r-", lw=1.5, label="Prediction")
    plt.xlabel(r"$t$ (months)", fontsize=16)
    plt.xticks(ticks)
    plt.ylabel("Passengers (thousands)", fontsize=16)
    plt.grid()
    plt.legend(loc='best')
    plt.show()
    
def plot_vs_series(series, train_predicted, test_predicted, title):
    months = np.arange(0,144,1)
    ticks = np.arange(0, 150, 6)
    plt.figure(figsize=(15,5))
    plt.title(title, fontsize=16)
    plt.plot(months, series[:,0], 'g-', lw=1.0, label="Original series")
    plt.plot(months, train_predicted, 'b-', lw=1.5, label="Train prediction")
    plt.plot(months, test_predicted, 'r-', lw=1.5, label="Test prediction")
    plt.xlabel(r"$t$ (months)", fontsize=16)
    plt.xticks(ticks)
    plt.xlim(0, 144)
    plt.ylabel("Passengers (thousands)", fontsize=16)
    plt.grid()
    plt.legend(loc='best')
    plt.show()

In [11]:
# shift train predictions for plotting
trainPredictPlot = np.empty_like(dataframe.values)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[lag:len(trainPredict)+lag, :] = trainPredict

# shift train original targets for plotting
trainY_asoriginal = np.empty_like(dataframe.values)
trainY_asoriginal[:, :] = np.nan
trainY_asoriginal[lag:len(trainY_noscale)+lag, :] = trainY_noscale

plot(trainY_asoriginal, trainPredictPlot, lag, "Training set results", "train")

# shift test predictions for plotting
testPredictPlot = np.empty_like(dataframe.values)
testPredictPlot[:, :] = np.nan
testPredictPlot[(len(trainPredict)+2*lag):, :] = testPredict

# shift test original targets for plotting
testY_asoriginal = np.empty_like(dataframe.values)
testY_asoriginal[:, :] = np.nan
testY_asoriginal[(len(trainY_noscale)+2*lag):, :] = testY_noscale

plot(testY_asoriginal, testPredictPlot, lag, "Test set results", "test")

plot_vs_series(dataframe.values, trainPredictPlot, testPredictPlot, "Results over the whole original series")


i) Determining the number of LSTM blocks

We use k-fold cross-validation with k = 5 on the training set.


In [ ]:
from sklearn.cross_validation import KFold  # old sklearn API: KFold(n, n_folds, shuffle)

nb = range(4,13,2)
k = 5
kf_CV = KFold(TrainY[:,0].shape[0], k, shuffle=True)

results = []
for n in nb:
    print "Using",n,"LSTM blocks"
    losses = []
    for i, (train, test) in enumerate(kf_CV):
        print "Analyzing fold", i+1, "/", k
        model = None  # discard the previous fold's model
        model = Sequential()
        model.add(LSTM(output_dim=n, input_dim=lag, activation='tanh', inner_activation='sigmoid'))
        model.add(Dense(1))
        model.compile(loss='mean_squared_error', optimizer='adam')
        model.fit(TrainX[train], TrainY[train], nb_epoch=100, batch_size=1, verbose=0)
        loss = model.evaluate(TrainX[test], TrainY[test])
        losses.append(loss)
    results.append(losses)
    print losses
print "Final results"
print results

In [49]:
f = open("pregunta1_h.txt")
print f.read()
f.close()


Using 4 LSTM blocks
Analyzing fold 1 / 5
19/19 [==============================] - 0s
Analyzing fold 2 / 5
19/19 [==============================] - 0s
Analyzing fold 3 / 5
19/19 [==============================] - 0s
Analyzing fold 4 / 5
18/18 [==============================] - 0s
Analyzing fold 5 / 5
18/18 [==============================] - 0s
[0.0058013456873595715, 0.0067546665668487549, 0.0039709298871457577, 0.0077248862944543362, 0.0068293004296720028]
Using 6 LSTM blocks
Analyzing fold 1 / 5
19/19 [==============================] - 0s
Analyzing fold 2 / 5
19/19 [==============================] - 0s
Analyzing fold 3 / 5
19/19 [==============================] - 0s
Analyzing fold 4 / 5
18/18 [==============================] - 0s
Analyzing fold 5 / 5
18/18 [==============================] - 0s
[0.0079037416726350784, 0.0063770841807126999, 0.0039345691911876202, 0.0053643947467207909, 0.0053532104939222336]
Using 8 LSTM blocks
Analyzing fold 1 / 5
19/19 [==============================] - 0s
Analyzing fold 2 / 5
19/19 [==============================] - 0s
Analyzing fold 3 / 5
19/19 [==============================] - 0s
Analyzing fold 4 / 5
18/18 [==============================] - 0s
Analyzing fold 5 / 5
18/18 [==============================] - 0s
[0.0060981051065027714, 0.0064719794318079948, 0.0044479849748313427, 0.0044790026731789112, 0.0056187780573964119]
Using 10 LSTM blocks
Analyzing fold 1 / 5
19/19 [==============================] - 0s
Analyzing fold 2 / 5
19/19 [==============================] - 0s
Analyzing fold 3 / 5
19/19 [==============================] - 0s
Analyzing fold 4 / 5
18/18 [==============================] - 0s
Analyzing fold 5 / 5
18/18 [==============================] - 0s
[0.0071236342191696167, 0.0046910210512578487, 0.0047823912464082241, 0.0059864339418709278, 0.0055963038466870785]
Using 12 LSTM blocks
Analyzing fold 1 / 5
19/19 [==============================] - 0s
Analyzing fold 2 / 5
19/19 [==============================] - 0s
Analyzing fold 3 / 5
19/19 [==============================] - 0s
Analyzing fold 4 / 5
18/18 [==============================] - 0s
Analyzing fold 5 / 5
18/18 [==============================] - 0s
[0.0073507814668118954, 0.0063614812679588795, 0.0038768104277551174, 0.0048401798121631145, 0.0056704352609813213]
Final results
[[0.0058013456873595715, 0.0067546665668487549, 0.0039709298871457577, 0.0077248862944543362, 0.0068293004296720028], [0.0079037416726350784, 0.0063770841807126999, 0.0039345691911876202, 0.0053643947467207909, 0.0053532104939222336], [0.0060981051065027714, 0.0064719794318079948, 0.0044479849748313427, 0.0044790026731789112, 0.0056187780573964119], [0.0071236342191696167, 0.0046910210512578487, 0.0047823912464082241, 0.0059864339418709278, 0.0055963038466870785], [0.0073507814668118954, 0.0063614812679588795, 0.0038768104277551174, 0.0048401798121631145, 0.0056704352609813213]]


In [47]:
err_4b = np.mean([0.0058013456873595715, 0.0067546665668487549, 
                  0.0039709298871457577, 0.0077248862944543362, 0.0068293004296720028])
err_6b = np.mean([0.0079037416726350784, 0.0063770841807126999, 0.0039345691911876202,
                  0.0053643947467207909, 0.0053532104939222336])
err_8b = np.mean([0.0060981051065027714, 0.0064719794318079948,
                  0.0044479849748313427, 0.0044790026731789112, 0.0056187780573964119])
err_10b = np.mean([0.0071236342191696167, 0.0046910210512578487,
                   0.0047823912464082241, 0.0059864339418709278, 0.0055963038466870785])
err_12b = np.mean([0.0073507814668118954, 0.0063614812679588795,
                   0.0038768104277551174, 0.0048401798121631145, 0.0056704352609813213])
print "Error con 4 bloques:",err_4b
print "Error con 6 bloques:",err_6b
print "Error con 8 bloques:",err_8b
print "Error con 10 bloques:",err_10b
print "Error con 12 bloques:",err_12b


Error with 4 blocks: 0.0062162257731
Error with 6 blocks: 0.00578660005704
Error with 8 blocks: 0.00542317004874
Error with 10 blocks: 0.00563595686108
Error with 12 blocks: 0.00561993764713

The k-fold cross-validation results show that 8 LSTM blocks yield the lowest average validation loss.

j) Varying the lag from 1 to 4, using an 8-block LSTM


In [8]:
import math
from sklearn.metrics import mean_squared_error

lags = [1, 2, 3, 4]
for lag in lags:
    trainX, TrainY = create_dataset(stream_train_scaled, lag)
    testX, TestY = create_dataset(stream_test_scaled, lag)
    
    TrainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
    TestX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

    model = Sequential()
    model.add(LSTM(output_dim=8, input_dim=lag, activation='tanh', inner_activation='sigmoid'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(TrainX, TrainY, nb_epoch=100, batch_size=1, verbose=0)
    loss = model.evaluate(TestX, TestY, verbose=0)
    print "Loss para lag",lag,":",loss
    
    trainPredict = model.predict(TrainX)
    trainPredict = scaler.inverse_transform(trainPredict)
    trainY_noscale = scaler.inverse_transform(TrainY)
    trainScore = math.sqrt(mean_squared_error(trainY_noscale[:,0], trainPredict[:,0]))
    print('Train Score: %.2f RMSE' % (trainScore))
    
    testPredict = model.predict(TestX)
    testPredict = scaler.inverse_transform(testPredict)
    testY_noscale = scaler.inverse_transform(TestY)
    testScore = math.sqrt(mean_squared_error(testY_noscale[:,0], testPredict[:,0]))
    print('Test Score: %.2f RMSE' % (testScore))
    
    del model


Loss for lag 1 : 0.0276761328445
Train Score: 23.13 RMSE
Test Score: 51.41 RMSE
Loss for lag 2 : 0.0418205134895
Train Score: 21.88 RMSE
Test Score: 63.19 RMSE
Loss for lag 3 : 0.0423788474666
Train Score: 21.02 RMSE
Test Score: 63.61 RMSE
Loss for lag 4 : 0.0543724332343
Train Score: 21.91 RMSE
Test Score: 72.05 RMSE

Lower lag values give a lower test error, especially lag = 1. In terms of the RMSE of the reconstructed data, the lowest test values are also obtained with lag = 1; there does not seem to be a significant difference between lag = 2 and lag = 3.

k) Comparing LSTM versus RNN and GRU


In [24]:
from keras.layers import GRU
from keras.layers import SimpleRNN

lag = 3
trainX, TrainY = create_dataset(stream_train_scaled, lag)
testX, TestY = create_dataset(stream_test_scaled, lag)
   
TrainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
TestX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

for i in range(10):
    model = Sequential()
    model.add(GRU(output_dim=8, input_dim=lag, inner_init='orthogonal', activation='tanh'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(TrainX, TrainY, nb_epoch=100, batch_size=1, verbose=0)
    model.save("GRU_lag3_exec_"+str(i)+".h5")

In [25]:
for i in range(10):
    model = Sequential()
    model.add(SimpleRNN(output_dim=8, input_dim=lag, inner_init='orthogonal',activation='tanh'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(TrainX, TrainY, nb_epoch=100, batch_size=1, verbose=0)
    model.save("RNN_lag3_exec_"+str(i)+".h5")

In [27]:
for i in range(10):
    model = Sequential()
    model.add(LSTM(output_dim=8, input_dim=lag, activation='tanh', inner_activation='sigmoid'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(TrainX, TrainY, nb_epoch=100, batch_size=1, verbose=0)
    model.save("LSTM_lag3_8block_exec_"+str(i)+".h5")

In [28]:
from keras.models import load_model

# Load GRU results
gru_data = {}
gru_data['trainPredict'] = []
gru_data['trainY_noscale'] = []
gru_data['testPredict'] = []
gru_data['testY_noscale'] = []

for i in range(10):
    model = load_model("GRU_lag3_exec_"+str(i)+".h5")
    gru_data['trainPredict'].append(scaler.inverse_transform(model.predict(TrainX)))
    gru_data['trainY_noscale'].append(scaler.inverse_transform(TrainY))

    gru_data['testPredict'].append(scaler.inverse_transform(model.predict(TestX)))
    gru_data['testY_noscale'].append(scaler.inverse_transform(TestY))
    
    del model

# Load RNN results
rnn_data = {}
rnn_data['trainPredict'] = []
rnn_data['trainY_noscale'] = []
rnn_data['testPredict'] = []
rnn_data['testY_noscale'] = []

for i in range(10):
    model = load_model("RNN_lag3_exec_"+str(i)+".h5")
    rnn_data['trainPredict'].append(scaler.inverse_transform(model.predict(TrainX)))
    rnn_data['trainY_noscale'].append(scaler.inverse_transform(TrainY))
    
    rnn_data['testPredict'].append(scaler.inverse_transform(model.predict(TestX)))
    rnn_data['testY_noscale'].append(scaler.inverse_transform(TestY))
    
    del model

# Load LSTM results
lstm_data = {}
lstm_data['trainPredict'] = []
lstm_data['trainY_noscale'] = []
lstm_data['testPredict'] = []
lstm_data['testY_noscale'] = []

for i in range(10):
    model = load_model("LSTM_lag3_8block_exec_"+str(i)+".h5")
    lstm_data['trainPredict'].append(scaler.inverse_transform(model.predict(TrainX)))
    lstm_data['trainY_noscale'].append(scaler.inverse_transform(TrainY))

    lstm_data['testPredict'].append(scaler.inverse_transform(model.predict(TestX)))
    lstm_data['testY_noscale'].append(scaler.inverse_transform(TestY))
    
    del model

In [40]:
import math
from sklearn.metrics import mean_squared_error

# Average the RMSE over the 10 runs of each network type
gru_train_scores = []
gru_test_scores = []
for y_pred, y_data in zip(gru_data['trainPredict'], gru_data['trainY_noscale']):
    gru_train_scores.append(math.sqrt(mean_squared_error(y_data[:,0], y_pred[:,0])))
for y_pred, y_data in zip(gru_data['testPredict'], gru_data['testY_noscale']):
    gru_test_scores.append(math.sqrt(mean_squared_error(y_data[:,0], y_pred[:,0])))
trainScore = np.mean(gru_train_scores)
testScore = np.mean(gru_test_scores)
print('GRU Train Score: %.2f RMSE' % (trainScore))
print('GRU Test Score: %.2f RMSE' % (testScore))

rnn_train_scores = []
rnn_test_scores = []
for y_pred, y_data in zip(rnn_data['trainPredict'], rnn_data['trainY_noscale']):
    rnn_train_scores.append(math.sqrt(mean_squared_error(y_data[:,0], y_pred[:,0])))
for y_pred, y_data in zip(rnn_data['testPredict'], rnn_data['testY_noscale']):
    rnn_test_scores.append(math.sqrt(mean_squared_error(y_data[:,0], y_pred[:,0])))
trainScore = np.mean(rnn_train_scores)
testScore = np.mean(rnn_test_scores)
print('RNN Train Score: %.2f RMSE' % (trainScore))
print('RNN Test Score: %.2f RMSE' % (testScore))

lstm_train_scores = []
lstm_test_scores = []
for y_pred, y_data in zip(lstm_data['trainPredict'], lstm_data['trainY_noscale']):
    lstm_train_scores.append(math.sqrt(mean_squared_error(y_data[:,0], y_pred[:,0])))
for y_pred, y_data in zip(lstm_data['testPredict'], lstm_data['testY_noscale']):
    lstm_test_scores.append(math.sqrt(mean_squared_error(y_data[:,0], y_pred[:,0])))
trainScore = np.mean(lstm_train_scores)
testScore = np.mean(lstm_test_scores)
print('LSTM Train Score: %.2f RMSE' % (trainScore))
print('LSTM Test Score: %.2f RMSE' % (testScore))


GRU Train Score: 21.41 RMSE
GRU Test Score: 63.61 RMSE
RNN Train Score: 22.15 RMSE
RNN Test Score: 64.13 RMSE
LSTM Train Score: 21.51 RMSE
LSTM Test Score: 65.88 RMSE

There is little difference between the results of the three architectures. Computationally, the GRU and the SimpleRNN are faster to train than the LSTM. Across these runs, the GRU achieves the best average RMSE of the three networks.

l) Training the original LSTM with timestep = 3

The idea behind timesteps is to interpret the data differently. In the original version we formatted the dataset so that each example carried a single timestep (with lag features); in reality each example may span more than one timestep. We modify the code to reflect this, reshaping to (samples, lag, 1) so that each lagged value becomes its own timestep with a single feature.


In [172]:
TrainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
model = Sequential()
model.add(LSTM(8, input_dim=1, activation='tanh', inner_activation='sigmoid'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
start = timer()
model.fit(TrainX, TrainY, nb_epoch=100, batch_size=1, verbose=2)
end = timer()
model.save("LSTM_lag3_timestep3.h5")
print "Elapsed training time: %s"%(end - start)


Epoch 1/100
0s - loss: 0.1344
Epoch 2/100
0s - loss: 0.0417
Epoch 3/100
0s - loss: 0.0297
Epoch 4/100
0s - loss: 0.0236
Epoch 5/100
0s - loss: 0.0190
Epoch 6/100
0s - loss: 0.0162
Epoch 7/100
0s - loss: 0.0143
Epoch 8/100
0s - loss: 0.0131
Epoch 9/100
0s - loss: 0.0133
Epoch 10/100
0s - loss: 0.0123
Epoch 11/100
0s - loss: 0.0122
Epoch 12/100
0s - loss: 0.0117
Epoch 13/100
0s - loss: 0.0117
Epoch 14/100
0s - loss: 0.0117
Epoch 15/100
0s - loss: 0.0115
Epoch 16/100
0s - loss: 0.0114
Epoch 17/100
0s - loss: 0.0112
Epoch 18/100
0s - loss: 0.0112
Epoch 19/100
0s - loss: 0.0112
Epoch 20/100
0s - loss: 0.0110
Epoch 21/100
0s - loss: 0.0106
Epoch 22/100
0s - loss: 0.0115
Epoch 23/100
0s - loss: 0.0110
Epoch 24/100
0s - loss: 0.0106
Epoch 25/100
0s - loss: 0.0108
Epoch 26/100
0s - loss: 0.0107
Epoch 27/100
0s - loss: 0.0104
Epoch 28/100
0s - loss: 0.0107
Epoch 29/100
0s - loss: 0.0104
Epoch 30/100
0s - loss: 0.0107
Epoch 31/100
0s - loss: 0.0104
Epoch 32/100
0s - loss: 0.0108
Epoch 33/100
0s - loss: 0.0103
Epoch 34/100
0s - loss: 0.0096
Epoch 35/100
0s - loss: 0.0100
Epoch 36/100
0s - loss: 0.0098
Epoch 37/100
0s - loss: 0.0096
Epoch 38/100
0s - loss: 0.0095
Epoch 39/100
0s - loss: 0.0097
Epoch 40/100
0s - loss: 0.0096
Epoch 41/100
0s - loss: 0.0093
Epoch 42/100
0s - loss: 0.0097
Epoch 43/100
0s - loss: 0.0093
Epoch 44/100
0s - loss: 0.0090
Epoch 45/100
0s - loss: 0.0089
Epoch 46/100
0s - loss: 0.0089
Epoch 47/100
0s - loss: 0.0088
Epoch 48/100
0s - loss: 0.0093
Epoch 49/100
0s - loss: 0.0088
Epoch 50/100
0s - loss: 0.0085
Epoch 51/100
0s - loss: 0.0084
Epoch 52/100
0s - loss: 0.0082
Epoch 53/100
0s - loss: 0.0083
Epoch 54/100
0s - loss: 0.0082
Epoch 55/100
0s - loss: 0.0081
Epoch 56/100
0s - loss: 0.0079
Epoch 57/100
0s - loss: 0.0080
Epoch 58/100
0s - loss: 0.0076
Epoch 59/100
0s - loss: 0.0080
Epoch 60/100
0s - loss: 0.0076
Epoch 61/100
0s - loss: 0.0069
Epoch 62/100
0s - loss: 0.0077
Epoch 63/100
0s - loss: 0.0073
Epoch 64/100
0s - loss: 0.0070
Epoch 65/100
0s - loss: 0.0073
Epoch 66/100
0s - loss: 0.0069
Epoch 67/100
0s - loss: 0.0072
Epoch 68/100
0s - loss: 0.0067
Epoch 69/100
0s - loss: 0.0067
Epoch 70/100
0s - loss: 0.0069
Epoch 71/100
0s - loss: 0.0064
Epoch 72/100
0s - loss: 0.0064
Epoch 73/100
0s - loss: 0.0064
Epoch 74/100
0s - loss: 0.0061
Epoch 75/100
0s - loss: 0.0060
Epoch 76/100
0s - loss: 0.0060
Epoch 77/100
0s - loss: 0.0064
Epoch 78/100
0s - loss: 0.0059
Epoch 79/100
0s - loss: 0.0060
Epoch 80/100
0s - loss: 0.0057
Epoch 81/100
0s - loss: 0.0058
Epoch 82/100
0s - loss: 0.0057
Epoch 83/100
0s - loss: 0.0055
Epoch 84/100
0s - loss: 0.0058
Epoch 85/100
0s - loss: 0.0059
Epoch 86/100
0s - loss: 0.0056
Epoch 87/100
0s - loss: 0.0058
Epoch 88/100
0s - loss: 0.0056
Epoch 89/100
0s - loss: 0.0052
Epoch 90/100
0s - loss: 0.0054
Epoch 91/100
0s - loss: 0.0054
Epoch 92/100
0s - loss: 0.0056
Epoch 93/100
0s - loss: 0.0053
Epoch 94/100
0s - loss: 0.0052
Epoch 95/100
0s - loss: 0.0053
Epoch 96/100
0s - loss: 0.0054
Epoch 97/100
0s - loss: 0.0053
Epoch 98/100
0s - loss: 0.0053
Epoch 99/100
0s - loss: 0.0052
Epoch 100/100
0s - loss: 0.0051
Elapsed training time: 30.5118508339

In [173]:
model = load_model("LSTM_lag3_timestep3.h5")
TestX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))
trainPredict = model.predict(TrainX)
trainPredict = scaler.inverse_transform(trainPredict)
trainY_noscale = scaler.inverse_transform(TrainY)

testPredict = model.predict(TestX)
testPredict = scaler.inverse_transform(testPredict)
testY_noscale = scaler.inverse_transform(TestY)

In [174]:
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY_noscale[:,0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY_noscale[:,0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))


Train Score: 21.68 RMSE
Test Score: 60.10 RMSE

In [175]:
# shift train predictions for plotting
trainPredictPlot = np.empty_like(dataframe.values)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[lag:len(trainPredict)+lag, :] = trainPredict

# shift train original targets for plotting
trainY_asoriginal = np.empty_like(dataframe.values)
trainY_asoriginal[:, :] = np.nan
trainY_asoriginal[lag:len(trainY_noscale)+lag, :] = trainY_noscale

plot(trainY_asoriginal, trainPredictPlot, lag, "Training set results", "train")

# shift test predictions for plotting
testPredictPlot = np.empty_like(dataframe.values)
testPredictPlot[:, :] = np.nan
testPredictPlot[(len(trainPredict)+2*lag):, :] = testPredict

# shift test original targets for plotting
testY_asoriginal = np.empty_like(dataframe.values)
testY_asoriginal[:, :] = np.nan
testY_asoriginal[(len(trainY_noscale)+2*lag):, :] = testY_noscale

plot(testY_asoriginal, testPredictPlot, lag, "Test set results", "test")

plot_vs_series(dataframe.values, trainPredictPlot, testPredictPlot, "Results over the whole original series")


The results are similar to those of the first network, with a slight improvement in RMSE.

Training times are comparable, at roughly 30 seconds per network.

m) Training an LSTM with memory between batches

Normally the LSTM state is reset after every batch. With a stateful LSTM we can instead choose to reset the memory after a given number of batches; here the state is reset once per epoch.


In [176]:
TrainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
TestX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))

In [177]:
lag = 3
batch_size = 1
model = Sequential()
model.add(LSTM(8, batch_input_shape=(batch_size, lag, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(100):
    model.fit(TrainX, TrainY, nb_epoch=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
model.save("LSTM_lag3_batch1.h5")


Epoch 1/1
0s - loss: 0.0162
Epoch 1/1
0s - loss: 0.0254
Epoch 1/1
0s - loss: 0.0145
Epoch 1/1
0s - loss: 0.0113
Epoch 1/1
0s - loss: 0.0099
Epoch 1/1
0s - loss: 0.0094
Epoch 1/1
0s - loss: 0.0092
Epoch 1/1
0s - loss: 0.0090
Epoch 1/1
0s - loss: 0.0089
Epoch 1/1
0s - loss: 0.0088
Epoch 1/1
0s - loss: 0.0086
Epoch 1/1
0s - loss: 0.0086
Epoch 1/1
0s - loss: 0.0085
Epoch 1/1
0s - loss: 0.0086
Epoch 1/1
0s - loss: 0.0087
Epoch 1/1
0s - loss: 0.0089
Epoch 1/1
0s - loss: 0.0090
Epoch 1/1
0s - loss: 0.0090
Epoch 1/1
0s - loss: 0.0089
Epoch 1/1
0s - loss: 0.0087
Epoch 1/1
0s - loss: 0.0085
Epoch 1/1
0s - loss: 0.0084
Epoch 1/1
0s - loss: 0.0082
Epoch 1/1
0s - loss: 0.0081
Epoch 1/1
0s - loss: 0.0080
Epoch 1/1
0s - loss: 0.0086
Epoch 1/1
0s - loss: 0.0122
Epoch 1/1
0s - loss: 0.0075
Epoch 1/1
0s - loss: 0.0068
Epoch 1/1
0s - loss: 0.0067
Epoch 1/1
0s - loss: 0.0067
Epoch 1/1
0s - loss: 0.0067
Epoch 1/1
0s - loss: 0.0068
Epoch 1/1
0s - loss: 0.0068
Epoch 1/1
0s - loss: 0.0069
Epoch 1/1
0s - loss: 0.0070
Epoch 1/1
0s - loss: 0.0070
Epoch 1/1
0s - loss: 0.0070
Epoch 1/1
0s - loss: 0.0069
Epoch 1/1
0s - loss: 0.0068
Epoch 1/1
0s - loss: 0.0067
Epoch 1/1
0s - loss: 0.0066
Epoch 1/1
0s - loss: 0.0065
Epoch 1/1
0s - loss: 0.0064
Epoch 1/1
0s - loss: 0.0063
Epoch 1/1
0s - loss: 0.0063
Epoch 1/1
0s - loss: 0.0062
Epoch 1/1
0s - loss: 0.0062
Epoch 1/1
0s - loss: 0.0062
Epoch 1/1
0s - loss: 0.0061
Epoch 1/1
0s - loss: 0.0061
Epoch 1/1
0s - loss: 0.0060
Epoch 1/1
0s - loss: 0.0060
Epoch 1/1
0s - loss: 0.0059
Epoch 1/1
0s - loss: 0.0059
Epoch 1/1
0s - loss: 0.0058
Epoch 1/1
0s - loss: 0.0058
Epoch 1/1
0s - loss: 0.0057
Epoch 1/1
0s - loss: 0.0058
Epoch 1/1
0s - loss: 0.0057
Epoch 1/1
0s - loss: 0.0062
Epoch 1/1
0s - loss: 0.0058
Epoch 1/1
0s - loss: 0.0058
Epoch 1/1
0s - loss: 0.0057
Epoch 1/1
0s - loss: 0.0058
Epoch 1/1
0s - loss: 0.0057
Epoch 1/1
0s - loss: 0.0059
Epoch 1/1
0s - loss: 0.0057
Epoch 1/1
0s - loss: 0.0060
Epoch 1/1
0s - loss: 0.0057
Epoch 1/1
0s - loss: 0.0058
Epoch 1/1
0s - loss: 0.0056
Epoch 1/1
0s - loss: 0.0055
Epoch 1/1
0s - loss: 0.0054
Epoch 1/1
0s - loss: 0.0053
Epoch 1/1
0s - loss: 0.0051
Epoch 1/1
0s - loss: 0.0050
Epoch 1/1
0s - loss: 0.0048
Epoch 1/1
0s - loss: 0.0047
Epoch 1/1
0s - loss: 0.0046
Epoch 1/1
0s - loss: 0.0044
Epoch 1/1
0s - loss: 0.0043
Epoch 1/1
0s - loss: 0.0042
Epoch 1/1
0s - loss: 0.0042
Epoch 1/1
0s - loss: 0.0041
Epoch 1/1
0s - loss: 0.0040
Epoch 1/1
0s - loss: 0.0040
Epoch 1/1
0s - loss: 0.0039
Epoch 1/1
0s - loss: 0.0039
Epoch 1/1
0s - loss: 0.0039
Epoch 1/1
0s - loss: 0.0038
Epoch 1/1
0s - loss: 0.0038
Epoch 1/1
0s - loss: 0.0038
Epoch 1/1
0s - loss: 0.0038
Epoch 1/1
0s - loss: 0.0038
Epoch 1/1
0s - loss: 0.0038
Epoch 1/1
0s - loss: 0.0039
Epoch 1/1
0s - loss: 0.0047
Epoch 1/1
0s - loss: 0.0042
Epoch 1/1
0s - loss: 0.0042

In [178]:
model = load_model("LSTM_lag3_batch1.h5")
trainPredict = model.predict(TrainX, batch_size=batch_size)
trainPredict = scaler.inverse_transform(trainPredict)
trainY_noscale = scaler.inverse_transform(TrainY)

testPredict = model.predict(TestX, batch_size=batch_size)
testPredict = scaler.inverse_transform(testPredict)
testY_noscale = scaler.inverse_transform(TestY)

In [179]:
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY_noscale[:,0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY_noscale[:,0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))


Train Score: 19.34 RMSE
Test Score: 65.96 RMSE

In [180]:
# shift train predictions for plotting
trainPredictPlot = np.empty_like(dataframe.values)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[lag:len(trainPredict)+lag, :] = trainPredict

# shift train original targets for plotting
trainY_asoriginal = np.empty_like(dataframe.values)
trainY_asoriginal[:, :] = np.nan
trainY_asoriginal[lag:len(trainY_noscale)+lag, :] = trainY_noscale

plot(trainY_asoriginal, trainPredictPlot, lag, "Training set results, LSTM with memory between batches (1)", "train")

# shift test predictions for plotting
testPredictPlot = np.empty_like(dataframe.values)
testPredictPlot[:, :] = np.nan
testPredictPlot[(len(trainPredict)+2*lag):, :] = testPredict

# shift test original targets for plotting
testY_asoriginal = np.empty_like(dataframe.values)
testY_asoriginal[:, :] = np.nan
testY_asoriginal[(len(trainY_noscale)+2*lag):, :] = testY_noscale

plot(testY_asoriginal, testPredictPlot, lag, "Test set results, LSTM with memory between batches (1)", "test")

plot_vs_series(dataframe.values, trainPredictPlot, testPredictPlot, "Results over the whole original series, LSTM with memory between batches (1)")


Better results are observed on the training side; the RMSE remains on the same order as that of the basic LSTMs.

n) LSTM with memory between batches (batch size = 3)

Now the memory is carried across batches of size 3.


In [181]:
lag = 3
batch_size = 3
model = Sequential()
model.add(LSTM(8, batch_input_shape=(batch_size, lag, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(100):
    model.fit(TrainX, TrainY, nb_epoch=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
model.save("LSTM_lag3_batch3.h5")


Epoch 1/1
0s - loss: 0.3515
Epoch 1/1
0s - loss: 0.1512
Epoch 1/1
0s - loss: 0.0880
Epoch 1/1
0s - loss: 0.0757
Epoch 1/1
0s - loss: 0.0715
Epoch 1/1
0s - loss: 0.0652
Epoch 1/1
0s - loss: 0.0591
Epoch 1/1
0s - loss: 0.0538
Epoch 1/1
0s - loss: 0.0490
Epoch 1/1
0s - loss: 0.0447
Epoch 1/1
0s - loss: 0.0407
Epoch 1/1
0s - loss: 0.0371
Epoch 1/1
0s - loss: 0.0339
Epoch 1/1
0s - loss: 0.0308
Epoch 1/1
0s - loss: 0.0281
Epoch 1/1
0s - loss: 0.0255
Epoch 1/1
0s - loss: 0.0232
Epoch 1/1
0s - loss: 0.0212
Epoch 1/1
0s - loss: 0.0194
Epoch 1/1
0s - loss: 0.0178
Epoch 1/1
0s - loss: 0.0164
Epoch 1/1
0s - loss: 0.0153
Epoch 1/1
0s - loss: 0.0144
Epoch 1/1
0s - loss: 0.0136
Epoch 1/1
0s - loss: 0.0130
Epoch 1/1
0s - loss: 0.0126
Epoch 1/1
0s - loss: 0.0122
Epoch 1/1
0s - loss: 0.0119
Epoch 1/1
0s - loss: 0.0117
Epoch 1/1
0s - loss: 0.0115
Epoch 1/1
0s - loss: 0.0113
Epoch 1/1
0s - loss: 0.0112
Epoch 1/1
0s - loss: 0.0110
Epoch 1/1
0s - loss: 0.0109
Epoch 1/1
0s - loss: 0.0107
Epoch 1/1
0s - loss: 0.0106
Epoch 1/1
0s - loss: 0.0104
Epoch 1/1
0s - loss: 0.0103
Epoch 1/1
0s - loss: 0.0102
Epoch 1/1
0s - loss: 0.0101
Epoch 1/1
0s - loss: 0.0099
Epoch 1/1
0s - loss: 0.0098
Epoch 1/1
0s - loss: 0.0097
Epoch 1/1
0s - loss: 0.0096
Epoch 1/1
0s - loss: 0.0095
Epoch 1/1
0s - loss: 0.0094
Epoch 1/1
0s - loss: 0.0093
Epoch 1/1
0s - loss: 0.0092
Epoch 1/1
0s - loss: 0.0091
Epoch 1/1
0s - loss: 0.0090
Epoch 1/1
0s - loss: 0.0089
Epoch 1/1
0s - loss: 0.0088
Epoch 1/1
0s - loss: 0.0087
Epoch 1/1
0s - loss: 0.0086
Epoch 1/1
0s - loss: 0.0085
Epoch 1/1
0s - loss: 0.0084
Epoch 1/1
0s - loss: 0.0083
Epoch 1/1
0s - loss: 0.0082
Epoch 1/1
0s - loss: 0.0082
Epoch 1/1
0s - loss: 0.0081
Epoch 1/1
0s - loss: 0.0080
Epoch 1/1
0s - loss: 0.0080
Epoch 1/1
0s - loss: 0.0079
Epoch 1/1
0s - loss: 0.0078
Epoch 1/1
0s - loss: 0.0077
Epoch 1/1
0s - loss: 0.0077
Epoch 1/1
0s - loss: 0.0076
Epoch 1/1
0s - loss: 0.0075
Epoch 1/1
0s - loss: 0.0075
Epoch 1/1
0s - loss: 0.0074
Epoch 1/1
0s - loss: 0.0073
Epoch 1/1
0s - loss: 0.0073
Epoch 1/1
0s - loss: 0.0072
Epoch 1/1
0s - loss: 0.0071
Epoch 1/1
0s - loss: 0.0071
Epoch 1/1
0s - loss: 0.0070
Epoch 1/1
0s - loss: 0.0070
Epoch 1/1
0s - loss: 0.0069
Epoch 1/1
0s - loss: 0.0068
Epoch 1/1
0s - loss: 0.0068
Epoch 1/1
0s - loss: 0.0067
Epoch 1/1
0s - loss: 0.0067
Epoch 1/1
0s - loss: 0.0066
Epoch 1/1
0s - loss: 0.0066
Epoch 1/1
0s - loss: 0.0065
Epoch 1/1
0s - loss: 0.0065
Epoch 1/1
0s - loss: 0.0064
Epoch 1/1
0s - loss: 0.0064
Epoch 1/1
0s - loss: 0.0063
Epoch 1/1
0s - loss: 0.0063
Epoch 1/1
0s - loss: 0.0062
Epoch 1/1
0s - loss: 0.0062
Epoch 1/1
0s - loss: 0.0061
Epoch 1/1
0s - loss: 0.0061
Epoch 1/1
0s - loss: 0.0060
Epoch 1/1
0s - loss: 0.0060
Epoch 1/1
0s - loss: 0.0060
Epoch 1/1
0s - loss: 0.0060
Epoch 1/1
0s - loss: 0.0060
Epoch 1/1
0s - loss: 0.0060

In [182]:
model = load_model("LSTM_lag3_batch3.h5")
trainPredict = model.predict(TrainX, batch_size=batch_size)
trainPredict = scaler.inverse_transform(trainPredict)
trainY_noscale = scaler.inverse_transform(TrainY)

testPredict = model.predict(TestX, batch_size=batch_size)
testPredict = scaler.inverse_transform(testPredict)
testY_noscale = scaler.inverse_transform(TestY)

In [183]:
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY_noscale[:,0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY_noscale[:,0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))


Train Score: 23.62 RMSE
Test Score: 106.14 RMSE

In [184]:
# shift train predictions for plotting
trainPredictPlot = np.empty_like(dataframe.values)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[lag:len(trainPredict)+lag, :] = trainPredict

# shift train original targets for plotting
trainY_asoriginal = np.empty_like(dataframe.values)
trainY_asoriginal[:, :] = np.nan
trainY_asoriginal[lag:len(trainY_noscale)+lag, :] = trainY_noscale

plot(trainY_asoriginal, trainPredictPlot, lag, "Training set results, LSTM with memory between batches (3)", "train")

# shift test predictions for plotting
testPredictPlot = np.empty_like(dataframe.values)
testPredictPlot[:, :] = np.nan
testPredictPlot[(len(trainPredict)+2*lag):, :] = testPredict

# shift test original targets for plotting
testY_asoriginal = np.empty_like(dataframe.values)
testY_asoriginal[:, :] = np.nan
testY_asoriginal[(len(trainY_noscale)+2*lag):, :] = testY_noscale

plot(testY_asoriginal, testPredictPlot, lag, "Test set results, LSTM with memory between batches (3)", "test")

plot_vs_series(dataframe.values, trainPredictPlot, testPredictPlot, "Results over the whole original series, LSTM with memory between batches (3)")


The RMSE increases with this new network. Carrying memory across batches of size 3 smooths the predictive function.

o) Stacked LSTM

The deep version of the LSTM: two LSTM layers are stacked, the first returning its full output sequence (return_sequences=True) so the second can consume it.


In [185]:
lag = 3
batch_size = 1
model = Sequential()
model.add(LSTM(8, batch_input_shape=(batch_size, lag, 1), stateful=True, return_sequences=True))
model.add(LSTM(8, batch_input_shape=(batch_size, lag, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(100):
    model.fit(TrainX, TrainY, nb_epoch=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
model.save("LSTM_lag3_stacked_batch_size1.h5")


Epoch 1/1
0s - loss: 0.0178
Epoch 1/1
0s - loss: 0.0400
Epoch 1/1
0s - loss: 0.0219
Epoch 1/1
0s - loss: 0.0166
Epoch 1/1
0s - loss: 0.0190
Epoch 1/1
0s - loss: 0.0195
Epoch 1/1
0s - loss: 0.0196
Epoch 1/1
0s - loss: 0.0188
Epoch 1/1
0s - loss: 0.0177
Epoch 1/1
0s - loss: 0.0155
Epoch 1/1
0s - loss: 0.0135
Epoch 1/1
0s - loss: 0.0116
Epoch 1/1
0s - loss: 0.0123
Epoch 1/1
0s - loss: 0.0143
Epoch 1/1
0s - loss: 0.0103
Epoch 1/1
0s - loss: 0.0133
Epoch 1/1
0s - loss: 0.0096
Epoch 1/1
0s - loss: 0.0108
Epoch 1/1
0s - loss: 0.0097
Epoch 1/1
0s - loss: 0.0089
Epoch 1/1
0s - loss: 0.0083
Epoch 1/1
0s - loss: 0.0082
Epoch 1/1
0s - loss: 0.0076
Epoch 1/1
0s - loss: 0.0069
Epoch 1/1
0s - loss: 0.0075
Epoch 1/1
0s - loss: 0.0071
Epoch 1/1
0s - loss: 0.0104
Epoch 1/1
0s - loss: 0.0079
Epoch 1/1
0s - loss: 0.0071
Epoch 1/1
0s - loss: 0.0121
Epoch 1/1
0s - loss: 0.0122
Epoch 1/1
0s - loss: 0.0220
Epoch 1/1
0s - loss: 0.0119
Epoch 1/1
0s - loss: 0.0089
Epoch 1/1
0s - loss: 0.0077
Epoch 1/1
0s - loss: 0.0084
Epoch 1/1
0s - loss: 0.0127
Epoch 1/1
0s - loss: 0.0082
Epoch 1/1
0s - loss: 0.0076
Epoch 1/1
0s - loss: 0.0069
Epoch 1/1
0s - loss: 0.0068
Epoch 1/1
0s - loss: 0.0069
Epoch 1/1
0s - loss: 0.0065
Epoch 1/1
0s - loss: 0.0078
Epoch 1/1
0s - loss: 0.0078
Epoch 1/1
0s - loss: 0.0102
Epoch 1/1
0s - loss: 0.0091
Epoch 1/1
0s - loss: 0.0064
Epoch 1/1
0s - loss: 0.0057
Epoch 1/1
0s - loss: 0.0056
Epoch 1/1
0s - loss: 0.0052
Epoch 1/1
0s - loss: 0.0058
Epoch 1/1
0s - loss: 0.0063
Epoch 1/1
0s - loss: 0.0060
Epoch 1/1
0s - loss: 0.0053
Epoch 1/1
0s - loss: 0.0049
Epoch 1/1
0s - loss: 0.0065
Epoch 1/1
0s - loss: 0.0056
Epoch 1/1
0s - loss: 0.0083
Epoch 1/1
0s - loss: 0.0069
Epoch 1/1
0s - loss: 0.0049
Epoch 1/1
0s - loss: 0.0053
Epoch 1/1
0s - loss: 0.0054
Epoch 1/1
0s - loss: 0.0052
Epoch 1/1
0s - loss: 0.0048
Epoch 1/1
0s - loss: 0.0044
Epoch 1/1
0s - loss: 0.0059
Epoch 1/1
0s - loss: 0.0084
Epoch 1/1
0s - loss: 0.0067
Epoch 1/1
0s - loss: 0.0059
Epoch 1/1
0s - loss: 0.0056
Epoch 1/1
0s - loss: 0.0053
Epoch 1/1
0s - loss: 0.0054
Epoch 1/1
0s - loss: 0.0045
Epoch 1/1
0s - loss: 0.0047
Epoch 1/1
0s - loss: 0.0056
Epoch 1/1
0s - loss: 0.0048
Epoch 1/1
0s - loss: 0.0043
Epoch 1/1
0s - loss: 0.0051
Epoch 1/1
0s - loss: 0.0058
Epoch 1/1
0s - loss: 0.0049
Epoch 1/1
0s - loss: 0.0040
Epoch 1/1
0s - loss: 0.0038
Epoch 1/1
0s - loss: 0.0036
Epoch 1/1
0s - loss: 0.0039
Epoch 1/1
0s - loss: 0.0040
Epoch 1/1
0s - loss: 0.0036
Epoch 1/1
0s - loss: 0.0037
Epoch 1/1
0s - loss: 0.0033
Epoch 1/1
0s - loss: 0.0036
Epoch 1/1
0s - loss: 0.0049
Epoch 1/1
0s - loss: 0.0036
Epoch 1/1
0s - loss: 0.0033
Epoch 1/1
0s - loss: 0.0042
Epoch 1/1
0s - loss: 0.0034
Epoch 1/1
0s - loss: 0.0036
Epoch 1/1
0s - loss: 0.0033
Epoch 1/1
0s - loss: 0.0035
Epoch 1/1
0s - loss: 0.0032
Epoch 1/1
0s - loss: 0.0033

In [186]:
model = load_model("LSTM_lag3_stacked_batch_size1.h5")
trainPredict = model.predict(TrainX, batch_size=batch_size)
trainPredict = scaler.inverse_transform(trainPredict)
trainY_noscale = scaler.inverse_transform(TrainY)

testPredict = model.predict(TestX, batch_size=batch_size)
testPredict = scaler.inverse_transform(testPredict)
testY_noscale = scaler.inverse_transform(TestY)

In [187]:
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY_noscale[:,0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY_noscale[:,0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))


Train Score: 18.45 RMSE
Test Score: 91.54 RMSE

In [188]:
# shift train predictions for plotting
trainPredictPlot = np.empty_like(dataframe.values)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[lag:len(trainPredict)+lag, :] = trainPredict

# shift train original targets for plotting
trainY_asoriginal = np.empty_like(dataframe.values)
trainY_asoriginal[:, :] = np.nan
trainY_asoriginal[lag:len(trainY_noscale)+lag, :] = trainY_noscale

plot(trainY_asoriginal, trainPredictPlot, lag, "Training set results, stacked LSTM with memory between batches (1)", "train")

# shift test predictions for plotting
testPredictPlot = np.empty_like(dataframe.values)
testPredictPlot[:, :] = np.nan
testPredictPlot[(len(trainPredict)+2*lag):, :] = testPredict

# shift test original targets for plotting
testY_asoriginal = np.empty_like(dataframe.values)
testY_asoriginal[:, :] = np.nan
testY_asoriginal[(len(trainY_noscale)+2*lag):, :] = testY_noscale

plot(testY_asoriginal, testPredictPlot, lag, "Test set results, stacked LSTM with memory between batches (1)", "test")

plot_vs_series(dataframe.values, trainPredictPlot, testPredictPlot, "Results over the whole original series, stacked LSTM with memory between batches (1)")


In this case the stacked network tends to overfit, which shows in the large drop in the training RMSE while the test RMSE worsens.

