T81-558: Applications of Deep Neural Networks

Class 10: Recurrent and LSTM Networks

Common Functions

Some of the common functions from previous classes that we will use again.


In [3]:
from sklearn import preprocessing
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Encode text values to dummy variables(i.e. [1,0,0],[0,1,0],[0,0,1] for red,green,blue)
def encode_text_dummy(df,name):
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = "{}-{}".format(name,x)
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)

# Encode text values to indexes(i.e. [1],[2],[3] for red,green,blue).
def encode_text_index(df,name):
    le = preprocessing.LabelEncoder()
    df[name] = le.fit_transform(df[name])
    return le.classes_

# Encode a numeric column as zscores
def encode_numeric_zscore(df,name,mean=None,sd=None):
    if mean is None:
        mean = df[name].mean()

    if sd is None:
        sd = df[name].std()

    df[name] = (df[name]-mean)/sd

# Convert all missing values in the specified column to the median
def missing_median(df, name):
    med = df[name].median()
    df[name] = df[name].fillna(med)

# Convert a Pandas dataframe to the x,y inputs that TensorFlow needs
def to_xy(df,target):
    result = []
    for x in df.columns:
        if x != target:
            result.append(x)

    # find out the type of the target column.  Is it really this hard? :(
    target_type = df[target].dtypes
    target_type = target_type[0] if hasattr(target_type, '__iter__') else target_type
    print(target_type)
    
    # Encode to int for classification, float otherwise. TensorFlow likes 32 bits.
    if target_type in (np.int64, np.int32):
        # Classification
        return df.as_matrix(result).astype(np.float32),df.as_matrix([target]).astype(np.int32)
    else:
        # Regression
        return df.as_matrix(result).astype(np.float32),df.as_matrix([target]).astype(np.float32)

# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

# Regression chart, we will see more of this chart in the next class.
def chart_regression(pred,y):
    t = pd.DataFrame({'pred' : pred.flatten(), 'y' : y.flatten()})
    t.sort_values(by=['y'],inplace=True)
    a = plt.plot(t['y'].tolist(),label='expected')
    b = plt.plot(t['pred'].tolist(),label='prediction')
    plt.ylabel('output')
    plt.legend()
    plt.show()
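
A quick check of these helpers on a tiny throwaway data frame (the values below are made up just for illustration) shows what the encoding functions produce:


In [ ]:
import numpy as np
import pandas as pd

# A small made-up data frame to exercise the helpers defined above.
demo = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'green'],
    'size':  [1.0, 2.0, np.nan, 4.0]
})

missing_median(demo, 'size')         # fill the missing size with the column median
encode_numeric_zscore(demo, 'size')  # convert size to z-scores
encode_text_dummy(demo, 'color')     # expand color into color-red/color-green/color-blue

print(demo)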

Data Structure for Recurrent Neural Networks

Previously we trained neural networks with an input ($x$) and an expected output ($y$). $x$ was a matrix: the rows were training examples and the columns were the input features. The definition of $x$ will be expanded and $y$ will stay the same.

Dimensions of training set ($x$):

  • Axis 1: Training set elements (sequences); must be the same length as $y$
  • Axis 2: Members of sequence
  • Axis 3: Features in data (like input neurons)

Previously, we might take a single stock price as input to predict whether we should buy (1), sell (-1), or hold (0).


In [ ]:
# 

x = [
    [32],
    [41],
    [39],
    [20],
    [15]
]

y = [
    1,
    -1,
    0,
    -1,
    1
]

print(x)
print(y)

This is essentially building a CSV file from scratch. To see it as a data frame, use the following:


In [ ]:
from IPython.display import display, HTML
import pandas as pd
import numpy as np

x = np.array(x)
print(x[:,0])


df = pd.DataFrame({'x':x[:,0], 'y':y})
display(df)

You might want to put volume in with the stock price.


In [ ]:
x = [
    [32,1383],
    [41,2928],
    [39,8823],
    [20,1252],
    [15,1532]
]

y = [
    1,
    -1,
    0,
    -1,
    1
]

print(x)
print(y)

Again, this is very similar to what we did before.  The following shows this as a data frame.

In [ ]:
from IPython.display import display, HTML
import pandas as pd
import numpy as np

x = np.array(x)
print(x[:,0])


df = pd.DataFrame({'price':x[:,0], 'volume':x[:,1], 'y':y})
display(df)

Now we get to the sequence format. We want to predict something over a sequence, so the data format needs to add a dimension. A maximum sequence length must be specified, but the individual sequences can be of any length up to that maximum.


In [ ]:
x = [
    [[32,1383],[41,2928],[39,8823],[20,1252],[15,1532]],
    [[35,8272],[32,1383],[41,2928],[39,8823],[20,1252]],
    [[37,2738],[35,8272],[32,1383],[41,2928],[39,8823]],
    [[34,2845],[37,2738],[35,8272],[32,1383],[41,2928]],
    [[32,2345],[34,2845],[37,2738],[35,8272],[32,1383]],
]

y = [
    1,
    -1,
    0,
    -1,
    1
]

print(x)
print(y)

Even if there is only one feature (price), the 3rd dimension must be used:


In [ ]:
x = [
    [[32],[41],[39],[20],[15]],
    [[35],[32],[41],[39],[20]],
    [[37],[35],[32],[41],[39]],
    [[34],[37],[35],[32],[41]],
    [[32],[34],[37],[35],[32]],
]

y = [
    1,
    -1,
    0,
    -1,
    1
]

print(x)
print(y)
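
If an individual sequence is shorter than the chosen maximum, it has to be brought up to the maximum length before the sequences can be packed into a single NumPy array. Zero-padding, shown below, is one common way to do that; the padding scheme here is an illustrative assumption, not something prescribed by the course text.


In [ ]:
import numpy as np

MAX_SEQUENCE_SIZE = 5

# Two sequences of different lengths, each with a single feature (price).
sequences = [
    [[32],[41],[39]],                # length 3
    [[35],[32],[41],[39],[20]]       # length 5
]

# Pad short sequences with [0] entries so every sequence reaches the maximum length.
padded = [seq + [[0]] * (MAX_SEQUENCE_SIZE - len(seq)) for seq in sequences]

x = np.array(padded, dtype=np.float32)
print(x.shape)  # (2, 5, 1): sequences, members of sequence, features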

Recurrent Neural Networks

So far the neural networks that we’ve examined have always had forward connections. The input layer always connects to the first hidden layer. Each hidden layer always connects to the next hidden layer. The final hidden layer always connects to the output layer. This manner of connecting layers is the reason that these networks are called “feedforward.” Recurrent neural networks are not so rigid, as backward connections are also allowed. A recurrent connection links a neuron in a layer to either a previous layer or the neuron itself. Most recurrent neural network architectures maintain state in the recurrent connections. Feedforward neural networks don’t maintain any state. A recurrent neural network’s state acts as a sort of short-term memory for the neural network. Consequently, a recurrent neural network will not always produce the same output for a given input.

Recurrent neural networks do not force the connections to flow only from one layer to the next, from input layer to output layer. A recurrent connection occurs when a connection is formed between a neuron and one of the following other types of neurons:

  • The neuron itself
  • A neuron on the same level
  • A neuron on a previous level

Recurrent connections can never target the input neurons or the bias neurons.
The processing of recurrent connections can be challenging. Because the recurrent links create endless loops, the neural network must have some way to know when to stop. A neural network that entered an endless loop would not be useful. To prevent endless loops, we can calculate the recurrent connections with the following three approaches:

  • Context neurons
  • Calculating output over a fixed number of iterations
  • Calculating output until neuron output stabilizes

We refer to neural networks that use context neurons as simple recurrent networks (SRN). The context neuron is a special neuron type that remembers its input and provides that input as its output the next time that we calculate the network. For example, if we gave a context neuron 0.5 as input, it would output 0, because context neurons always output 0 on their first call. However, if we then gave the context neuron 0.6 as input, the output would be 0.5 (the input from the previous call). We never weight the input connections to a context neuron, but we can weight the output from a context neuron just like any other connection in a network.
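
The behavior just described is easy to verify with a few lines of Python. This is only an illustrative sketch of a single context neuron, not part of any library:


In [ ]:
class ContextNeuron:
    """Outputs 0 on the first call; afterwards outputs the input from the previous call."""
    def __init__(self):
        self.state = 0.0

    def compute(self, value):
        output = self.state   # the previously remembered input (0 the first time)
        self.state = value    # remember the current input for the next call
        return output

cn = ContextNeuron()
print(cn.compute(0.5))  # 0.0
print(cn.compute(0.6))  # 0.5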

Context neurons allow us to calculate a neural network in a single feedforward pass. Context neurons usually occur in layers. A layer of context neurons will always have the same number of context neurons as neurons in its source layer, as demonstrated here:

As you can see from the above layer, two hidden neurons that are labeled hidden 1 and hidden 2 directly connect to the two context neurons. The dashed lines on these connections indicate that these are not weighted connections. These weightless connections are never dense. If these connections were dense, hidden 1 would be connected to both hidden 1 and hidden 2. However, the direct connection simply joins each hidden neuron to its corresponding context neuron. The two context neurons form dense, weighted connections to the two hidden neurons. Finally, the two hidden neurons also form dense connections to the neurons in the next layer. The two context neurons would form two connections to a single neuron in the next layer, four connections to two neurons, six connections to three neurons, and so on.

You can combine context neurons with the input, hidden, and output layers of a neural network in many different ways. In the next two sections, we explore two common SRN architectures.

In 1990, Elman introduced a neural network that provides pattern recognition for time series. This neural network type has one input neuron for each stream that you are using to predict. There is one output neuron for each time slice you are trying to predict. A single hidden layer is positioned between the input and output layers. A layer of context neurons takes its input from the hidden layer's output and feeds back into the same hidden layer. Consequently, the context layer always has the same number of neurons as the hidden layer, as demonstrated here:

The Elman neural network is a good general-purpose architecture for simple recurrent neural networks. You can pair any reasonable number of input neurons to any number of output neurons. Using normal weighted connections, the two context neurons are fully connected with the two hidden neurons. The two context neurons receive their state from the two non-weighted connections (dashed lines) from each of the two hidden neurons.

Backpropagation through time works by unfolding the SRN to become a regular neural network. To unfold the SRN, we construct a chain of neural networks equal to how far back in time we wish to go. We start with a neural network that contains the inputs for the current time, known as t. Next we replace the context with the entire neural network, up to the context neuron’s input. We continue for the desired number of time slices and replace the final context neuron with a 0. The following diagram shows an unfolded Elman neural network for two time slices.

As you can see, there are inputs for both t (current time) and t-1 (one time slice in the past). The bottom neural network stops at the hidden neurons because you don’t need everything beyond the hidden neurons to calculate the context input. The bottom network structure becomes the context to the top network structure. Of course, the bottom structure would have had a context as well that connects to its hidden neurons. However, because the output neuron above does not contribute to the context, only the top network (current time) has one.
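
To make the unfolding idea concrete, here is a small NumPy sketch of the Elman hidden-layer update unrolled over two time slices, as described above: the earliest context is replaced with 0, and each slice's hidden output becomes the context for the next slice. The sizes and weights are made up purely for illustration.


In [ ]:
import numpy as np

np.random.seed(0)

n_input, n_hidden = 2, 3
W_x = np.random.randn(n_hidden, n_input)   # weighted connections: input -> hidden
W_c = np.random.randn(n_hidden, n_hidden)  # weighted connections: context -> hidden
b = np.zeros(n_hidden)

def hidden_step(x, context):
    # The hidden activation depends on the current input and on the context
    # (the hidden output from the previous time slice).
    return np.tanh(W_x.dot(x) + W_c.dot(context) + b)

x_t_minus_1 = np.array([0.1, 0.9])  # inputs one time slice in the past (t-1)
x_t = np.array([0.4, 0.2])          # inputs at the current time (t)

context = np.zeros(n_hidden)             # the earliest context is replaced with 0
h_t_minus_1 = hidden_step(x_t_minus_1, context)
h_t = hidden_step(x_t, h_t_minus_1)      # previous hidden output becomes the context

print(h_t)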

Understanding LSTM

Some useful resources on LSTM/recurrent neural networks.

Long Short-Term Memory (LSTM) units are a type of recurrent unit that is often used with deep neural networks. For TensorFlow, LSTM can be thought of as a layer type that can be combined with other layer types, such as dense. LSTM makes use of two transfer function types internally.

The first type of transfer function is the sigmoid. This transfer function type is used to form the gates inside of the unit. The sigmoid transfer function is given by the following equation:

$$ \text{S}(t) = \frac{1}{1 + e^{-t}} $$

The second type of transfer function is the hyperbolic tangent (tanh) function. This function is used to scale the output of the LSTM, similarly to how other transfer functions have been used in this course.

The graphs for these functions are shown here:


In [ ]:
%matplotlib inline

import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import math

def sigmoid(x):
    a = []
    for item in x:
        a.append(1/(1+math.exp(-item)))
    return a

def f2(x):
    a = []
    for item in x:
        a.append(math.tanh(item))
    return a

x = np.arange(-10., 10., 0.2)
y1 = sigmoid(x)
y2 = f2(x)

print("Sigmoid")
plt.plot(x,y1)
plt.show()

print("Hyperbolic Tangent(tanh)")
plt.plot(x,y2)
plt.show()

Both of these functions compress their output to a specific range. For the sigmoid function, this range is 0 to 1. For the hyperbolic tangent function, this range is -1 to 1.

LSTM maintains an internal state and produces an output. The following diagram shows an LSTM unit over three time slices: the current time slice (t), as well as the previous (t-1) and next (t+1) slice:

The values $\hat{y}$ are the output from the unit, the values $x$ are the input to the unit, and the values $c$ are the context values. Both the output and context values are always fed to the next time slice. The context values allow the unit to carry state from one time slice to the next; they act as the LSTM's short-term memory.

LSTM is made up of three gates:

  • Forget Gate (f_t) - Controls if/when the context is forgotten. (MC)
  • Input Gate (i_t) - Controls if/when a value should be remembered by the context. (M+/MS)
  • Output Gate (o_t) - Controls if/when the remembered value is allowed to pass from the unit. (RM)

Mathematically, the above diagram can be thought of as the following:

These are vector values.

First, calculate the forget gate value. This gate determines whether the short-term memory is forgotten. The value $b$ is a bias, just like the bias neurons we saw before, except that LSTM has a bias for every gate: $b_f$, $b_i$, and $b_o$.

$$ f_t = S(W_f \cdot [\hat{y}_{t-1}, x_t] + b_f) $$

Next, calculate the input gate value. This gate's value determines what will be remembered.

$$ i_t = S(W_i \cdot [\hat{y}_{t-1},x_t] + b_i) $$

Calculate a candidate context value (a value that might be remembered). This value is called $\tilde{C}_t$.

$$ \tilde{C}_t = \tanh(W_C \cdot [\hat{y}_{t-1},x_t]+b_C) $$

Determine the new context ($C_t$). Do this by keeping the candidate context ($\tilde{C}_t$) to the degree allowed by the input gate ($i_t$) and keeping the previous context ($C_{t-1}$) to the degree allowed by the forget gate ($f_t$).

$$ C_t = f_t \cdot C_{t-1}+i_t \cdot \tilde{C}_t $$

Calculate the output gate ($o_t$):

$$ o_t = S(W_o \cdot [\hat{y}_{t-1},x_t] + b_o ) $$

Calculate the actual output ($\hat{y}_t$):

$$ \hat{y}_t = o_t \cdot \tanh(C_t) $$
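
The equations above can be checked directly with a few lines of NumPy. This is a sketch of a single LSTM time step with made-up sizes and randomly initialized weights; it is only the math, not the TensorFlow implementation.


In [ ]:
import numpy as np

np.random.seed(42)

n_input, n_hidden = 1, 4
concat_size = n_input + n_hidden   # the gates act on [y_hat_{t-1}, x_t]

def S(z):
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid

# One weight matrix per gate, plus one for the candidate context.
W_f = np.random.randn(n_hidden, concat_size)
W_i = np.random.randn(n_hidden, concat_size)
W_C = np.random.randn(n_hidden, concat_size)
W_o = np.random.randn(n_hidden, concat_size)
b_f = np.zeros(n_hidden); b_i = np.zeros(n_hidden)
b_C = np.zeros(n_hidden); b_o = np.zeros(n_hidden)

def lstm_step(x_t, y_prev, C_prev):
    z = np.concatenate([y_prev, x_t])       # [y_hat_{t-1}, x_t]
    f_t = S(W_f.dot(z) + b_f)               # forget gate
    i_t = S(W_i.dot(z) + b_i)               # input gate
    C_tilde = np.tanh(W_C.dot(z) + b_C)     # candidate context
    C_t = f_t * C_prev + i_t * C_tilde      # new context
    o_t = S(W_o.dot(z) + b_o)               # output gate
    y_t = o_t * np.tanh(C_t)                # output
    return y_t, C_t

y_hat = np.zeros(n_hidden)
C = np.zeros(n_hidden)
for x_value in [0.02, -0.01, 0.03]:          # a tiny one-feature sequence
    y_hat, C = lstm_step(np.array([x_value]), y_hat, C)
print(y_hat)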


Simple TensorFlow LSTM Example

The following code creates the LSTM network.


In [47]:
import numpy as np
import pandas
import tensorflow as tf
from sklearn import metrics
from tensorflow.models.rnn import rnn, rnn_cell
from tensorflow.contrib import skflow

SEQUENCE_SIZE = 6
HIDDEN_SIZE = 20
NUM_CLASSES = 4

def char_rnn_model(X, y):
    byte_list = skflow.ops.split_squeeze(1, SEQUENCE_SIZE, X)
    cell = rnn_cell.LSTMCell(HIDDEN_SIZE)
    _, encoding = rnn.rnn(cell, byte_list, dtype=tf.float32)
    return skflow.models.logistic_regression(encoding, y)

classifier = skflow.TensorFlowEstimator(model_fn=char_rnn_model, n_classes=NUM_CLASSES,
    steps=100, optimizer='Adam', learning_rate=0.01, continue_training=True)

The following code trains on a data set (x) with a maximum sequence size of 6 (columns) and 6 training elements (rows).


In [48]:
x = [
    [[0],[1],[1],[0],[0],[0]],
    [[0],[0],[0],[2],[2],[0]],
    [[0],[0],[0],[0],[3],[3]],
    [[0],[2],[2],[0],[0],[0]],
    [[0],[0],[3],[3],[0],[0]],
    [[0],[0],[0],[0],[1],[1]]
]
x = np.array(x,dtype=np.float32)
y = np.array([1,2,3,2,3,1])

classifier.fit(x, y)


Step #100, epoch #100, avg. train loss: 0.30626
Out[48]:
TensorFlowEstimator(batch_size=32, class_weight=None, clip_gradients=5.0,
          config=None, continue_training=True, learning_rate=0.01,
          model_fn=<function char_rnn_model at 0x7efec8f62510>,
          n_classes=4, optimizer='Adam', steps=100, verbose=1)

In [49]:
test = [[[0],[0],[0],[0],[3],[3]]]
test = np.array(test)

classifier.predict(test)


Out[49]:
array([3])
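
The skflow and tensorflow.models.rnn modules used above belong to a very early TensorFlow release and have since been removed. As a rough modern equivalent (a sketch using the tf.keras API, not the course's original code), the same toy classifier could look like this:


In [ ]:
import numpy as np
import tensorflow as tf

SEQUENCE_SIZE = 6
HIDDEN_SIZE = 20
NUM_CLASSES = 4

x = np.array([
    [[0],[1],[1],[0],[0],[0]],
    [[0],[0],[0],[2],[2],[0]],
    [[0],[0],[0],[0],[3],[3]],
    [[0],[2],[2],[0],[0],[0]],
    [[0],[0],[3],[3],[0],[0]],
    [[0],[0],[0],[0],[1],[1]]
], dtype=np.float32)
y = np.array([1, 2, 3, 2, 3, 1])

# LSTM layer over the sequence, then a dense softmax layer for the class.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(HIDDEN_SIZE, input_shape=(SEQUENCE_SIZE, 1)),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(x, y, epochs=100, verbose=0)

test = np.array([[[0],[0],[0],[0],[3],[3]]], dtype=np.float32)
print(np.argmax(model.predict(test), axis=1))  # should be close to [3]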

Stock Market Example


In [14]:
# How to read data from the stock market.
from IPython.display import display, HTML
import pandas.io.data as web
import datetime

start = datetime.datetime(2014, 1, 1)
end = datetime.datetime(2014, 12, 31)

f=web.DataReader('tsla', 'yahoo', start, end)
display(f)


Open High Low Close Volume Adj Close
Date
2014-01-02 149.800003 152.479996 146.550003 150.100006 6188400 150.100006
2014-01-03 150.000000 152.190002 148.600006 149.559998 4695000 149.559998
2014-01-06 150.000000 150.399994 145.240005 147.000000 5361100 147.000000
2014-01-07 147.619995 150.399994 145.250000 149.360001 5034100 149.360001
2014-01-08 148.850006 153.699997 148.759995 151.279999 6163200 151.279999
2014-01-09 152.500000 153.429993 146.850006 147.529999 5382000 147.529999
2014-01-10 148.460007 148.899994 142.250000 145.720001 7446100 145.720001
2014-01-13 145.779999 147.000000 137.820007 139.339996 6316100 139.339996
2014-01-14 140.500000 162.000000 136.669998 161.270004 27607000 161.270004
2014-01-15 168.449997 172.229996 162.100006 164.130005 20465600 164.130005
2014-01-16 162.500000 172.699997 162.399994 170.970001 11959400 170.970001
2014-01-17 170.190002 173.199997 167.949997 170.009995 9206200 170.009995
2014-01-21 171.240005 177.289993 170.809998 176.679993 9734700 176.679993
2014-01-22 177.809998 180.320007 174.759995 178.559998 7022600 178.559998
2014-01-23 177.229996 182.380005 173.419998 181.500000 7867400 181.500000
2014-01-24 177.850006 180.479996 173.529999 174.600006 7664300 174.600006
2014-01-27 175.160004 177.919998 164.710007 169.619995 8716400 169.619995
2014-01-28 171.500000 178.979996 171.000000 178.380005 6093400 178.380005
2014-01-29 175.300003 179.089996 173.130005 175.229996 5935500 175.229996
2014-01-30 178.000000 184.779999 177.009995 182.839996 8565000 182.839996
2014-01-31 178.850006 186.000000 178.509995 181.410004 6508800 181.410004
2014-02-03 182.889999 184.880005 175.160004 177.110001 6764900 177.110001
2014-02-04 180.699997 181.600006 176.199997 178.729996 4686300 178.729996
2014-02-05 178.300003 180.589996 169.360001 174.419998 7268000 174.419998
2014-02-06 176.300003 180.110001 176.000000 178.380005 5841600 178.380005
2014-02-07 181.009995 186.630005 179.600006 186.529999 8928500 186.529999
2014-02-10 189.339996 199.300003 189.320007 196.559998 12970700 196.559998
2014-02-11 198.970001 202.199997 192.699997 196.619995 10709900 196.619995
2014-02-12 195.779999 198.270004 194.320007 195.320007 5173700 195.320007
2014-02-13 193.339996 202.720001 193.250000 199.630005 8029300 199.630005
... ... ... ... ... ... ...
2014-11-18 255.860001 259.989990 255.509995 257.700012 4473000 257.700012
2014-11-19 250.610001 251.880005 245.600006 247.740005 7918500 247.740005
2014-11-20 247.949997 250.929993 246.000000 248.710007 3587200 248.710007
2014-11-21 252.210007 252.779999 242.169998 242.779999 7485100 242.779999
2014-11-24 245.199997 247.600006 240.639999 246.720001 4789700 246.720001
2014-11-25 247.350006 249.720001 246.089996 248.089996 3159800 248.089996
2014-11-26 248.339996 249.000000 246.600006 248.440002 1981200 248.440002
2014-11-28 245.350006 246.690002 242.520004 244.520004 2119700 244.520004
2014-12-01 241.160004 242.470001 229.009995 231.639999 8619400 231.639999
2014-12-02 234.570007 234.880005 228.000000 231.429993 5887000 231.429993
2014-12-03 226.250000 229.720001 225.500000 229.300003 5307700 229.300003
2014-12-04 228.600006 230.899994 227.809998 228.279999 3855600 228.279999
2014-12-05 228.669998 229.389999 222.259995 223.710007 6063600 223.710007
2014-12-08 221.539993 224.860001 212.339996 214.360001 9225600 214.360001
2014-12-09 209.339996 217.729996 204.270004 216.889999 9431500 216.889999
2014-12-10 214.130005 216.770004 207.699997 209.839996 7314100 209.839996
2014-12-11 210.529999 215.429993 208.229996 208.880005 6694400 208.880005
2014-12-12 204.820007 211.679993 204.500000 207.000000 7173800 207.000000
2014-12-15 209.289993 209.800003 202.669998 204.039993 5218300 204.039993
2014-12-16 200.889999 203.679993 195.369995 197.809998 8426100 197.809998
2014-12-17 193.059998 206.649994 192.649994 205.820007 7367800 205.820007
2014-12-18 212.380005 218.440002 211.800003 218.259995 7483300 218.259995
2014-12-19 220.190002 220.399994 214.500000 219.289993 6910500 219.289993
2014-12-22 220.000000 224.059998 218.259995 222.600006 4799400 222.600006
2014-12-23 223.809998 224.320007 219.520004 220.970001 4505700 220.970001
2014-12-24 219.770004 222.500000 219.250000 222.259995 1332200 222.259995
2014-12-26 221.509995 228.500000 221.500000 227.820007 3327000 227.820007
2014-12-29 226.899994 227.910004 224.020004 225.710007 2802500 225.710007
2014-12-30 223.990005 225.649994 221.399994 222.229996 2903200 222.229996
2014-12-31 223.089996 225.679993 222.250000 222.410004 2297500 222.410004

252 rows × 6 columns


In [15]:
import numpy as np
prices = f.Close.pct_change().tolist() # to percent changes
prices = prices[1:] # skip the first, no percent change


SEQUENCE_SIZE = 5
x = []
y = []

for i in range(len(prices)-SEQUENCE_SIZE-1):
    #print(i)
    window = prices[i:(i+SEQUENCE_SIZE)]
    after_window = prices[i+SEQUENCE_SIZE]
    window = [[x] for x in window]
    #print("{} - {}".format(window,after_window))
    x.append(window)
    y.append(after_window)
    
x = np.array(x)
print(len(x))


245
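
The windowing loop above is repeated verbatim for the out-of-sample data later in this class, so it is convenient to wrap it in a small helper. The name to_sequences is only a suggestion for this sketch; it is not defined elsewhere in the course code.


In [ ]:
import numpy as np

def to_sequences(values, seq_size):
    """Build (x, y): each x entry is a window of seq_size consecutive values
    (one feature per time step) and the matching y is the value that follows."""
    x, y = [], []
    for i in range(len(values) - seq_size - 1):
        window = values[i:(i + seq_size)]
        x.append([[v] for v in window])
        y.append(values[i + seq_size])
    return np.array(x), np.array(y)

x, y = to_sequences(prices, SEQUENCE_SIZE)
print(len(x))  # same 245 sequences as above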

In [16]:
from tensorflow.contrib import skflow
from tensorflow.models.rnn import rnn, rnn_cell
import tensorflow as tf

HIDDEN_SIZE = 20

def char_rnn_model(X, y):
    byte_list = skflow.ops.split_squeeze(1, SEQUENCE_SIZE, X)
    cell = rnn_cell.LSTMCell(HIDDEN_SIZE)
    _, encoding = rnn.rnn(cell, byte_list, dtype=tf.float32)
    return skflow.models.linear_regression(encoding, y)

regressor = skflow.TensorFlowEstimator(model_fn=char_rnn_model, n_classes=1,
    steps=100, optimizer='Adam', learning_rate=0.01, continue_training=True)

regressor.fit(x, y)


Step #100, epoch #12, avg. train loss: 0.04157
Out[16]:
TensorFlowEstimator(batch_size=32, class_weight=None, clip_gradients=5.0,
          config=None, continue_training=True, learning_rate=0.01,
          model_fn=<function char_rnn_model at 0x7f693bbba620>,
          n_classes=1, optimizer='Adam', steps=100, verbose=1)

In [17]:
# Try an in-sample prediction

from sklearn import metrics
# Measure RMSE error.  RMSE is common for regression.
pred = regressor.predict(x)
score = np.sqrt(metrics.mean_squared_error(pred,y))
print("Final score (RMSE): {}".format(score))


Final score (RMSE): 0.03126534778871552

In [19]:
# Try out of sample
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2015, 12, 31)

f=web.DataReader('tsla', 'yahoo', start, end)

import numpy as np
prices = f.Close.pct_change().tolist() # to percent changes
prices = prices[1:] # skip the first, no percent change


SEQUENCE_SIZE = 5
x = []
y = []

for i in range(len(prices)-SEQUENCE_SIZE-1):
    window = prices[i:(i+SEQUENCE_SIZE)]
    after_window = prices[i+SEQUENCE_SIZE]
    window = [[x] for x in window]
    x.append(window)
    y.append(after_window)
    
x = np.array(x)

# Measure RMSE error.  RMSE is common for regression.
pred = regressor.predict(x)
score = np.sqrt(metrics.mean_squared_error(pred,y))
print("Out of sample score (RMSE): {}".format(score))


Out of sample score (RMSE): 0.02485462016585515
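
To visually compare the predicted and expected percent changes, the chart_regression helper defined in the common functions at the top of this class can be used (y is converted to a NumPy array so that it can be flattened):


In [ ]:
chart_regression(pred, np.array(y))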

Assignment 3 Solution

Basic neural network solution:


In [40]:
import os
import pandas as pd
from sklearn.cross_validation import train_test_split
import tensorflow.contrib.learn as skflow
import numpy as np
from sklearn import metrics

path = "./data/"
    
filename = os.path.join(path,"t81_558_train.csv")    
train_df = pd.read_csv(filename)

train_df.drop('id',1,inplace=True)

train_x, train_y = to_xy(train_df,'outcome')

train_x, test_x, train_y, test_y = train_test_split(
    train_x, train_y, test_size=0.25, random_state=42)

# Create a deep neural network with 3 hidden layers of 50, 25, 10
regressor = skflow.TensorFlowDNNRegressor(hidden_units=[50, 25, 10], steps=5000)

# Early stopping
early_stop = skflow.monitors.ValidationMonitor(test_x, test_y,
    early_stopping_rounds=200, print_steps=50)

# Fit/train neural network
regressor.fit(train_x, train_y, monitor=early_stop)

# Measure RMSE error.  RMSE is common for regression.
pred = regressor.predict(test_x)
score = np.sqrt(metrics.mean_squared_error(pred,test_y))
print("Final score (RMSE): {}".format(score))

####################
# Build submit file
####################
from IPython.display import display, HTML
filename = os.path.join(path,"t81_558_test.csv")    
submit_df = pd.read_csv(filename)
ids = submit_df.Id
submit_df.drop('Id',1,inplace=True)
submit_x = submit_df.as_matrix()

pred_submit = regressor.predict(submit_x)

submit_df = pd.DataFrame({'Id': ids, 'outcome': pred_submit[:,0]})
submit_filename = os.path.join(path,"t81_558_jheaton_submit.csv")
submit_df.to_csv(submit_filename, index=False)

display(submit_df)


float64
Step #49, avg. train loss: 262.02747, avg. val loss: 169.31293
Step #99, avg. train loss: 175.77449, avg. val loss: 169.29913
Step #149, avg. train loss: 84.64280, avg. val loss: 169.13676
Step #199, avg. train loss: 118.05414, avg. val loss: 168.87610
Step #249, avg. train loss: 204.99768, avg. val loss: 168.47989
Step #299, avg. train loss: 255.39549, avg. val loss: 168.20102
Step #349, avg. train loss: 179.02652, avg. val loss: 168.05875
Step #399, avg. train loss: 272.38480, avg. val loss: 167.92145
Step #449, avg. train loss: 474.30655, avg. val loss: 167.85458
Step #499, avg. train loss: 158.72092, avg. val loss: 167.89682
Step #549, avg. train loss: 284.42297, avg. val loss: 167.23676
Step #599, avg. train loss: 256.35623, avg. val loss: 167.16699
Step #649, avg. train loss: 174.39070, avg. val loss: 166.59933
Step #699, avg. train loss: 133.87422, avg. val loss: 166.63768
Step #749, avg. train loss: 219.42043, avg. val loss: 166.40108
Step #799, avg. train loss: 111.38207, avg. val loss: 165.72337
Step #850, epoch #1, avg. train loss: 150.21307, avg. val loss: 166.12837
Step #900, epoch #1, avg. train loss: 222.44461, avg. val loss: 165.18475
Step #950, epoch #1, avg. train loss: 268.78259, avg. val loss: 165.05949
Step #1000, epoch #1, avg. train loss: 259.04990, avg. val loss: 164.80380
Step #1050, epoch #1, avg. train loss: 335.93539, avg. val loss: 164.45537
Step #1100, epoch #1, avg. train loss: 315.20062, avg. val loss: 164.21877
Step #1150, epoch #1, avg. train loss: 149.28389, avg. val loss: 163.79355
Step #1200, epoch #1, avg. train loss: 158.25729, avg. val loss: 163.66724
Step #1250, epoch #1, avg. train loss: 164.98840, avg. val loss: 163.85109
Step #1300, epoch #1, avg. train loss: 200.14111, avg. val loss: 164.00955
Step #1350, epoch #1, avg. train loss: 97.43488, avg. val loss: 163.70239
Step #1400, epoch #1, avg. train loss: 169.25858, avg. val loss: 162.92458
Step #1450, epoch #1, avg. train loss: 59.47280, avg. val loss: 163.10606
Step #1500, epoch #1, avg. train loss: 200.46297, avg. val loss: 162.67957
Step #1550, epoch #1, avg. train loss: 189.56543, avg. val loss: 163.05392
Step #1600, epoch #1, avg. train loss: 151.75871, avg. val loss: 163.28995
Step #1650, epoch #2, avg. train loss: 223.84618, avg. val loss: 161.94337
Step #1700, epoch #2, avg. train loss: 342.18832, avg. val loss: 161.32852
Step #1750, epoch #2, avg. train loss: 202.27881, avg. val loss: 162.57892
Step #1800, epoch #2, avg. train loss: 199.05495, avg. val loss: 162.29051
Step #1850, epoch #2, avg. train loss: 195.59039, avg. val loss: 161.27264
Step #1900, epoch #2, avg. train loss: 167.66586, avg. val loss: 160.89017
Step #1950, epoch #2, avg. train loss: 68.90379, avg. val loss: 161.50226
Step #2000, epoch #2, avg. train loss: 222.50452, avg. val loss: 161.14532
Step #2050, epoch #2, avg. train loss: 225.49635, avg. val loss: 160.25165
Step #2100, epoch #2, avg. train loss: 237.03014, avg. val loss: 160.22340
Step #2150, epoch #2, avg. train loss: 206.91718, avg. val loss: 160.10815
Step #2200, epoch #2, avg. train loss: 376.84235, avg. val loss: 160.25551
Step #2250, epoch #2, avg. train loss: 81.24576, avg. val loss: 159.93127
Step #2300, epoch #2, avg. train loss: 173.83713, avg. val loss: 159.00479
Step #2350, epoch #2, avg. train loss: 134.20891, avg. val loss: 159.34698
Step #2400, epoch #2, avg. train loss: 247.74081, avg. val loss: 159.52701
Step #2450, epoch #2, avg. train loss: 150.32472, avg. val loss: 160.43280
Step #2500, epoch #3, avg. train loss: 147.11612, avg. val loss: 159.66171
Step #2550, epoch #3, avg. train loss: 160.58397, avg. val loss: 159.49266
Stopping. Best step:
 step 2357 with loss 158.3131103515625
Final score (RMSE): 17.850584030151367
Id outcome
0 1 2.984916
1 2 -0.033047
2 3 -2.751747
3 4 -0.465436
4 5 3.323134
5 6 -0.111835
6 7 -2.564799
7 8 -0.236744
8 9 7.016618
9 10 -1.404967
10 11 0.219732
11 12 -5.178894
12 13 3.522084
13 14 -5.464541
14 15 -2.210263
15 16 -1.344966
16 17 0.107996
17 18 2.744113
18 19 1.354974
19 20 -0.095815
20 21 -0.207139
21 22 -0.199414
22 23 -0.277334
23 24 5.914406
24 25 2.992425
25 26 -0.059389
26 27 -0.144523
27 28 -0.654598
28 29 -0.340663
29 30 -1.725085
... ... ...
44805 44806 0.013280
44806 44807 -2.594729
44807 44808 -5.415764
44808 44809 1.559146
44809 44810 -0.185846
44810 44811 -2.472906
44811 44812 2.547927
44812 44813 0.932792
44813 44814 3.796257
44814 44815 0.775063
44815 44816 -1.466948
44816 44817 0.511186
44817 44818 0.835496
44818 44819 -1.716403
44819 44820 -0.155211
44820 44821 -0.187585
44821 44822 -4.715468
44822 44823 2.772964
44823 44824 -2.475748
44824 44825 0.386926
44825 44826 1.134414
44826 44827 2.409041
44827 44828 -0.125013
44828 44829 1.313928
44829 44830 -0.865511
44830 44831 -0.693406
44831 44832 -0.181998
44832 44833 -0.412026
44833 44834 -0.413845
44834 44835 -0.020959

44835 rows × 2 columns

The following code uses a random forest to rank the importance of features. This can be used to rank both the original features and newly created ones.


In [41]:
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import RandomForestRegressor


# Build a forest and compute the feature importances
forest = RandomForestRegressor(n_estimators=50,
                              random_state=0, verbose = True)
print("Training random forest")
forest.fit(train_x, train_y)
importances = forest.feature_importances_
std = np.std([tree.feature_importances_ for tree in forest.estimators_],
             axis=0)
indices = np.argsort(importances)[::-1]

# Print the feature ranking
#train_df.drop('outcome',1,inplace=True)
bag_cols = train_df.columns.values
print("Feature ranking:")

for f in range(train_x.shape[1]):
    print("{}. {} ({})".format(f + 1, bag_cols[indices[f]], importances[indices[f]]))


Training random forest
/usr/local/lib/python3.4/dist-packages/ipykernel/__main__.py:10: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
[Parallel(n_jobs=1)]: Done  49 tasks       | elapsed:   19.7s
Feature ranking:
1. f (0.17347717510242006)
2. b (0.15743856858729224)
3. a (0.15083287490096894)
4. d (0.13655150208195754)
5. c (0.13522566532659755)
6. e (0.12661078605327306)
7. g (0.11986342794749044)
[Parallel(n_jobs=1)]: Done  50 out of  50 | elapsed:   20.1s finished
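
The cell above imports matplotlib and computes std but never plots them; a simple bar chart of the ranked importances (a sketch reusing the variables computed above) makes the ranking easier to see:


In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt

plt.figure()
plt.title("Feature importances")
# Bars in ranked order, with the standard deviation across trees as error bars.
plt.bar(range(train_x.shape[1]), importances[indices],
        yerr=std[indices], align="center")
plt.xticks(range(train_x.shape[1]), bag_cols[indices])
plt.xlabel("feature")
plt.ylabel("importance")
plt.show()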

The following code uses engineered features.

In [45]:
import os
import pandas as pd
from sklearn.cross_validation import train_test_split
import tensorflow.contrib.learn as skflow
import numpy as np
from sklearn import metrics

path = "./data/"
    
filename = os.path.join(path,"t81_558_train.csv")    
train_df = pd.read_csv(filename)

train_df.drop('id',1,inplace=True)
#train_df.drop('g',1,inplace=True)
#train_df.drop('e',1,inplace=True)


train_df.insert(0, "a-b", train_df.a - train_df.b)
#display(train_df)

train_x, train_y = to_xy(train_df,'outcome')

train_x, test_x, train_y, test_y = train_test_split(
    train_x, train_y, test_size=0.25, random_state=42)

# Create a deep neural network with 3 hidden layers of 50, 25, 10
regressor = skflow.TensorFlowDNNRegressor(hidden_units=[50, 25, 10], steps=5000)

# Early stopping
early_stop = skflow.monitors.ValidationMonitor(test_x, test_y,
    early_stopping_rounds=200, print_steps=50)

# Fit/train neural network
regressor.fit(train_x, train_y, monitor=early_stop)

# Measure RMSE error.  RMSE is common for regression.
pred = regressor.predict(test_x)
score = np.sqrt(metrics.mean_squared_error(pred,test_y))
print("Final score (RMSE): {}".format(score))

# foxtrot bravo
# charlie alpha


float64
Step #49, avg. train loss: 262.01456, avg. val loss: 169.29431
Step #99, avg. train loss: 175.81499, avg. val loss: 169.23647
Step #149, avg. train loss: 84.75820, avg. val loss: 169.22414
Step #199, avg. train loss: 118.46288, avg. val loss: 169.19536
Step #249, avg. train loss: 205.13033, avg. val loss: 169.12578
Step #299, avg. train loss: 256.44272, avg. val loss: 169.04340
Step #349, avg. train loss: 179.59492, avg. val loss: 168.99483
Step #399, avg. train loss: 272.88321, avg. val loss: 168.94601
Step #449, avg. train loss: 475.41263, avg. val loss: 168.86700
Step #499, avg. train loss: 159.91197, avg. val loss: 168.89235
Step #549, avg. train loss: 285.60718, avg. val loss: 168.71172
Step #599, avg. train loss: 257.62073, avg. val loss: 168.32053
Step #649, avg. train loss: 175.43346, avg. val loss: 168.01474
Step #699, avg. train loss: 134.36299, avg. val loss: 167.99881
Step #749, avg. train loss: 219.90060, avg. val loss: 167.95235
Step #799, avg. train loss: 112.76654, avg. val loss: 167.33336
Step #850, epoch #1, avg. train loss: 152.12091, avg. val loss: 167.34566
Step #900, epoch #1, avg. train loss: 225.16454, avg. val loss: 166.42712
Step #950, epoch #1, avg. train loss: 271.27850, avg. val loss: 165.88651
Step #1000, epoch #1, avg. train loss: 258.29684, avg. val loss: 165.58501
Step #1050, epoch #1, avg. train loss: 336.41226, avg. val loss: 165.26320
Step #1100, epoch #1, avg. train loss: 316.56357, avg. val loss: 165.37637
Step #1150, epoch #1, avg. train loss: 149.58206, avg. val loss: 164.95998
Step #1200, epoch #1, avg. train loss: 159.21538, avg. val loss: 164.73892
Step #1250, epoch #1, avg. train loss: 166.58478, avg. val loss: 164.61166
Step #1300, epoch #1, avg. train loss: 201.38486, avg. val loss: 164.41125
Step #1350, epoch #1, avg. train loss: 99.02577, avg. val loss: 164.61804
Step #1400, epoch #1, avg. train loss: 171.07423, avg. val loss: 163.84512
Step #1450, epoch #1, avg. train loss: 59.83423, avg. val loss: 163.45181
Step #1500, epoch #1, avg. train loss: 202.18701, avg. val loss: 163.10725
Step #1550, epoch #1, avg. train loss: 191.93936, avg. val loss: 163.24649
Step #1600, epoch #1, avg. train loss: 151.62419, avg. val loss: 163.31381
Step #1650, epoch #2, avg. train loss: 223.15309, avg. val loss: 162.15009
Step #1700, epoch #2, avg. train loss: 339.78391, avg. val loss: 162.09174
Step #1750, epoch #2, avg. train loss: 203.35071, avg. val loss: 162.79472
Step #1800, epoch #2, avg. train loss: 199.48436, avg. val loss: 162.70357
Step #1850, epoch #2, avg. train loss: 195.52251, avg. val loss: 161.53848
Step #1900, epoch #2, avg. train loss: 167.04567, avg. val loss: 161.51526
Step #1950, epoch #2, avg. train loss: 70.80038, avg. val loss: 162.08649
Step #2000, epoch #2, avg. train loss: 225.86168, avg. val loss: 161.81807
Step #2050, epoch #2, avg. train loss: 224.96057, avg. val loss: 160.70282
Step #2100, epoch #2, avg. train loss: 239.22934, avg. val loss: 160.82790
Step #2150, epoch #2, avg. train loss: 207.87396, avg. val loss: 161.22589
Step #2200, epoch #2, avg. train loss: 374.89554, avg. val loss: 161.43372
Step #2250, epoch #2, avg. train loss: 82.19041, avg. val loss: 160.68716
Step #2300, epoch #2, avg. train loss: 174.15918, avg. val loss: 159.94739
Step #2350, epoch #2, avg. train loss: 136.94189, avg. val loss: 159.85329
Step #2400, epoch #2, avg. train loss: 249.86937, avg. val loss: 160.53398
Step #2450, epoch #2, avg. train loss: 149.51268, avg. val loss: 160.83829
Step #2500, epoch #3, avg. train loss: 147.07068, avg. val loss: 160.68674
Step #2550, epoch #3, avg. train loss: 162.09476, avg. val loss: 160.47844
Stopping. Best step:
 step 2357 with loss 159.2803192138672
Final score (RMSE): 17.914104461669922
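
The foxtrot/bravo and charlie/alpha comments at the end of the cell appear to hint at other feature pairs worth engineering. Interpreting them as difference features in the same style as the a-b column above is only a guess; whether they actually help must be checked against the validation RMSE.


In [ ]:
# Additional engineered difference features, following the same pattern as "a-b".
# These are candidate features to try, not a confirmed part of the solution.
train_df.insert(0, "f-b", train_df.f - train_df.b)
train_df.insert(0, "c-a", train_df.c - train_df.a)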
