In this notebook, I will be varying the architecture of the neural network, similar to what was done in the scikit-learn version of this analysis.

First, the data is loaded and preprocessed as before:


In [1]:
# Data preprocessing from Part 1
import datetime
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
abalone_df = pd.read_csv('abalone.csv',names=['Sex','Length','Diameter','Height',
    'Whole Weight','Shucked Weight', 'Viscera Weight','Shell Weight', 'Rings'])
# One-hot encode the categorical 'Sex' column
abalone_df['Male'] = (abalone_df['Sex']=='M').astype(int)
abalone_df['Female'] = (abalone_df['Sex']=='F').astype(int)
abalone_df['Infant'] = (abalone_df['Sex']=='I').astype(int)
# Drop rows with an invalid (zero) height
abalone_df = abalone_df[abalone_df['Height']>0]
# 70/30 train/test split
train, test = train_test_split(abalone_df, train_size=0.7)
x_train = train.drop(['Rings','Sex'], axis=1).values
y_train = pd.DataFrame(train['Rings']).values
x_test = test.drop(['Rings','Sex'], axis=1).values
y_test = pd.DataFrame(test['Rings']).values


Using TensorFlow backend.

To automatically construct a list of models to test, I used a list comprehension in the following code. Each of these models has exactly two hidden layers. The idea is to test a range of sizes from 5 to 30 nodes for each hidden layer, incrementing by 5 each time. I also extended the list with three additional models that I thought would be interesting to test.


In [2]:
# Constructing a list of models to test
hlayers = [[x,y] for x in range(5,31,5) for y in range(5,31,5)]
hlayers.extend([[1,10],[10,1],[2,2]])
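
This produces 6 × 6 = 36 grid combinations plus the 3 extras, for 39 architectures in total; combined with the four activation settings tried below, that accounts for the 156 models in the results table. A quick check along these lines would confirm the count:

# Quick sanity check on the generated architecture list
print(len(hlayers))   # 39: 36 grid combinations plus 3 extras
print(hlayers[:3])    # [[5, 5], [5, 10], [5, 15]]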

Next, I will iterate over each of the above models, trying three different activation functions: 'tanh', 'relu', and 'sigmoid'. A model with no activation function (a linear Dense layer) will also be tried. The same model-building procedure as in Part 1 of this blog series will be used.


In [3]:
# Iterate over the list of models, trying 3 different activation functions
begin = datetime.datetime.now()
results_dict = {}
for act in [None, 'tanh', 'relu', 'sigmoid']:
    for layers in hlayers:
        abalone_model = Sequential([
            Dense(layers[0], input_dim=10),
            Dense(layers[1], activation=act),
            Dense(1)])
        abalone_model.compile(optimizer='rmsprop',loss='mse',metrics=["mean_absolute_error"])
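        # Note: the nb_epoch argument below was renamed to epochs in Keras 2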
        results = abalone_model.fit(x_train, y_train, nb_epoch=50, verbose=0)
        score = abalone_model.evaluate(x_test, y_test)
        result_string = "[{},{}] {}".format(layers[0], layers[1], act)
        results_dict[result_string] = score[1]
# Save the results in a DataFrame
results_df = pd.DataFrame.from_dict(results_dict, orient="index")
results_df.rename(columns={0 : "MAE"}, inplace=True)
seconds = (datetime.datetime.now() - begin).total_seconds()
sec_string = "Total elapsed seconds: {}".format(seconds)
print(sec_string)


1253/1253 [==============================] - 0s
Total elapsed seconds: 1651.500494

Notice that the total elapsed time was a little under half an hour. I'm running this code on a fairly old Linux computer, and I have no doubt that it would run faster on better hardware. I would like to try out GPU acceleration, but I don't currently have a computer with a CUDA-enabled GPU, so that will have to wait.

Finally, the following code presents the results in tabular form, sorted by mean absolute error:


In [4]:
# Print the results matrix
results_df.sort_values('MAE')


Out[4]:
MAE
[25,15] tanh 1.463747
[20,30] relu 1.471423
[20,20] relu 1.476413
[25,10] tanh 1.482112
[20,10] tanh 1.482269
[25,20] sigmoid 1.482779
[30,25] tanh 1.483761
[20,30] tanh 1.488986
[25,15] sigmoid 1.491202
[10,20] tanh 1.491449
[15,15] tanh 1.492220
[30,10] sigmoid 1.493012
[10,30] tanh 1.493243
[15,20] relu 1.493355
[30,25] sigmoid 1.495390
[5,20] tanh 1.495964
[20,10] sigmoid 1.496207
[30,15] tanh 1.496708
[30,30] sigmoid 1.499643
[25,30] relu 1.500617
[25,25] tanh 1.502365
[5,25] tanh 1.502596
[30,30] relu 1.502638
[20,5] relu 1.503448
[30,5] sigmoid 1.504090
[20,30] sigmoid 1.504335
[20,15] relu 1.506200
[30,10] tanh 1.506404
[10,15] tanh 1.506666
[15,15] sigmoid 1.508139
... ...
[30,30] None 1.688278
[1,10] relu 1.689282
[15,25] None 1.701531
[30,20] tanh 1.704090
[25,15] relu 1.704278
[5,10] relu 1.705194
[2,2] None 1.713283
[5,5] tanh 1.713575
[1,10] tanh 1.727484
[10,30] sigmoid 1.728322
[25,25] sigmoid 1.728527
[10,10] relu 1.730568
[5,20] sigmoid 1.731878
[15,25] relu 1.734847
[1,10] None 1.737496
[5,10] None 1.741946
[10,10] None 1.743810
[10,5] sigmoid 1.757676
[10,30] None 1.760914
[5,15] sigmoid 1.761239
[5,25] sigmoid 1.764270
[15,30] None 1.776063
[2,2] relu 1.776358
[30,25] relu 1.814700
[5,5] sigmoid 1.827547
[2,2] tanh 1.830345
[2,2] sigmoid 1.835663
[1,10] sigmoid 1.846257
[10,1] tanh 2.315263
[10,1] sigmoid 5.327368

156 rows × 1 columns

The best results came from a neural network with 25 nodes in the first hidden layer, 15 nodes in the second, and a "tanh" activation function. The top ten to twenty networks had fairly similar results; all were likely within the testing accuracy of the network. Given these results, I would probably choose the "[20,10] tanh" network for its relative simplicity.
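
As a rough sketch of what that choice would look like on its own, the chosen network could be rebuilt with the same procedure used in the loop above (the layer sizes and activation come from the results table; everything else mirrors the earlier cell):

# Sketch: rebuild the chosen [20,10] tanh network with the same settings as the loop above
chosen_model = Sequential([
    Dense(20, input_dim=10),
    Dense(10, activation='tanh'),
    Dense(1)])
chosen_model.compile(optimizer='rmsprop', loss='mse', metrics=["mean_absolute_error"])
chosen_model.fit(x_train, y_train, nb_epoch=50, verbose=0)
print(chosen_model.evaluate(x_test, y_test, verbose=0)[1])  # test-set mean absolute error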

Comparing these results to those from the previous blog post, which used the scikit-learn neural network module, I am surprised that the "relu" activation function performed so well. Notice that the worst networks are typically those with no activation function or with too few nodes in each hidden layer.