In this notebook, I will vary the architecture of the neural network, similar to what was done in the scikit-learn version of this analysis.
First, the data is loaded and preprocessed as before:
In [1]:
# Data preprocessing from Part 1
import datetime
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense

abalone_df = pd.read_csv('abalone.csv', names=['Sex', 'Length', 'Diameter', 'Height',
                                               'Whole Weight', 'Shucked Weight',
                                               'Viscera Weight', 'Shell Weight', 'Rings'])

# One-hot encode the Sex column
abalone_df['Male'] = (abalone_df['Sex'] == 'M').astype(int)
abalone_df['Female'] = (abalone_df['Sex'] == 'F').astype(int)
abalone_df['Infant'] = (abalone_df['Sex'] == 'I').astype(int)

# Drop rows with a non-physical height of zero
abalone_df = abalone_df[abalone_df['Height'] > 0]

# 70/30 train/test split
train, test = train_test_split(abalone_df, train_size=0.7)
x_train = train.drop(['Rings', 'Sex'], axis=1).values
y_train = pd.DataFrame(train['Rings']).values
x_test = test.drop(['Rings', 'Sex'], axis=1).values
y_test = pd.DataFrame(test['Rings']).values
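As a quick sanity check (not part of the original notebook), the shapes of the resulting arrays can be inspected; each sample should have ten feature columns, which is what the input_dim argument used later assumes:

print(x_train.shape, y_train.shape)  # second dimension of x_train should be 10
print(x_test.shape, y_test.shape)    # second dimension of x_test should be 10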
To automatically construct a list of models to test, I used a list comprehension in the following code. Each of these models has exactly two hidden layers. The idea is that I will test a range of sizes from 5 to 30 nodes for each hidden layer, incrementing by 5 each time. I also extended the list with three additional models that I thought would be interesting to test.
In [2]:
# Constructing a list of models to test
hlayers = [[x,y] for x in range(5,31,5) for y in range(5,31,5)]
hlayers.extend([[1,10],[10,1],[2,2]])
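For reference (this check is not in the original notebook), the list comprehension yields the 36 combinations of the two layer sizes, and the three hand-picked architectures bring the total to 39 models:

print(len(hlayers))  # 6 * 6 + 3 = 39 architectures
print(hlayers[:3])   # [[5, 5], [5, 10], [5, 15]]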
Next, I will iterate over each of the above models, trying three different activation functions: 'tanh', 'relu', and 'sigmoid'. A model with no activation function will also be tried. The same model-building procedure as in Part 1 of this blog series will be used.
In [3]:
# Iterate over the list of models, trying 3 different activation functions
begin = datetime.datetime.now()
results_dict = {}
for act in [None, 'tanh', 'relu', 'sigmoid']:
    for layers in hlayers:
        # Two hidden layers, with the activation applied to the second one
        abalone_model = Sequential([
            Dense(layers[0], input_dim=10),
            Dense(layers[1], activation=act),
            Dense(1)])
        abalone_model.compile(optimizer='rmsprop', loss='mse',
                              metrics=["mean_absolute_error"])
        results = abalone_model.fit(x_train, y_train, nb_epoch=50, verbose=0)
        score = abalone_model.evaluate(x_test, y_test)
        # Record the test-set mean absolute error for this architecture
        result_string = "[{},{}] {}".format(layers[0], layers[1], act)
        results_dict[result_string] = score[1]

# Save the results in a DataFrame
results_df = pd.DataFrame.from_dict(results_dict, orient="index")
results_df.rename(columns={0: "MAE"}, inplace=True)

seconds = (datetime.datetime.now() - begin).total_seconds()
sec_string = "Total elapsed seconds: {}".format(seconds)
print(sec_string)
Notice that the total elapsed time was about 30 minutes. I'm running this code on a fairly old Linux computer, and I have no doubt it would run faster on better hardware. I would like to try out GPU acceleration, but I don't currently have a computer with a CUDA-enabled GPU, so that will have to wait.
Finally, the following code will present the results in tabular form:
In [4]:
# Print the results matrix
results_df.sort_values('MAE')
Out[4]:
The best results came from a neural network with 25 nodes in the first hidden layer, 15 nodes in the second, and a "tanh" activation function. The top ten to twenty networks had fairly similar results; all were likely within the testing accuracy of the network. Given these results, I would likely choose the "[20,10] tanh" network because of its relative simplicity.
Comparing these results to the previous blog post, which used the scikit-learn neural network module, I am surprised that the "relu" activation function performed so well. Notice that the worst networks are typically those with no activation function or with too few nodes in each hidden layer.
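For completeness, here is a minimal sketch (not from the original notebook) of how the chosen "[20,10] tanh" architecture could be retrained on its own, using the same compile and fit settings as above:

# Hypothetical follow-up: retrain only the chosen "[20,10] tanh" architecture
final_model = Sequential([
    Dense(20, input_dim=10),
    Dense(10, activation='tanh'),
    Dense(1)])
final_model.compile(optimizer='rmsprop', loss='mse',
                    metrics=["mean_absolute_error"])
final_model.fit(x_train, y_train, nb_epoch=50, verbose=0)
print(final_model.evaluate(x_test, y_test))  # [mse, mean_absolute_error]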