In this notebook, I will vary the architecture of the neural network, similar to what was done in the scikit-learn version of this analysis.
First, the data is loaded and preprocessed as before:
In [1]:
# Data preprocessing from Part 1
import datetime
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense

abalone_df = pd.read_csv('abalone.csv', names=['Sex', 'Length', 'Diameter', 'Height',
                                               'Whole Weight', 'Shucked Weight',
                                               'Viscera Weight', 'Shell Weight', 'Rings'])

# One-hot encode the Sex column
abalone_df['Male'] = (abalone_df['Sex'] == 'M').astype(int)
abalone_df['Female'] = (abalone_df['Sex'] == 'F').astype(int)
abalone_df['Infant'] = (abalone_df['Sex'] == 'I').astype(int)

# Drop rows with a non-physical height of zero
abalone_df = abalone_df[abalone_df['Height'] > 0]

# 70/30 train/test split
train, test = train_test_split(abalone_df, train_size=0.7)
x_train = train.drop(['Rings', 'Sex'], axis=1).values
y_train = pd.DataFrame(train['Rings']).values
x_test = test.drop(['Rings', 'Sex'], axis=1).values
y_test = pd.DataFrame(test['Rings']).values
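As a quick sanity check (not part of the original notebook), the shapes of the resulting arrays can be inspected; each sample should have ten feature columns, which is what the input_dim argument used later assumes:

print(x_train.shape, y_train.shape)  # second dimension of x_train should be 10
print(x_test.shape, y_test.shape)    # second dimension of x_test should be 10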
To automatically construct a list of models to test, I used a list comprehension in the following code. Each of these models has exactly two hidden layers. The idea is that I will test a range of sizes from 5 to 30 nodes for each hidden layer, incrementing by 5 each time. I also extended the list with three additional models that I thought would be interesting to test.
In [2]:
# Constructing a list of models to test
hlayers = [[x,y] for x in range(5,31,5) for y in range(5,31,5)]
hlayers.extend([[1,10],[10,1],[2,2]])
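For reference (this check is not in the original notebook), the list comprehension yields the 36 combinations of the two layer sizes, and the three hand-picked architectures bring the total to 39 models:

print(len(hlayers))  # 6 * 6 + 3 = 39 architectures
print(hlayers[:3])   # [[5, 5], [5, 10], [5, 15]]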
Next, I will iterate over each of the above models, trying three different activation functions: 'tanh', 'relu', and 'sigmoid'. A model with no activation function will also be tried. The same model-building procedure as in Part 1 of this blog series will be used.
In [3]:
# Iterate over the list of models, trying 3 different activation functions
begin = datetime.datetime.now()
results_dict = {}
for act in [None, 'tanh', 'relu', 'sigmoid']:
    for layers in hlayers:
        # Two hidden layers, with the activation applied to the second one
        abalone_model = Sequential([
            Dense(layers[0], input_dim=10),
            Dense(layers[1], activation=act),
            Dense(1)])
        abalone_model.compile(optimizer='rmsprop', loss='mse',
                              metrics=["mean_absolute_error"])
        results = abalone_model.fit(x_train, y_train, nb_epoch=50, verbose=0)
        score = abalone_model.evaluate(x_test, y_test)
        # Record the test-set mean absolute error for this architecture
        result_string = "[{},{}] {}".format(layers[0], layers[1], act)
        results_dict[result_string] = score[1]

# Save the results in a DataFrame
results_df = pd.DataFrame.from_dict(results_dict, orient="index")
results_df.rename(columns={0: "MAE"}, inplace=True)

seconds = (datetime.datetime.now() - begin).total_seconds()
sec_string = "Total elapsed seconds: {}".format(seconds)
print(sec_string)
Notice that the total elapsed time was about 30 minutes. I'm running this code on a fairly old Linux computer, and I have no doubt it would run faster on better hardware. I would like to try out GPU acceleration, but I don't currently have a computer with a CUDA-enabled GPU, so that will have to wait.
Finally, the following code will present the results in tabular form:
In [4]:
# Print the results matrix
results_df.sort_values('MAE')
Out[4]:
The best results came from a neural network with 25 nodes in the first hidden layer, 15 nodes in the second, and a "tanh" activation function. The top ten to twenty networks had fairly similar results; all were likely within the testing accuracy of the network. Given these results, I would likely choose the "[20,10] tanh" network because of its relative simplicity.
Comparing these results to the previous blog post, which used the scikit-learn neural network module, I am surprised that the "relu" activation function performed so well. Notice that the worst networks are typically those with no activation function or with too few nodes in each hidden layer.
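For completeness, here is a minimal sketch (not from the original notebook) of how the chosen "[20,10] tanh" architecture could be retrained on its own, using the same compile and fit settings as above:

# Hypothetical follow-up: retrain only the chosen "[20,10] tanh" architecture
final_model = Sequential([
    Dense(20, input_dim=10),
    Dense(10, activation='tanh'),
    Dense(1)])
final_model.compile(optimizer='rmsprop', loss='mse',
                    metrics=["mean_absolute_error"])
final_model.fit(x_train, y_train, nb_epoch=50, verbose=0)
print(final_model.evaluate(x_test, y_test))  # [mse, mean_absolute_error]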