Here we focus only on feed-forward networks. Feed-forward neural networks resemble non-linear regression. A neural network is composed of hidden layers (projections), and each hidden layer contains several neurons (the projection dimension).
As the number of neurons in each layer increases, the network becomes wider. As the number of hidden layers increases, the network becomes deeper.
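In its simplest form, each hidden layer applies a linear projection followed by a non-linear activation: a layer with weight matrix $W$ and bias $b$ maps the input $x$ to $h = \sigma(Wx + b)$, where $\sigma$ is the activation function (sigmoid, relu, etc.), and the output layer applies one more projection to $h$.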
As a simple starting example, recall the bivariate classification on simulated data of Chapter 3. Here the matrix $X$ is the predictor and $Y$ is a binary classification indicator. In Chapter 3 we used an SVM to classify the data; this time we try a neural network. Fitting a neural network involves several steps.
In [1]:
# load required libraries
import numpy as np
import keras
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
%matplotlib inline
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten
Start the simulation: $X$ is multivariate Gaussian and $Y$ is a binary variable, 0 or 1.
In [2]:
np.random.seed(0)
n = 100
X = np.vstack((np.random.multivariate_normal([0, 0], [[1, 0], [0, 1]], n),
               np.random.multivariate_normal([3, 3], [[1, 0], [0, 1]], n)))
Y = np.array([0] * n + [1] * n)
The keras library requires $Y$ to be one-hot coded, i.e. $(0,1)$ for one class and $(1,0)$ for the other.
In [3]:
# convert integer Y to dummy variable Y (i.e. one hot encoded)
Y = keras.utils.np_utils.to_categorical(Y)
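For example, applying the same encoder to the labels 0, 1, 1 should produce the rows $(1,0)$, $(0,1)$, $(0,1)$:
keras.utils.np_utils.to_categorical([0, 1, 1])
# expected result: [[1, 0], [0, 1], [0, 1]]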
Now divide the data into training and validation sets. This data partition can warn us about overfitting.
In [4]:
# divide data into train and validation 80% 20%
X_train, X_valid, Y_train, Y_valid = train_test_split(X, Y, test_size = 0.2, random_state = 150)
Defining a model is simple. You must specify the input dimension and the output dimension, and then feed in the data.
Here we start with a very simple model: a single layer that projects the two-dimensional input directly onto the two output classes. This makes the neural model equivalent to logistic regression.
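To see the equivalence, note that with two output units and a softmax activation the model computes $P(Y=1 \mid x) = \frac{e^{w_1^\top x + b_1}}{e^{w_0^\top x + b_0} + e^{w_1^\top x + b_1}} = \frac{1}{1 + e^{-\{(w_1 - w_0)^\top x + (b_1 - b_0)\}}}$, which is exactly the logistic regression form.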
In [5]:
# define empty model
model1 = Sequential()
# specify the form of input and the form of output
model1.add(Dense(units = 2, input_dim = 2, activation = 'softmax'))
# input_dim is only required for the first layer
# units must match what the next layer expects; here this layer is itself the output, which is one-hot with two classes
model1.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
print(model1.summary())
Now we must feed the data into the constructed model. The data is fed to the model in batches, and the model is fit by stochastic gradient descent. The batch size and the number of epochs are important parameters.
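Each batch produces one stochastic gradient update of the weights $\theta$, roughly of the form $\theta \leftarrow \theta - \eta \, \nabla_\theta L(\theta)$, where $L$ is the loss computed on that batch and the step size $\eta$ is handled adaptively by the adam optimizer.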
In [6]:
# train on the training data
fit1 = model1.fit(X_train, Y_train, validation_data=(X_valid, Y_valid), epochs = 20, batch_size = 64)
Here we define a simple function to visualize the accuracy gained at each epoch. Once the whole dataset has been fed to the model batch by batch, one epoch is complete; the data is then fed again, batch by batch, for the second epoch, and so on. Depending on the random initialization, the result of training may differ, so it is a good idea to fit the model several times and see whether the accuracy changes dramatically.
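For instance, with 160 training points and a batch size of 64, one epoch consists of $\lceil 160/64 \rceil = 3$ gradient updates, so the 20 epochs above amount to 60 updates in total.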
In [7]:
def accplot(fit, model, xtrain, ytrain, xvalid, yvalid, ylim=[0, 1]):
    # report the final accuracy on the validation and training sets
    print("Final Validation Accuracy: %.2f%%" % (model.evaluate(xvalid, yvalid, verbose=0)[1] * 100))
    print("Final Training Accuracy: %.2f%%" % (model.evaluate(xtrain, ytrain, verbose=0)[1] * 100))
    # plot the accuracy recorded at each epoch
    val_acc, = plt.plot(fit.history["val_acc"], label='Validation Accuracy', color='blue')
    train_acc, = plt.plot(fit.history["acc"], label='Training Accuracy', color='black')
    plt.title("Model Accuracy vs Epochs")
    plt.legend(loc="upper left")
    plt.ylim(ymin=ylim[0], ymax=ylim[1])
    plt.ylabel("Accuracy")
    plt.xlabel("Epochs")
    plt.show()

def lossplot(fit, model, xtrain, ytrain, xvalid, yvalid):
    # report the final loss on the validation and training sets
    print("Final Validation Loss: %.4f" % model.evaluate(xvalid, yvalid, verbose=0)[0])
    print("Final Training Loss: %.4f" % model.evaluate(xtrain, ytrain, verbose=0)[0])
    # plot the loss recorded at each epoch
    val_loss, = plt.plot(fit.history["val_loss"], label='Validation Loss', color='blue')
    train_loss, = plt.plot(fit.history["loss"], label='Training Loss', color='black')
    plt.title("Loss vs Epochs")
    plt.legend(loc="upper left")
    plt.ylabel("Loss")
    plt.xlabel("Epochs")
    plt.show()
In [8]:
accplot(fit1, model1, X_train, Y_train, X_valid, Y_valid)
In [9]:
lossplot(fit1, model1, X_train, Y_train, X_valid, Y_valid)
Now that you have trained a simple model, let's make the network deeper and wider.
In [10]:
# Now a deeper and a wider model
model2 = Sequential()
model2.add(Dense(units = 100, input_dim = 2, activation = 'sigmoid'))
model2.add(Dense(units = 100, activation='sigmoid'))
model2.add(Dense(units = 100, activation='sigmoid'))
model2.add(Dense(units = 100, activation='sigmoid'))
model2.add(Dense(units = 2, activation='softmax'))
model2.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
print(model2.summary())
In [11]:
fit2 = model2.fit(X_train, Y_train, validation_data=(X_valid, Y_valid), epochs=20, batch_size=64)
In [12]:
accplot(fit2, model2, X_train, Y_train, X_valid, Y_valid)
In [13]:
lossplot(fit2, model2, X_train, Y_train, X_valid, Y_valid)
Next we move from the simulated example to the zip-code digit data of Chapter 4, where each observation is a hand-written digit described by 256 pixel values.
In [14]:
zipdata = np.loadtxt("../data/zip.train")
# the first column is the digit label, the remaining 256 columns are the pixel values
X = zipdata[:, 1:]
Y = zipdata[:, 0]
# one hot encoding of Y
Y = keras.utils.np_utils.to_categorical(Y)
In [15]:
X_train, X_valid, Y_train, Y_valid = train_test_split(X, Y, test_size = 0.2, random_state = 150)
In [16]:
# multi-logit regression
model3 = Sequential()
model3.add(Dense(units = 10, input_dim = 256, activation = 'softmax'))
model3.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
print(model3.summary())
In [17]:
fit3 = model3.fit(X_train, Y_train, validation_data = (X_valid, Y_valid), epochs = 20, batch_size = 64)
In [18]:
accplot(fit3, model3, X_train, Y_train, X_valid, Y_valid)
The sigmoid activation function is hard to use in practice. The relu activation is faster, often enjoys better convergence properties, and is more popular. A fully-connected model for the zip data is visualized at http://scs.ryerson.ca/~aharley/vis/fc/
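As a reminder, the sigmoid $\sigma(z) = 1/(1 + e^{-z})$ saturates for large $|z|$, so its gradient can vanish in deep networks, while the relu $\mathrm{relu}(z) = \max(0, z)$ is cheap to compute and keeps a constant gradient of one for positive inputs.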
In [19]:
# a deep and wide fully-connected network
model4 = Sequential()
model4.add(Dense(units = 500, input_dim = 256, activation = 'relu')) # layer 01
model4.add(Dense(units = 500, activation = 'relu')) # layer 02
model4.add(Dense(units = 500, activation = 'relu')) # layer 03
model4.add(Dense(units = 500, activation = 'relu')) # layer 04
model4.add(Dense(units = 500, activation = 'relu')) # layer 05
model4.add(Dense(units = 10, activation = 'softmax')) # output layer
model4.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model4.summary())
In [20]:
fit4 = model4.fit(X_train, Y_train, validation_data = (X_valid, Y_valid), epochs = 20, batch_size = 64)
In [21]:
accplot(fit4, model4, X_train, Y_train, X_valid, Y_valid)
In [22]:
lossplot(fit4, model4, X_train, Y_train, X_valid, Y_valid)
In the previous models the spatial structure of the image was ignored. The convolutional network is a building block of image (and text) modeling. First the 256 pixel values must be re-formatted as a $16 \times 16$ image as in Chapter 4. Then the re-structured data are fed into the model in batches as before. The convolutional model for the zip data is visualized at http://scs.ryerson.ca/~aharley/vis/conv/
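Informally, a convolutional layer slides a small filter $K$ (say $k \times k$) over the image $X$ and at each location computes a weighted sum, $(X * K)[i, j] = \sum_{a=1}^{k} \sum_{b=1}^{k} X[i+a, j+b] \, K[a, b]$, so the same few weights are reused at every spatial position instead of one weight per pixel as in a fully-connected layer.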
In [23]:
# re-structure the data into 16x16 images
X_train2D = np.zeros(shape = [X_train.shape[0], 16, 16, 1])
X_valid2D = np.zeros(shape = [X_valid.shape[0], 16, 16, 1])
for i in range(X_train.shape[0]):
    X_train2D[i, ] = X_train[i, ].reshape(16, 16, 1)
for i in range(X_valid.shape[0]):
    X_valid2D[i, ] = X_valid[i, ].reshape(16, 16, 1)
In [24]:
[X_train.shape[0], 16, 16, 1]
Now we build two layers of convolutions. A convolutional layer is often followed by a max-pooling layer. At the top, a fully-connected layer is attached after the last max-pooling layer (right before the output layer) to combine the convolutional features.
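Tracing the shapes through the layers defined below: with 'same' padding the first convolution keeps the $16 \times 16$ size and produces 8 feature maps, max-pooling with pool size 2 reduces this to $8 \times 8 \times 8$, the second convolution gives $8 \times 8 \times 16$, max-pooling with pool size 4 gives $2 \times 2 \times 16$, and flattening yields a vector of length 64 that feeds the fully-connected layers.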
In [25]:
model5 = Sequential()
model5.add(Conv2D(filters = 8, kernel_size = [6, 6], padding = 'same',
                  input_shape = (16, 16, 1), activation = 'relu'))
model5.add(MaxPooling2D(pool_size = 2))
model5.add(Conv2D(filters = 16, kernel_size = [4, 4], padding = 'same', activation = 'relu'))
model5.add(MaxPooling2D(pool_size = 4))
model5.add(Flatten())
model5.add(Dense(units = 256, activation = 'relu')) # fully-connected layer
model5.add(Dense(units = 10, activation = 'softmax')) # output layer
model5.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
In [26]:
print(model5.summary())
In [27]:
fit5 = model5.fit(X_train2D, Y_train, validation_data = (X_valid2D, Y_valid),
                  epochs = 20, batch_size = 64)
In [28]:
accplot(fit5, model5, X_train2D, Y_train, X_valid2D, Y_valid)
In [29]:
lossplot(fit5, model5, X_train2D, Y_train, X_valid2D, Y_valid)
After fitting various models, the final model is checked on an independent set of data, called the test set. Choosing the best model according to the test set would itself introduce a bit of overfitting; the test set is only there to show how the model behaves in production.
First we must prepare the test data: one version for the fully-connected models, which take the vector $X$ (1x256) as predictor, and another for the convolutional model, which takes a $16 \times 16$ matrix as predictor. As in the training section, the categorical response variable $Y$ must be one-hot coded.
In [30]:
zipdata = np.loadtxt("../data/zip.test")
# divide data into predictor X_test and response Y_test
X_test = zipdata[:, 1:]
Y_test = zipdata[:, 0]
# one hot encoding of Y_test
Y_test = keras.utils.np_utils.to_categorical(Y_test)
# reshape X_test for the convolutional model
X_test2D = np.zeros(shape = [X_test.shape[0], 16, 16, 1])
for i in range(X_test.shape[0]):
    X_test2D[i, ] = X_test[i, ].reshape(16, 16, 1)
In [31]:
# check the accuracy of multilogit (no hidden layer) model
score3 = model3.evaluate(X_test, Y_test, verbose = 0)
# check the accuracy of wide and deep fully-connected model
score4 = model4.evaluate(X_test, Y_test, verbose = 0)
# check the accuracy of convolutional model
score5 = model5.evaluate(X_test2D, Y_test, verbose = 0)
In [32]:
print(score3[1], score4[1], score5[1])