In this notebook we will play with a Feed-Forward FC-NN (Fully Connected Neural Network) for a classification task:
Image Classification on MNIST Dataset
RECALL
In the FC-NN, the output of each layer is computed using the activations from the previous one, as follows:
$$h_{i} = \sigma(W_i h_{i-1} + b_i)$$where ${h}_i$ is the activation vector from the $i$-th layer (or the input data for $i=0$), ${W}_i$ and ${b}_i$ are the weight matrix and the bias vector for the $i$-th layer, respectively.
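As a quick illustration, here is a minimal NumPy sketch of this forward pass for a single layer (the layer sizes and the logistic activation are placeholders, not the architecture we will build later):
In [ ]:
import numpy as np

def sigma(x):
    # example activation: logistic sigmoid
    return 1.0 / (1.0 + np.exp(-x))

h_prev = np.random.rand(784)          # h_{i-1}: activations from the previous layer (or the input)
W = np.random.randn(512, 784) * 0.01  # W_i: weight matrix of the i-th layer
b = np.zeros(512)                     # b_i: bias vector of the i-th layer

h = sigma(W @ h_prev + b)             # h_i = sigma(W_i h_{i-1} + b_i)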
To regularize the model, we will also insert a Dropout layer between consecutive hidden layers.
Dropout works by “dropping out” some unit activations in a given layer, that is, setting them to zero with a given probability.
Our loss function will be the categorical crossentropy.
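As a reminder, for a single example with one-hot target $y$ and predicted probabilities $p$, the categorical crossentropy is $-\sum_k y_k \log p_k$, i.e. the negative log-probability assigned to the true class. A tiny NumPy sketch (the values are made up):
In [ ]:
import numpy as np

y = np.array([0, 0, 1, 0])          # one-hot target: true class is 2
p = np.array([0.1, 0.2, 0.6, 0.1])  # predicted class probabilities (sum to 1)

loss = -np.sum(y * np.log(p))       # equals -log(p[2]) ~ 0.51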
Keras supports two different kinds of models: the Sequential model and the functional Model. The former is used to build linear stacks of layers (so each layer has one input and one output), and the latter supports any kind of connection graph.
In our case we build a Sequential model with three Dense (aka fully connected) layers, with some Dropout. Notice that the output layer has the softmax activation function.
The resulting model is actually a function of its own inputs, implemented using the Keras backend.
We apply the categorical crossentropy loss and choose SGD as the optimizer.
Keep in mind that Keras supports a variety of different optimizers and loss functions, which you may want to check out.
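For comparison, here is a hedged sketch of the same kind of stack written with the functional Model API (assuming Keras 2; we will build the actual model with Sequential below):
In [ ]:
from keras.models import Model
from keras.layers import Input, Dense

# same architecture as the Sequential model below, expressed as a graph of layer calls
inputs = Input(shape=(784,))
x = Dense(512, activation='relu')(inputs)
x = Dense(512, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)

functional_model = Model(inputs=inputs, outputs=outputs)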
In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
The ReLU function is defined as $f(x) = \max(0, x)$ [1].
A smooth approximation to the rectifier is the analytic function $f(x) = \ln(1 + e^x)$,
which is called the softplus function.
The derivative of softplus is $f'(x) = e^x / (e^x + 1) = 1 / (1 + e^{-x})$, i.e. the logistic function.
[1] V. Nair and G. E. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines”, ICML 2010. http://www.cs.toronto.edu/~fritz/absps/reluICML.pdf
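A quick visual check of the two functions, reusing the NumPy and matplotlib imports from the cell above (just a plotting sketch):
In [ ]:
x = np.linspace(-5, 5, 200)
relu = np.maximum(0, x)            # f(x) = max(0, x)
softplus = np.log(1 + np.exp(x))   # f(x) = ln(1 + e^x)

plt.plot(x, relu, label='ReLU')
plt.plot(x, softplus, label='softplus')
plt.legend()
plt.show()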
In [2]:
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
nb_classes = 10
# FC@512+relu -> FC@512+relu -> FC@nb_classes+softmax
# ... your Code Here
In [3]:
# %load ../solutions/sol_321.py
In [4]:
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.001),
metrics=['accuracy'])
We will train our model on the MNIST dataset, which consists of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.
Since this dataset is provided with Keras, we just ask the `keras.datasets` module for the training and test data.
The `categorical_crossentropy` loss expects one-hot vectors as targets, therefore we apply the `to_categorical` function from `keras.utils` to convert the integer labels to one-hot vectors.
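For instance, the integer label 3 becomes a length-10 one-hot vector (a tiny illustration using the same `np_utils` helper we import below):
In [ ]:
from keras.utils import np_utils

# a single row with a 1 at index 3 and 0 elsewhere
print(np_utils.to_categorical([3], 10))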
In [5]:
from keras.datasets import mnist
from keras.utils import np_utils
(X_train, y_train), (X_test, y_test) = mnist.load_data()
In [6]:
X_train.shape
Out[6]:
In [7]:
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype("float32")
X_test = X_test.astype("float32")
# Scale pixel values to the [0, 1] range
X_train /= 255
X_test /= 255
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
In [8]:
from sklearn.model_selection import train_test_split
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train)
In [9]:
X_train[0].shape
Out[9]:
In [10]:
plt.imshow(X_train[0].reshape(28, 28))
Out[10]:
In [11]:
print(np.asarray(range(10)))
print(Y_train[0].astype('int'))
In [12]:
plt.imshow(X_val[0].reshape(28, 28))
Out[12]:
In [13]:
print(np.asarray(range(10)))
print(Y_val[0].astype('int'))
In [14]:
network_history = model.fit(X_train, Y_train, batch_size=128,
epochs=2, verbose=1, validation_data=(X_val, Y_val))
In [15]:
import matplotlib.pyplot as plt
%matplotlib inline
def plot_history(network_history):
plt.figure()
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.plot(network_history.history['loss'])
plt.plot(network_history.history['val_loss'])
plt.legend(['Training', 'Validation'])
plt.figure()
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.plot(network_history.history['acc'])
plt.plot(network_history.history['val_acc'])
plt.legend(['Training', 'Validation'], loc='lower right')
plt.show()
plot_history(network_history)
After 2 epochs, we get a ~88% validation accuracy.
In [21]:
# Your code here
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.001),
metrics=['accuracy'])
network_history = model.fit(X_train, Y_train, batch_size=128,
epochs=2, verbose=1, validation_data=(X_val, Y_val))
A Dropout layer has the very specific function of dropping out a random set of activations in a layer by setting them to zero in the forward pass. Simple as that.
It helps avoid overfitting, but it has to be active only at training time and not at test time.
keras.layers.core.Dropout(rate, noise_shape=None, seed=None)
Applies Dropout to the input.
Dropout consists of randomly setting a fraction `rate` of the input units to 0 at each update during training time, which helps prevent overfitting.
Arguments: `rate` (float between 0 and 1, fraction of the input units to drop), `noise_shape` (shape of the binary dropout mask multiplied with the input), `seed` (integer to use as random seed).
Note: Keras automatically guarantees that this layer is not applied during the inference (i.e. prediction) phase, and is thus only used in training, as it should be!
See the `keras.backend.in_train_phase` function.
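As a minimal NumPy sketch of what the (inverted) dropout forward pass does at training time versus inference time (illustrative only, not the Keras implementation):
In [ ]:
import numpy as np

rate = 0.2                                 # fraction of units to drop
h = np.random.rand(8)                      # some layer activations

mask = np.random.rand(*h.shape) >= rate    # keep each unit with probability 1 - rate
h_train = h * mask / (1.0 - rate)          # training: drop and rescale (inverted dropout)
h_test = h                                 # inference: dropout is a no-op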
In [3]:
from keras.layers.core import Dropout
## Please note **where** `K.in_train_phase` is actually called!
Dropout??
In [2]:
from keras import backend as K
K.in_train_phase?
In [14]:
from keras.layers.core import Dropout
# FC@512+relu -> DropOut(0.2) -> FC@512+relu -> DropOut(0.2) -> FC@nb_classes+softmax
# ... your Code Here
In [ ]:
# %load ../solutions/sol_312.py
In [14]:
network_history = model.fit(X_train, Y_train, batch_size=128,
epochs=4, verbose=1, validation_data=(X_val, Y_val))
plot_history(network_history)
It is always necessary to monitor training and validation loss during the training of any kind of neural network, either to detect overfitting or to evaluate the model's behaviour (any clue on how to do it?)
In [ ]:
# %load solutions/sol23.py
from keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=4, verbose=1)
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=SGD(),
metrics=['accuracy'])
model.fit(X_train, Y_train, validation_data = (X_test, Y_test), epochs=100,
batch_size=128, verbose=True, callbacks=[early_stop])
In [15]:
# We already used `summary`
model.summary()
In [16]:
print('Model Input Tensors: ', model.input, end='\n\n')
print('Layers - Network Configuration:', end='\n\n')
for layer in model.layers:
print(layer.name, layer.trainable)
print('Layer Configuration:')
print(layer.get_config(), end='\n{}\n'.format('----'*10))
print('Model Output Tensors: ', model.output)
One simple way to do it is to use the weights of your model to build a new model that is truncated at the layer you want to read.
Then you can run the `.predict(X_batch)` method to get the activations for a batch of inputs.
In [17]:
model_truncated = Sequential()
model_truncated.add(Dense(512, activation='relu', input_shape=(784,)))
model_truncated.add(Dropout(0.2))
model_truncated.add(Dense(512, activation='relu'))
for i, layer in enumerate(model_truncated.layers):
layer.set_weights(model.layers[i].get_weights())
model_truncated.compile(loss='categorical_crossentropy', optimizer=SGD(),
metrics=['accuracy'])
In [18]:
# Check
np.all(model_truncated.layers[0].get_weights()[0] == model.layers[0].get_weights()[0])
Out[18]:
In [19]:
hidden_features = model_truncated.predict(X_train)
In [20]:
hidden_features.shape
Out[20]:
In [21]:
X_train.shape
Out[21]:
def get_activations(model, layer, X_batch):
activations_f = K.function([model.layers[0].input, K.learning_phase()], [layer.output,])
activations = activations_f((X_batch, False))
return activations
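For example, assuming the Dropout model defined above (where `model.layers[2]` is the second Dense layer; the index depends on your architecture), you could grab its activations for a small batch like this:
In [ ]:
acts = get_activations(model, model.layers[2], X_train[:10])[0]
print(acts.shape)  # expected: (10, 512) for the 512-unit layer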
In [24]:
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2)
X_tsne = tsne.fit_transform(hidden_features[:1000]) ## Reduced for computational issues
In [29]:
colors_map = np.argmax(Y_train, axis=1)
In [32]:
X_tsne.shape
Out[32]:
In [49]:
nb_classes
Out[49]:
In [53]:
np.where(colors_map==6)
Out[53]:
In [55]:
colors = np.array([x for x in 'b-g-r-c-m-y-k-purple-coral-lime'.split('-')])
colors_map = colors_map[:1000]
plt.figure(figsize=(10,10))
for cl in range(nb_classes):
indices = np.where(colors_map==cl)
plt.scatter(X_tsne[indices,0], X_tsne[indices, 1], c=colors[cl], label=cl)
plt.legend()
plt.show()
In [67]:
from bokeh.plotting import figure, output_notebook, show
output_notebook()
In [74]:
p = figure(plot_width=600, plot_height=600)
colors = [x for x in 'blue-green-red-cyan-magenta-yellow-black-purple-coral-lime'.split('-')]
colors_map = colors_map[:1000]
for cl in range(nb_classes):
indices = np.where(colors_map==cl)
p.circle(X_tsne[indices, 0].ravel(), X_tsne[indices, 1].ravel(), size=7,
color=colors[cl], alpha=0.4, legend=str(cl))
# show the results
p.legend.location = 'bottom_right'
show(p)
In [75]:
from sklearn.manifold import MDS
In [ ]:
## Your code here
In [ ]:
## Your code here
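One possible sketch for this exercise, mirroring the t-SNE scatter plot above (it assumes `hidden_features`, `colors` and `colors_map` from the previous cells; MDS is slow, so we use only a few hundred points):
In [ ]:
mds = MDS(n_components=2)
X_mds = mds.fit_transform(hidden_features[:500])  # reduced for computational reasons

plt.figure(figsize=(10, 10))
for cl in range(nb_classes):
    indices = np.where(colors_map[:500] == cl)
    plt.scatter(X_mds[indices, 0], X_mds[indices, 1], c=colors[cl], label=cl)
plt.legend()
plt.show()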
In [2]:
## Try using the `get_activations` function relying on keras backend
def get_activations(model, layer, X_batch):
activations_f = K.function([model.layers[0].input, K.learning_phase()], [layer.output,])
activations = activations_f((X_batch, False))
return activations
In [ ]: