Training neural networks with TensorFlow and the Keras layers API
In [ ]:
%matplotlib inline
# display figures in the notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
digits = load_digits()
In [ ]:
sample_index = 45
plt.figure(figsize=(3, 3))
plt.imshow(digits.images[sample_index], cmap=plt.cm.gray_r,
           interpolation='nearest')
plt.title("image label: %d" % digits.target[sample_index]);
In [ ]:
from sklearn.model_selection import train_test_split
data = np.asarray(digits.data, dtype='float32')
target = np.asarray(digits.target, dtype='int32')
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.15, random_state=37)
In [ ]:
from sklearn import preprocessing
# mean = 0 ; standard deviation = 1.0
scaler = preprocessing.StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# print(scaler.mean_)
# print(scaler.scale_)
Let's display one of the transformed samples (after feature standardization):
In [ ]:
sample_index = 45
plt.figure(figsize=(3, 3))
plt.imshow(X_train[sample_index].reshape(8, 8),
           cmap=plt.cm.gray_r, interpolation='nearest')
plt.title("transformed sample\n(standardization)");
The scaler object makes it possible to recover the original sample:
In [ ]:
plt.figure(figsize=(3, 3))
# inverse_transform expects a 2D array, hence the reshape(1, -1)
plt.imshow(scaler.inverse_transform(X_train[sample_index].reshape(1, -1)).reshape(8, 8),
           cmap=plt.cm.gray_r, interpolation='nearest')
plt.title("original sample");
In [ ]:
print(X_train.shape, y_train.shape)
In [ ]:
print(X_test.shape, y_test.shape)
In [ ]:
y_train[:3]
Keras provides a utility function to convert integer-encoded categorical variables to one-hot encoded values:
In [ ]:
from tensorflow.keras.utils import to_categorical
Y_train = to_categorical(y_train)
Y_train[:3]
We can now build and train our first feed-forward neural network using the high-level API of Keras:
In [ ]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras import optimizers
input_dim = X_train.shape[1]
hidden_dim = 100
output_dim = 10
model = Sequential()
model.add(Dense(hidden_dim, input_dim=input_dim, activation="tanh"))
model.add(Dense(output_dim, activation="softmax"))
model.compile(optimizer=optimizers.SGD(learning_rate=0.1),
              loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, Y_train, validation_split=0.2, epochs=15, batch_size=32)
In [ ]:
history.history
In [ ]:
history.epoch
Let's wrap this into a pandas dataframe for easier plotting:
In [ ]:
import pandas as pd
history_df = pd.DataFrame(history.history)
history_df["epoch"] = history.epoch
history_df
In [ ]:
fig, (ax0, ax1) = plt.subplots(nrows=2, sharex=True, figsize=(12, 6))
history_df.plot(x="epoch", y=["loss", "val_loss"], ax=ax0)
history_df.plot(x="epoch", y=["accuracy", "val_accuracy"], ax=ax1);
In [ ]:
%load_ext tensorboard
In [ ]:
!rm -rf tensorboard_logs
In [ ]:
import datetime
from tensorflow.keras.callbacks import TensorBoard
model = Sequential()
model.add(Dense(hidden_dim, input_dim=input_dim, activation="tanh"))
model.add(Dense(output_dim, activation="softmax"))
model.compile(optimizer=optimizers.SGD(learning_rate=0.1),
              loss='categorical_crossentropy', metrics=['accuracy'])
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
log_dir = "tensorboard_logs/" + timestamp
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)
model.fit(x=X_train, y=Y_train, validation_split=0.2, epochs=15,
          callbacks=[tensorboard_callback]);
In [ ]:
%tensorboard --logdir tensorboard_logs
Exercises:
- Try to decrease the learning rate value by a factor of 10 or 100. What do you observe?
- Try to increase the learning rate value to make the optimization diverge.
- Configure the SGD optimizer to enable a Nesterov momentum of 0.9.
Notes:
- The Keras API documentation is available at:
  https://www.tensorflow.org/api_docs/python/tf/keras
- It is also possible to learn more about the parameters of a class by using the question mark: type and evaluate:
  optimizers.SGD?
  in a Jupyter notebook cell.
- It is also possible to type the beginning of a function call / constructor and press "shift-tab" after the opening paren:
  optimizers.SGD(<shift-tab>
In [ ]:
optimizers.SGD?
In [ ]:
In [ ]:
# %load solutions/keras_sgd_and_momentum.py
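If the solution file is not available, here is a minimal sketch of what such a configuration could look like (not necessarily identical to the provided solution):
In [ ]:
# Sketch: SGD with a Nesterov momentum of 0.9 (the learning rate value is just an example)
model = Sequential()
model.add(Dense(hidden_dim, input_dim=input_dim, activation="tanh"))
model.add(Dense(output_dim, activation="softmax"))
model.compile(optimizer=optimizers.SGD(learning_rate=0.1, momentum=0.9, nesterov=True),
              loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, Y_train, validation_split=0.2, epochs=15, batch_size=32)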
Exercises:
- Replace the SGD optimizer with the Adam optimizer from Keras and run it with the default parameters.
  Hint: use optimizers.<TAB> to tab-complete the list of optimizers implemented in Keras.
- Add another hidden layer and use the "Rectified Linear Unit" activation for each hidden layer. Can you still train the model with Adam and its default global learning rate?
In [ ]:
In [ ]:
# %load solutions/keras_adam.py
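A possible sketch for this exercise (the provided solution may differ) is to stack two "relu" hidden layers and compile with Adam and its default learning rate:
In [ ]:
# Sketch: two ReLU hidden layers trained with Adam (default learning rate of 1e-3)
model = Sequential()
model.add(Dense(hidden_dim, input_dim=input_dim, activation="relu"))
model.add(Dense(hidden_dim, activation="relu"))
model.add(Dense(output_dim, activation="softmax"))
model.compile(optimizer=optimizers.Adam(),
              loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, Y_train, validation_split=0.2, epochs=15, batch_size=32)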
In [ ]:
In [ ]:
# %load solutions/keras_accuracy_on_test_set.py
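One possible way to compute the accuracy on the test set (not necessarily the provided solution) is to one-hot encode y_test and call model.evaluate:
In [ ]:
# Sketch: evaluate the trained model on the held-out test set
Y_test = to_categorical(y_test)
test_loss, test_acc = model.evaluate(X_test, Y_test, verbose=0)
print("test accuracy: %0.3f" % test_acc)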
In [ ]:
# predict_classes was removed in recent Keras versions: take the argmax of the
# predicted class probabilities instead
predicted_labels_numpy = model.predict(X_test).argmax(axis=1)
predicted_labels_numpy
In [ ]:
type(predicted_labels_numpy), predicted_labels_numpy.shape
Alternatively, one can call the model directly on the data to get the last layer (softmax) outputs as a TensorFlow Tensor:
In [ ]:
predictions_tf = model(X_test)
predictions_tf[:5]
In [ ]:
type(predictions_tf), predictions_tf.shape
We can use the TensorFlow API to check that, for each row, the probabilities sum to 1:
In [ ]:
import tensorflow as tf
tf.reduce_sum(predictions_tf, axis=1)[:5]
We can also extract the label with the highest probability using the TensorFlow API:
In [ ]:
predicted_labels_tf = tf.argmax(predictions_tf, axis=1)
predicted_labels_tf[:5]
We can compare these labels to the expected labels to compute the accuracy with the TensorFlow API. Note, however, that we need an explicit cast from boolean to floating point values to be able to compute the mean accuracy on TensorFlow tensors:
In [ ]:
accuracy_tf = tf.reduce_mean(tf.cast(predicted_labels_tf == y_test, tf.float64))
accuracy_tf
Also note that it is possible to convert tensors to numpy arrays if one prefers to use numpy:
In [ ]:
accuracy_tf.numpy()
In [ ]:
predicted_labels_tf[:5]
In [ ]:
predicted_labels_tf.numpy()[:5]
In [ ]:
(predicted_labels_tf.numpy() == y_test).mean()
Let us now study the impact of a bad initialization when training a deep feed forward network.
By default, Keras dense layers use the "Glorot Uniform" initialization strategy to initialize the weight matrices: each weight coefficient is drawn from a uniform distribution whose scale depends on the fan-in and fan-out of the layer.
This strategy is known to work well to initialize deep neural networks with "tanh" or "relu" activation functions and then trained with standard SGD.
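As a quick sanity check (not needed for the rest of the notebook), the bound of the Glorot uniform distribution for our first hidden layer can be computed explicitly from its fan-in (64 input features) and fan-out (100 hidden units):
In [ ]:
# Glorot / Xavier uniform: weights sampled from U[-limit, limit]
# with limit = sqrt(6 / (fan_in + fan_out))
glorot_limit = np.sqrt(6. / (input_dim + hidden_dim))
glorot_limit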
To assess the impact of initialization, let us plug an alternative init scheme into a network with 2 hidden layers and "tanh" activations. For the sake of the example, let's use normally distributed weights with a manually adjustable scale (standard deviation) and see the impact of the scale value:
In [ ]:
from tensorflow.keras import initializers
normal_init = initializers.TruncatedNormal(stddev=0.01)
model = Sequential()
model.add(Dense(hidden_dim, input_dim=input_dim, activation="tanh",
                kernel_initializer=normal_init))
model.add(Dense(hidden_dim, activation="tanh",
                kernel_initializer=normal_init))
model.add(Dense(output_dim, activation="softmax",
                kernel_initializer=normal_init))
model.compile(optimizer=optimizers.SGD(learning_rate=0.1),
              loss='categorical_crossentropy', metrics=['accuracy'])
In [ ]:
model.layers
Let's have a look at the parameters of the first layer after initialization but before any training has happened:
In [ ]:
model.layers[0].weights
In [ ]:
w = model.layers[0].weights[0].numpy()
w
In [ ]:
w.std()
In [ ]:
b = model.layers[0].weights[1].numpy()
b
In [ ]:
history = model.fit(X_train, Y_train, epochs=15, batch_size=32)
plt.figure(figsize=(12, 4))
plt.plot(history.history['loss'], label="Truncated Normal init")
plt.legend();
Once the model has been fit, the weights have been updated and notably the biases are no longer 0:
In [ ]:
model.layers[0].weights
Exercises:
Try the following initialization schemes and see whether the SGD algorithm can successfully train the network or not:
- stddev=1e-3
- stddev=1
- stddev=10

What do you observe? Can you find an explanation for those outcomes?
Are more advanced solvers such as SGD with momentum or Adam better able to deal with such bad initializations?
In [ ]:
In [ ]:
# %load solutions/keras_initializations.py
In [ ]:
# %load solutions/keras_initializations_analysis.py
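If the solution files are not available, here is a possible sketch of the experiment (assuming the same 2-hidden-layer architecture as above; the provided solutions may differ):
In [ ]:
# Sketch: compare several initialization scales with plain SGD
plt.figure(figsize=(12, 4))
for stddev in [1e-3, 1., 10.]:
    init = initializers.TruncatedNormal(stddev=stddev)
    model = Sequential()
    model.add(Dense(hidden_dim, input_dim=input_dim, activation="tanh",
                    kernel_initializer=init))
    model.add(Dense(hidden_dim, activation="tanh", kernel_initializer=init))
    model.add(Dense(output_dim, activation="softmax", kernel_initializer=init))
    model.compile(optimizer=optimizers.SGD(learning_rate=0.1),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    history = model.fit(X_train, Y_train, epochs=15, batch_size=32, verbose=0)
    plt.plot(history.history['loss'], label="stddev=%g" % stddev)
plt.legend();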