Introducing Keras

In the next cell, we introduce Keras, a high-level machine learning library which we will use for the rest of the class. Keras is built on top of Tensorflow, an open-source framework which implements machine learning methods, particularly deep neural networks, with a focus on computational efficiency. We do not have to deal much with the details of this; for our purposes, Tensorflow is a very low-level library which is not necessarily accessible to the typical engineer. Keras solves this by providing a wrapper around Tensorflow, reducing the complexity of coding neural networks and giving us a set of convenient functions which implement many reusable routines. Most importantly, Keras (via Tensorflow) efficiently implements backpropagation to train neural networks on the GPU. Effectively, you could say that Keras is to Tensorflow what Processing is to Java.

To start, we will re-implement what we did in the last section, a neural network trained on the Iris dataset, but this time we will use Keras.

Start by importing the relevant Keras libraries that we will be using, as well as matplotlib and numpy.


In [46]:
import os
import matplotlib.pyplot as plt
import numpy as np
import random

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout

Let's load the Iris dataset again.


In [47]:
from sklearn.datasets import load_iris

iris = load_iris()
# use the first three measurements as input features, and the fourth column (petal width) as the label
data, labels = iris.data[:,0:3], iris.data[:,3]

In the last lesson, we manually trained a neural network to predict the petal width of the Iris flowers from the other three measurements. This time, let's use the Keras library instead. First we need to shuffle and pre-process the data. Pre-processing in this case means normalizing the data, as well as converting it to a properly-shaped float32 numpy array.


In [48]:
num_samples = len(labels)  # size of our dataset
shuffle_order = np.random.permutation(num_samples)
data = data[shuffle_order, :]
labels = labels[shuffle_order]

# normalize data and labels to between 0 and 1 and make sure it's float32
data = data / np.amax(data, axis=0)
data = data.astype('float32')
labels = labels / np.amax(labels, axis=0)
labels = labels.astype('float32')

# print out the data
print("shape of X", data.shape)
print("first 5 rows of X\n", data[0:5, :])
print("first 5 labels\n", labels[0:5])


shape of X (150, 3)
first 5 rows of X
 [[0.79746836 0.77272725 0.8115942 ]
 [0.6962025  0.5681818  0.5797101 ]
 [0.96202534 0.6818182  0.95652175]
 [0.8101266  0.70454544 0.79710144]
 [0.5949367  0.72727275 0.23188406]]
first 5 labels
 [0.96 0.52 0.84 0.72 0.08]

Overfitting and validation

In our previous guides, we always evaluated the performance of the network on the same data that we trained it on. This is a mistake: the network could learn to "cheat" by overfitting to the training data (essentially memorizing it) so as to get a high score, but then fail to generalize to truly unseen examples.

In machine learning this is called "overfitting," and there are several things we must do to avoid it. The first is to split our dataset into a "training set," which we train on with gradient descent, and a "test set," which is hidden from the training process and which we evaluate only at the end to measure the true accuracy: how well the network predicts samples it has never seen.

Let's split the data into a training set and a test set. We'll keep the first 30% of the dataset to use as a test set, and use the rest for training.
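
As an aside, scikit-learn provides a one-line helper for this. Below we do the split by hand so the mechanics are visible, but the following sketch would be roughly equivalent (note that train_test_split also shuffles the data by default):

from sklearn.model_selection import train_test_split

# equivalent split using scikit-learn's helper (shuffles by default)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)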


In [49]:
# let's rename the data and labels to X, y
X, y = data, labels

test_split = 0.3  # percent split

n_test = int(test_split * num_samples)

x_train, x_test = X[n_test:, :], X[:n_test, :] 
y_train, y_test = y[n_test:], y[:n_test] 

print('%d training samples, %d test samples' % (x_train.shape[0], x_test.shape[0]))


105 training samples, 45 test samples

In Keras, to instantiate a neural network model, we use the Sequential class. Sequential simply means a model with a sequence of layers which propagate in one direction, from input to output.


In [141]:
model = Sequential()

We now have an empty neural network called model. Now let's add our first layer using Keras's Dense class. Because it is the first layer in the model, we will also tell it the size of the input, which implicitly defines the input layer feeding into it.

The reason why it is called "Dense" is that the layer is "fully-connected": all of its neurons are connected to all of the neurons in the previous layer, with no missing connections. This may seem confusing at first because we have not yet seen neural network layers which are not fully-connected; we will see them in the next chapter when we introduce convolutional networks.

To create a Dense layer, we need to specify two arguments: the number of neurons and the activation function (which non-linearity to apply, if any). For the first layer, we must also specify the input dimension.


In [142]:
model.add(Dense(8, activation='sigmoid', input_dim=3))

We can also get a readout of the current state of the network using model.summary():


In [143]:
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_16 (Dense)             (None, 8)                 32        
=================================================================
Total params: 32
Trainable params: 32
Non-trainable params: 0
_________________________________________________________________

Our network currently has one layer with 32 parameters: that's 3 neurons in the input layer, times 8 neurons in the middle layer (3x8=24), plus 8 biases (24+8=32).
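
In general, a Dense layer receiving n_in inputs and containing n_out neurons has (n_in + 1) * n_out parameters, where the +1 accounts for the biases. A quick sanity check:

# parameters in a fully-connected layer = (inputs + 1 bias) * neurons
n_in, n_out = 3, 8
print((n_in + 1) * n_out)   # 32, matching the summary above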

Next, we will add the output layer, which will be a fully-connected (Dense) layer whose size is 1 neuron. This neuron will contain our final output.

Notice that this time, instead of making the activation a sigmoid as before, we leave it as a "linear" activation (no non-linearity). This is common for the final output layer of a regression network.

We add it, and look at the final summary.


In [144]:
model.add(Dense(1, activation='linear'))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_16 (Dense)             (None, 8)                 32        
_________________________________________________________________
dense_17 (Dense)             (None, 1)                 9         
=================================================================
Total params: 41
Trainable params: 41
Non-trainable params: 0
_________________________________________________________________

So we've added 9 parameters: 8x1 weights between the hidden and output layers, plus 1 bias in the output neuron, giving us 41 parameters in total.
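
As an aside, the same architecture can be declared in a single statement by passing the list of layers directly to the Sequential constructor. An equivalent sketch (using a different variable name so we don't overwrite the model we just built):

# equivalent two-layer model, declared in one call
model_alt = Sequential([
    Dense(8, activation='sigmoid', input_dim=3),
    Dense(1, activation='linear')
])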

We have now finished specifying the architecture of the model. Next we need to specify our loss function and optimizer, and then compile the model. Let's discuss each of these.

First, we specify the loss. The standard for regression, as we said before, is sum-squared error (SSE) or mean-squared error (MSE). SSE and MSE are essentially interchangeable, since the only difference between them is a scaling factor ($\frac{1}{n}$) which doesn't depend on the weights. Keras happens to use MSE for evaluation rather than SSE, so we will use that.
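
Concretely, for $n$ samples with true labels $y_i$ and predictions $\hat{y}_i$, the two losses are:

$$\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad\qquad \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$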

The optimizer is the flavor of gradient descent we want to use. The most basic optimizer is "stochastic gradient descent," or SGD, which is the learning algorithm we have used so far. Strictly speaking, we have mostly used batch gradient descent, in which the gradient is computed over the entire dataset. For reasons which will become clearer when we cover learning algorithms in more detail, this is usually not preferred; instead we calculate the gradient over small random subsets of the training data, called mini-batches.

Once we've specified our loss function and optimizer, the model is compiled. Compiling means that Keras (actually Tensorflow internally) allocates memory for a "computational graph" whose architecture matches your model definition. This is done for efficiency, and a full understanding of how it works is beyond the scope of this course.


In [145]:
model.compile(loss='mean_squared_error', optimizer='sgd')
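
Passing the string 'sgd' uses Keras's default settings for stochastic gradient descent. If we wanted to set the learning rate ourselves, we could pass an optimizer object instead. A minimal sketch, assuming the keras.optimizers module that ships with this version of Keras (the argument is named lr in older releases and learning_rate in newer ones):

from keras.optimizers import SGD

# equivalent to optimizer='sgd', but with the learning rate made explicit
sgd = SGD(lr=0.01)
model.compile(loss='mean_squared_error', optimizer=sgd)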

We are finally ready to train. In the next cell, we run the fit command, which begins the training process. fit takes several important arguments: the training data and labels (x_train and y_train), as well as the validation set (x_test and y_test).

Additionally, we must specify the batch_size, which is the number of training samples each gradient update is calculated over, as well as the number of epochs, which is the number of times we cycle through the whole training set. In general, more epochs are better, although in practice the accuracy of the network may stop improving early, which makes training for too many epochs unnecessary.

Because we have a very small dataset (just 105 training samples), we should use a small batch size and can afford to train over many epochs (let's set it to 200). With a batch size of 4, each epoch consists of 27 mini-batch gradient updates (105 / 4, rounded up).


In [146]:
history = model.fit(x_train, y_train,
                    batch_size=4,
                    epochs=200,
                    verbose=1,
                    validation_data=(x_test, y_test))


Train on 105 samples, validate on 45 samples
Epoch 1/200
105/105 [==============================] - 0s 2ms/step - loss: 0.1224 - val_loss: 0.1100
Epoch 2/200
105/105 [==============================] - 0s 310us/step - loss: 0.0996 - val_loss: 0.1084
Epoch 3/200
105/105 [==============================] - 0s 406us/step - loss: 0.0991 - val_loss: 0.1059
Epoch 4/200
105/105 [==============================] - 0s 307us/step - loss: 0.0966 - val_loss: 0.1050
Epoch 5/200
105/105 [==============================] - 0s 305us/step - loss: 0.0956 - val_loss: 0.1038
Epoch 6/200
105/105 [==============================] - 0s 323us/step - loss: 0.0945 - val_loss: 0.1023
Epoch 7/200
105/105 [==============================] - 0s 307us/step - loss: 0.0938 - val_loss: 0.1010
Epoch 8/200
105/105 [==============================] - 0s 314us/step - loss: 0.0922 - val_loss: 0.1000
Epoch 9/200
105/105 [==============================] - 0s 324us/step - loss: 0.0908 - val_loss: 0.0990
Epoch 10/200
105/105 [==============================] - 0s 310us/step - loss: 0.0900 - val_loss: 0.0975
Epoch 11/200
105/105 [==============================] - 0s 314us/step - loss: 0.0888 - val_loss: 0.0966
Epoch 12/200
105/105 [==============================] - 0s 416us/step - loss: 0.0880 - val_loss: 0.0957
Epoch 13/200
105/105 [==============================] - 0s 416us/step - loss: 0.0869 - val_loss: 0.0942
Epoch 14/200
105/105 [==============================] - 0s 351us/step - loss: 0.0857 - val_loss: 0.0930
Epoch 15/200
105/105 [==============================] - 0s 327us/step - loss: 0.0850 - val_loss: 0.0919
Epoch 16/200
105/105 [==============================] - 0s 327us/step - loss: 0.0841 - val_loss: 0.0916
Epoch 17/200
105/105 [==============================] - 0s 336us/step - loss: 0.0832 - val_loss: 0.0898
Epoch 18/200
105/105 [==============================] - 0s 335us/step - loss: 0.0818 - val_loss: 0.0891
Epoch 19/200
105/105 [==============================] - 0s 324us/step - loss: 0.0813 - val_loss: 0.0876
Epoch 20/200
105/105 [==============================] - 0s 332us/step - loss: 0.0797 - val_loss: 0.0874
Epoch 21/200
105/105 [==============================] - 0s 334us/step - loss: 0.0796 - val_loss: 0.0863
Epoch 22/200
105/105 [==============================] - 0s 364us/step - loss: 0.0783 - val_loss: 0.0854
Epoch 23/200
105/105 [==============================] - 0s 339us/step - loss: 0.0776 - val_loss: 0.0835
Epoch 24/200
105/105 [==============================] - 0s 360us/step - loss: 0.0761 - val_loss: 0.0825
Epoch 25/200
105/105 [==============================] - 0s 359us/step - loss: 0.0753 - val_loss: 0.0816
Epoch 26/200
105/105 [==============================] - 0s 340us/step - loss: 0.0741 - val_loss: 0.0810
Epoch 27/200
105/105 [==============================] - 0s 322us/step - loss: 0.0734 - val_loss: 0.0796
Epoch 28/200
105/105 [==============================] - 0s 364us/step - loss: 0.0725 - val_loss: 0.0787
Epoch 29/200
105/105 [==============================] - 0s 330us/step - loss: 0.0715 - val_loss: 0.0778
Epoch 30/200
105/105 [==============================] - 0s 339us/step - loss: 0.0712 - val_loss: 0.0768
Epoch 31/200
105/105 [==============================] - 0s 355us/step - loss: 0.0698 - val_loss: 0.0759
Epoch 32/200
105/105 [==============================] - 0s 333us/step - loss: 0.0693 - val_loss: 0.0752
Epoch 33/200
105/105 [==============================] - 0s 341us/step - loss: 0.0683 - val_loss: 0.0743
Epoch 34/200
105/105 [==============================] - 0s 349us/step - loss: 0.0674 - val_loss: 0.0731
Epoch 35/200
105/105 [==============================] - 0s 334us/step - loss: 0.0665 - val_loss: 0.0722
Epoch 36/200
105/105 [==============================] - 0s 350us/step - loss: 0.0655 - val_loss: 0.0714
Epoch 37/200
105/105 [==============================] - 0s 339us/step - loss: 0.0650 - val_loss: 0.0712
Epoch 38/200
105/105 [==============================] - 0s 362us/step - loss: 0.0641 - val_loss: 0.0698
Epoch 39/200
105/105 [==============================] - 0s 381us/step - loss: 0.0631 - val_loss: 0.0688
Epoch 40/200
105/105 [==============================] - 0s 414us/step - loss: 0.0627 - val_loss: 0.0679
Epoch 41/200
105/105 [==============================] - 0s 332us/step - loss: 0.0616 - val_loss: 0.0671
Epoch 42/200
105/105 [==============================] - 0s 336us/step - loss: 0.0611 - val_loss: 0.0665
Epoch 43/200
105/105 [==============================] - 0s 350us/step - loss: 0.0601 - val_loss: 0.0654
Epoch 44/200
105/105 [==============================] - 0s 397us/step - loss: 0.0596 - val_loss: 0.0646
Epoch 45/200
105/105 [==============================] - 0s 404us/step - loss: 0.0586 - val_loss: 0.0638
Epoch 46/200
105/105 [==============================] - 0s 375us/step - loss: 0.0582 - val_loss: 0.0635
Epoch 47/200
105/105 [==============================] - 0s 348us/step - loss: 0.0577 - val_loss: 0.0621
Epoch 48/200
105/105 [==============================] - 0s 333us/step - loss: 0.0568 - val_loss: 0.0614
Epoch 49/200
105/105 [==============================] - 0s 347us/step - loss: 0.0556 - val_loss: 0.0610
Epoch 50/200
105/105 [==============================] - 0s 337us/step - loss: 0.0547 - val_loss: 0.0603
Epoch 51/200
105/105 [==============================] - 0s 373us/step - loss: 0.0545 - val_loss: 0.0592
Epoch 52/200
105/105 [==============================] - 0s 350us/step - loss: 0.0536 - val_loss: 0.0581
Epoch 53/200
105/105 [==============================] - 0s 338us/step - loss: 0.0529 - val_loss: 0.0574
Epoch 54/200
105/105 [==============================] - 0s 345us/step - loss: 0.0520 - val_loss: 0.0574
Epoch 55/200
105/105 [==============================] - 0s 334us/step - loss: 0.0518 - val_loss: 0.0560
Epoch 56/200
105/105 [==============================] - 0s 330us/step - loss: 0.0508 - val_loss: 0.0551
Epoch 57/200
105/105 [==============================] - 0s 340us/step - loss: 0.0499 - val_loss: 0.0547
Epoch 58/200
105/105 [==============================] - 0s 351us/step - loss: 0.0495 - val_loss: 0.0538
Epoch 59/200
105/105 [==============================] - 0s 341us/step - loss: 0.0487 - val_loss: 0.0529
Epoch 60/200
105/105 [==============================] - 0s 335us/step - loss: 0.0475 - val_loss: 0.0527
Epoch 61/200
105/105 [==============================] - 0s 346us/step - loss: 0.0475 - val_loss: 0.0518
Epoch 62/200
105/105 [==============================] - 0s 317us/step - loss: 0.0467 - val_loss: 0.0508
Epoch 63/200
105/105 [==============================] - 0s 323us/step - loss: 0.0460 - val_loss: 0.0509
Epoch 64/200
105/105 [==============================] - 0s 312us/step - loss: 0.0458 - val_loss: 0.0494
Epoch 65/200
105/105 [==============================] - 0s 316us/step - loss: 0.0447 - val_loss: 0.0487
Epoch 66/200
105/105 [==============================] - 0s 310us/step - loss: 0.0442 - val_loss: 0.0482
Epoch 67/200
105/105 [==============================] - 0s 339us/step - loss: 0.0435 - val_loss: 0.0478
Epoch 68/200
105/105 [==============================] - 0s 376us/step - loss: 0.0435 - val_loss: 0.0468
Epoch 69/200
105/105 [==============================] - 0s 329us/step - loss: 0.0424 - val_loss: 0.0462
Epoch 70/200
105/105 [==============================] - 0s 321us/step - loss: 0.0417 - val_loss: 0.0454
Epoch 71/200
105/105 [==============================] - 0s 413us/step - loss: 0.0410 - val_loss: 0.0451
Epoch 72/200
105/105 [==============================] - 0s 308us/step - loss: 0.0406 - val_loss: 0.0441
Epoch 73/200
105/105 [==============================] - 0s 343us/step - loss: 0.0400 - val_loss: 0.0435
Epoch 74/200
105/105 [==============================] - 0s 327us/step - loss: 0.0390 - val_loss: 0.0429
Epoch 75/200
105/105 [==============================] - 0s 354us/step - loss: 0.0386 - val_loss: 0.0422
Epoch 76/200
105/105 [==============================] - 0s 329us/step - loss: 0.0382 - val_loss: 0.0417
Epoch 77/200
105/105 [==============================] - 0s 325us/step - loss: 0.0375 - val_loss: 0.0410
Epoch 78/200
105/105 [==============================] - 0s 321us/step - loss: 0.0370 - val_loss: 0.0404
Epoch 79/200
105/105 [==============================] - 0s 338us/step - loss: 0.0362 - val_loss: 0.0398
Epoch 80/200
105/105 [==============================] - 0s 322us/step - loss: 0.0358 - val_loss: 0.0394
Epoch 81/200
105/105 [==============================] - 0s 327us/step - loss: 0.0354 - val_loss: 0.0386
Epoch 82/200
105/105 [==============================] - 0s 336us/step - loss: 0.0348 - val_loss: 0.0380
Epoch 83/200
105/105 [==============================] - 0s 341us/step - loss: 0.0343 - val_loss: 0.0378
Epoch 84/200
105/105 [==============================] - 0s 325us/step - loss: 0.0338 - val_loss: 0.0369
Epoch 85/200
105/105 [==============================] - 0s 344us/step - loss: 0.0334 - val_loss: 0.0364
Epoch 86/200
105/105 [==============================] - 0s 331us/step - loss: 0.0328 - val_loss: 0.0361
Epoch 87/200
105/105 [==============================] - 0s 347us/step - loss: 0.0323 - val_loss: 0.0356
Epoch 88/200
105/105 [==============================] - 0s 361us/step - loss: 0.0319 - val_loss: 0.0348
Epoch 89/200
105/105 [==============================] - 0s 335us/step - loss: 0.0315 - val_loss: 0.0342
Epoch 90/200
105/105 [==============================] - 0s 353us/step - loss: 0.0309 - val_loss: 0.0337
Epoch 91/200
105/105 [==============================] - 0s 329us/step - loss: 0.0305 - val_loss: 0.0332
Epoch 92/200
105/105 [==============================] - 0s 351us/step - loss: 0.0304 - val_loss: 0.0326
Epoch 93/200
105/105 [==============================] - 0s 313us/step - loss: 0.0294 - val_loss: 0.0322
Epoch 94/200
105/105 [==============================] - 0s 318us/step - loss: 0.0290 - val_loss: 0.0317
Epoch 95/200
105/105 [==============================] - 0s 350us/step - loss: 0.0287 - val_loss: 0.0312
Epoch 96/200
105/105 [==============================] - 0s 445us/step - loss: 0.0282 - val_loss: 0.0307
Epoch 97/200
105/105 [==============================] - 0s 337us/step - loss: 0.0277 - val_loss: 0.0303
Epoch 98/200
105/105 [==============================] - 0s 338us/step - loss: 0.0272 - val_loss: 0.0299
Epoch 99/200
105/105 [==============================] - 0s 334us/step - loss: 0.0270 - val_loss: 0.0293
Epoch 100/200
105/105 [==============================] - 0s 309us/step - loss: 0.0261 - val_loss: 0.0291
Epoch 101/200
105/105 [==============================] - 0s 336us/step - loss: 0.0260 - val_loss: 0.0287
Epoch 102/200
105/105 [==============================] - 0s 339us/step - loss: 0.0258 - val_loss: 0.0280
Epoch 103/200
105/105 [==============================] - 0s 324us/step - loss: 0.0253 - val_loss: 0.0277
Epoch 104/200
105/105 [==============================] - 0s 335us/step - loss: 0.0250 - val_loss: 0.0279
Epoch 105/200
105/105 [==============================] - 0s 387us/step - loss: 0.0247 - val_loss: 0.0268
Epoch 106/200
105/105 [==============================] - 0s 330us/step - loss: 0.0241 - val_loss: 0.0264
Epoch 107/200
105/105 [==============================] - 0s 311us/step - loss: 0.0237 - val_loss: 0.0260
Epoch 108/200
105/105 [==============================] - 0s 299us/step - loss: 0.0235 - val_loss: 0.0256
Epoch 109/200
105/105 [==============================] - 0s 349us/step - loss: 0.0230 - val_loss: 0.0252
Epoch 110/200
105/105 [==============================] - 0s 340us/step - loss: 0.0228 - val_loss: 0.0248
Epoch 111/200
105/105 [==============================] - 0s 326us/step - loss: 0.0224 - val_loss: 0.0244
Epoch 112/200
105/105 [==============================] - 0s 378us/step - loss: 0.0221 - val_loss: 0.0242
Epoch 113/200
105/105 [==============================] - 0s 316us/step - loss: 0.0218 - val_loss: 0.0242
Epoch 114/200
105/105 [==============================] - 0s 316us/step - loss: 0.0214 - val_loss: 0.0235
Epoch 115/200
105/105 [==============================] - 0s 312us/step - loss: 0.0212 - val_loss: 0.0229
Epoch 116/200
105/105 [==============================] - 0s 313us/step - loss: 0.0207 - val_loss: 0.0229
Epoch 117/200
105/105 [==============================] - 0s 304us/step - loss: 0.0204 - val_loss: 0.0222
Epoch 118/200
105/105 [==============================] - 0s 333us/step - loss: 0.0202 - val_loss: 0.0219
Epoch 119/200
105/105 [==============================] - 0s 406us/step - loss: 0.0197 - val_loss: 0.0219
Epoch 120/200
105/105 [==============================] - 0s 416us/step - loss: 0.0197 - val_loss: 0.0213
Epoch 121/200
105/105 [==============================] - 0s 374us/step - loss: 0.0192 - val_loss: 0.0209
Epoch 122/200
105/105 [==============================] - 0s 362us/step - loss: 0.0191 - val_loss: 0.0207
Epoch 123/200
105/105 [==============================] - 0s 338us/step - loss: 0.0189 - val_loss: 0.0203
Epoch 124/200
105/105 [==============================] - 0s 345us/step - loss: 0.0185 - val_loss: 0.0200
Epoch 125/200
105/105 [==============================] - 0s 352us/step - loss: 0.0183 - val_loss: 0.0198
Epoch 126/200
105/105 [==============================] - 0s 360us/step - loss: 0.0178 - val_loss: 0.0194
Epoch 127/200
105/105 [==============================] - 0s 339us/step - loss: 0.0177 - val_loss: 0.0192
Epoch 128/200
105/105 [==============================] - 0s 330us/step - loss: 0.0174 - val_loss: 0.0190
Epoch 129/200
105/105 [==============================] - 0s 333us/step - loss: 0.0171 - val_loss: 0.0186
Epoch 130/200
105/105 [==============================] - 0s 337us/step - loss: 0.0170 - val_loss: 0.0184
Epoch 131/200
105/105 [==============================] - 0s 353us/step - loss: 0.0166 - val_loss: 0.0181
Epoch 132/200
105/105 [==============================] - 0s 349us/step - loss: 0.0165 - val_loss: 0.0178
Epoch 133/200
105/105 [==============================] - 0s 360us/step - loss: 0.0161 - val_loss: 0.0176
Epoch 134/200
105/105 [==============================] - 0s 332us/step - loss: 0.0160 - val_loss: 0.0175
Epoch 135/200
105/105 [==============================] - 0s 307us/step - loss: 0.0158 - val_loss: 0.0171
Epoch 136/200
105/105 [==============================] - 0s 328us/step - loss: 0.0154 - val_loss: 0.0171
Epoch 137/200
105/105 [==============================] - 0s 325us/step - loss: 0.0152 - val_loss: 0.0166
Epoch 138/200
105/105 [==============================] - 0s 357us/step - loss: 0.0151 - val_loss: 0.0165
Epoch 139/200
105/105 [==============================] - 0s 363us/step - loss: 0.0149 - val_loss: 0.0163
Epoch 140/200
105/105 [==============================] - 0s 325us/step - loss: 0.0147 - val_loss: 0.0166
Epoch 141/200
105/105 [==============================] - 0s 336us/step - loss: 0.0146 - val_loss: 0.0168
Epoch 142/200
105/105 [==============================] - 0s 328us/step - loss: 0.0147 - val_loss: 0.0160
Epoch 143/200
105/105 [==============================] - 0s 336us/step - loss: 0.0144 - val_loss: 0.0154
Epoch 144/200
105/105 [==============================] - 0s 339us/step - loss: 0.0140 - val_loss: 0.0152
Epoch 145/200
105/105 [==============================] - 0s 326us/step - loss: 0.0138 - val_loss: 0.0151
Epoch 146/200
105/105 [==============================] - 0s 316us/step - loss: 0.0137 - val_loss: 0.0154
Epoch 147/200
105/105 [==============================] - 0s 318us/step - loss: 0.0136 - val_loss: 0.0148
Epoch 148/200
105/105 [==============================] - 0s 309us/step - loss: 0.0133 - val_loss: 0.0152
Epoch 149/200
105/105 [==============================] - 0s 305us/step - loss: 0.0132 - val_loss: 0.0145
Epoch 150/200
105/105 [==============================] - 0s 304us/step - loss: 0.0130 - val_loss: 0.0145
Epoch 151/200
105/105 [==============================] - 0s 323us/step - loss: 0.0128 - val_loss: 0.0143
Epoch 152/200
105/105 [==============================] - 0s 352us/step - loss: 0.0128 - val_loss: 0.0142
Epoch 153/200
105/105 [==============================] - 0s 307us/step - loss: 0.0125 - val_loss: 0.0136
Epoch 154/200
105/105 [==============================] - 0s 312us/step - loss: 0.0124 - val_loss: 0.0134
Epoch 155/200
105/105 [==============================] - 0s 300us/step - loss: 0.0123 - val_loss: 0.0133
Epoch 156/200
105/105 [==============================] - 0s 314us/step - loss: 0.0122 - val_loss: 0.0133
Epoch 157/200
105/105 [==============================] - 0s 315us/step - loss: 0.0120 - val_loss: 0.0129
Epoch 158/200
105/105 [==============================] - 0s 303us/step - loss: 0.0119 - val_loss: 0.0132
Epoch 159/200
105/105 [==============================] - 0s 313us/step - loss: 0.0118 - val_loss: 0.0127
Epoch 160/200
105/105 [==============================] - 0s 317us/step - loss: 0.0117 - val_loss: 0.0126
Epoch 161/200
105/105 [==============================] - 0s 321us/step - loss: 0.0116 - val_loss: 0.0131
Epoch 162/200
105/105 [==============================] - 0s 302us/step - loss: 0.0115 - val_loss: 0.0127
Epoch 163/200
105/105 [==============================] - 0s 307us/step - loss: 0.0113 - val_loss: 0.0122
Epoch 164/200
105/105 [==============================] - 0s 319us/step - loss: 0.0112 - val_loss: 0.0120
Epoch 165/200
105/105 [==============================] - 0s 311us/step - loss: 0.0111 - val_loss: 0.0120
Epoch 166/200
105/105 [==============================] - 0s 304us/step - loss: 0.0110 - val_loss: 0.0118
Epoch 167/200
105/105 [==============================] - 0s 329us/step - loss: 0.0108 - val_loss: 0.0116
Epoch 168/200
105/105 [==============================] - 0s 305us/step - loss: 0.0108 - val_loss: 0.0116
Epoch 169/200
105/105 [==============================] - 0s 310us/step - loss: 0.0107 - val_loss: 0.0118
Epoch 170/200
105/105 [==============================] - 0s 324us/step - loss: 0.0107 - val_loss: 0.0114
Epoch 171/200
105/105 [==============================] - 0s 308us/step - loss: 0.0106 - val_loss: 0.0112
Epoch 172/200
105/105 [==============================] - 0s 308us/step - loss: 0.0105 - val_loss: 0.0111
Epoch 173/200
105/105 [==============================] - 0s 314us/step - loss: 0.0104 - val_loss: 0.0111
Epoch 174/200
105/105 [==============================] - 0s 309us/step - loss: 0.0103 - val_loss: 0.0111
Epoch 175/200
105/105 [==============================] - 0s 314us/step - loss: 0.0102 - val_loss: 0.0110
Epoch 176/200
105/105 [==============================] - 0s 309us/step - loss: 0.0102 - val_loss: 0.0109
Epoch 177/200
105/105 [==============================] - 0s 313us/step - loss: 0.0101 - val_loss: 0.0108
Epoch 178/200
105/105 [==============================] - 0s 314us/step - loss: 0.0100 - val_loss: 0.0112
Epoch 179/200
105/105 [==============================] - 0s 302us/step - loss: 0.0100 - val_loss: 0.0107
Epoch 180/200
105/105 [==============================] - 0s 316us/step - loss: 0.0098 - val_loss: 0.0104
Epoch 181/200
105/105 [==============================] - 0s 315us/step - loss: 0.0098 - val_loss: 0.0107
Epoch 182/200
105/105 [==============================] - 0s 310us/step - loss: 0.0097 - val_loss: 0.0103
Epoch 183/200
105/105 [==============================] - 0s 317us/step - loss: 0.0096 - val_loss: 0.0104
Epoch 184/200
105/105 [==============================] - 0s 331us/step - loss: 0.0095 - val_loss: 0.0101
Epoch 185/200
105/105 [==============================] - 0s 299us/step - loss: 0.0094 - val_loss: 0.0104
Epoch 186/200
105/105 [==============================] - 0s 301us/step - loss: 0.0094 - val_loss: 0.0100
Epoch 187/200
105/105 [==============================] - 0s 328us/step - loss: 0.0094 - val_loss: 0.0102
Epoch 188/200
105/105 [==============================] - 0s 306us/step - loss: 0.0093 - val_loss: 0.0100
Epoch 189/200
105/105 [==============================] - 0s 302us/step - loss: 0.0093 - val_loss: 0.0099
Epoch 190/200
105/105 [==============================] - 0s 322us/step - loss: 0.0092 - val_loss: 0.0097
Epoch 191/200
105/105 [==============================] - 0s 315us/step - loss: 0.0092 - val_loss: 0.0097
Epoch 192/200
105/105 [==============================] - 0s 303us/step - loss: 0.0092 - val_loss: 0.0097
Epoch 193/200
105/105 [==============================] - 0s 307us/step - loss: 0.0091 - val_loss: 0.0098
Epoch 194/200
105/105 [==============================] - 0s 352us/step - loss: 0.0090 - val_loss: 0.0096
Epoch 195/200
105/105 [==============================] - 0s 313us/step - loss: 0.0089 - val_loss: 0.0100
Epoch 196/200
105/105 [==============================] - 0s 359us/step - loss: 0.0090 - val_loss: 0.0103
Epoch 197/200
105/105 [==============================] - 0s 341us/step - loss: 0.0089 - val_loss: 0.0096
Epoch 198/200
105/105 [==============================] - 0s 322us/step - loss: 0.0088 - val_loss: 0.0094
Epoch 199/200
105/105 [==============================] - 0s 318us/step - loss: 0.0088 - val_loss: 0.0093
Epoch 200/200
105/105 [==============================] - 0s 295us/step - loss: 0.0088 - val_loss: 0.0092

As you can see above, we train our network down to a validation MSE below 0.01. Notice that both the training loss ("loss") and validation loss ("val_loss") are reported. It's normal for the training loss to be somewhat lower than the validation loss, since the network's objective is to fit the training data well. But if the training loss is much lower than the validation loss, it means we are overfitting and should not expect good results on unseen data.
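
Since we stored the return value of fit in history, we can also plot the two loss curves to see how training progressed. A minimal sketch using the history.history dictionary that Keras fills in during training:

# plot training vs validation loss over the 200 epochs
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('mean squared error')
plt.legend()
plt.show()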

We can evaluate the model on the test set one last time at the end using evaluate.


In [147]:
score = model.evaluate(x_test, y_test)
print('Test loss:', score)


45/45 [==============================] - 0s 97us/step
Test loss: 0.00922720053543647

To get the raw predictions:


In [150]:
y_pred = model.predict(x_test)

for yp, ya in list(zip(y_pred, y_test))[0:10]:
    print("predicted %0.2f, actual %0.2f" % (yp, ya))


predicted 0.72, actual 0.96
predicted 0.53, actual 0.52
predicted 0.87, actual 0.84
predicted 0.72, actual 0.72
predicted 0.16, actual 0.08
predicted 0.13, actual 0.08
predicted 0.13, actual 0.08
predicted 0.15, actual 0.08
predicted 0.62, actual 0.60
predicted 0.54, actual 0.52

We can manually calculate MSE as a sanity check:


In [152]:
def MSE(y_pred, y_test):
    return (1.0/len(y_test)) * np.sum([((y1[0]-y2)**2) for y1, y2 in list(zip(y_pred, y_test))])

print("MSE is %0.4f" % MSE(y_pred, y_test))


MSE is 0.0092

We can also predict the value of a single unknown example, or a set of them, in the following way:


In [154]:
x_sample = x_test[0].reshape(1, 3)   # shape must be (num_samples, 3), even if num_samples = 1
y_prob = model.predict(x_sample)

print("predicted %0.3f, actual %0.3f" % (y_prob[0][0], y_test[0]))


predicted 0.723, actual 0.960
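
Keep in mind that this prediction is in the normalized units we created earlier, since we divided the labels by their maximum value. To express a prediction in centimeters again, we can multiply by that maximum. A small sketch, recomputing the maximum from the iris object we loaded at the start (assuming, as above, that the label is the fourth column of the dataset):

# undo the label normalization to recover a width in centimeters
label_max = np.amax(iris.data[:, 3])
print("predicted %0.2f cm, actual %0.2f cm" % (y_prob[0][0] * label_max, y_test[0] * label_max))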

We've now finished introducing Keras for regression. Note that it is a far more powerful way of training neural networks than the one we implemented by hand. Keras's strengths will become even more apparent when we introduce classification in the next lesson, as well as convolutional networks and the various other optimization tricks it makes available to us.