In [ ]:

```
import matplotlib.pyplot as plt
%matplotlib inline
from utils import plot_samples, plot_curves
import time

```
In [ ]:

```
import numpy as np
# fix the random seed so that results are reproducible
SEED = 4242
np.random.seed(SEED)

```

Let's begin by loading the MNIST dataset, which we will use throughout the session.

In [ ]:

```
from keras.datasets import mnist
from keras.utils import np_utils
# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Display some of the samples
plot_samples(X_train)
X_train.shape

```
In [ ]:

```
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
# in the first layer we need to specify the input shape
model.add(Dense(10, input_shape=(784,)))
model.add(Activation('softmax'))
model.summary()
```

**Exercise**: `model.summary()` gave us the total number of trainable parameters of our model. How is this number obtained?
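As a hint, here is the general rule for a fully-connected layer, with a quick check in plain Python (the `dense_params` helper is just for illustration, not a Keras function):

```
# A Dense layer learns one weight per (input, output) pair plus one bias
# per output unit, so: n_params = n_in * n_out + n_out.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

print(dense_params(784, 10))  # 7850, which should match model.summary()
```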

We flatten and normalize the images to match the input that the network expects:

In [ ]:

```
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

```

Categories need to be converted to one-hot vectors for training:

In [ ]:

```
nb_classes = 10
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
y_train, Y_train
```
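To make the encoding concrete, here is what the conversion does to a single label, sketched in plain NumPy (this mirrors the output of `np_utils.to_categorical`, not its internals):

```
import numpy as np

# The integer label 3 becomes a length-10 vector with a 1 at index 3.
label = 3
one_hot = np.zeros(10)
one_hot[label] = 1.0
print(one_hot)  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```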
We are now ready to train. Let's define the optimizer:

In [ ]:

```
from keras.optimizers import SGD
lr = 0.01
# For now we will not decrease the learning rate
decay = 0
optim = SGD(lr=lr, decay=decay, momentum=0.9, nesterov=True)

```
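A quick note on `decay`: in this generation of Keras, SGD applies a time-based schedule of roughly `lr * 1 / (1 + decay * iterations)`, so `decay = 0` keeps the learning rate constant. A small sketch of that formula for intuition (an approximation, not the optimizer's exact internals):

```
# Approximate effective learning rate after a number of update steps
# under Keras-style time-based decay.
def effective_lr(lr, decay, iterations):
    return lr / (1.0 + decay * iterations)

print(effective_lr(0.01, 0.0, 1000))   # 0.01  -> constant when decay = 0
print(effective_lr(0.01, 1e-3, 1000))  # 0.005 -> halved after 1000 updates
```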
In [ ]:

```
model.compile(loss='categorical_crossentropy',
              optimizer=optim,
              metrics=['accuracy'])

```

`model.fit()` will run the training loop for us. We just need to pass the training data `X_train` and the labels `Y_train` as input, and specify the `batch_size` and the number of epochs `nb_epoch` we want to train for. We also pass the test set `(X_test, Y_test)` as validation data, which will allow us to see how the model performs on the test data as training progresses. Let's run it:

In [ ]:

```
batch_size = 32
nb_epoch = 20
verbose = 2
t = time.time()
history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=verbose, validation_data=(X_test, Y_test))
print(time.time() - t, "seconds.")

```

The `history` object returned by `model.fit()` stores the loss and metric values recorded at each epoch, which we can plot as training curves. The function `plot_curves`, which is defined in `utils.py`, will do this for us.

In [ ]:

```
plot_curves(history, nb_epoch)
```
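We don't show `utils.py` here, but a minimal sketch of what a helper like `plot_curves` might do, assuming the classic Keras history keys `'loss'`, `'val_loss'`, `'acc'` and `'val_acc'` (key names vary across Keras versions):

```
import matplotlib.pyplot as plt

def plot_curves_sketch(history, nb_epoch):
    # Plot training/validation loss and accuracy, one point per epoch.
    epochs = range(1, nb_epoch + 1)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(epochs, history.history['loss'], label='train')
    ax1.plot(epochs, history.history['val_loss'], label='validation')
    ax1.set_xlabel('epoch')
    ax1.set_ylabel('loss')
    ax1.legend()
    ax2.plot(epochs, history.history['acc'], label='train')
    ax2.plot(epochs, history.history['val_acc'], label='validation')
    ax2.set_xlabel('epoch')
    ax2.set_ylabel('accuracy')
    ax2.legend()
    plt.show()
```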

The trend of the curves suggests that the model could still improve with longer training, but let's leave it here for now.

Let's now evaluate our model. `model.evaluate()` will take all the test samples, forward them through the network, and return the average loss along with any additional metrics we specified (in our case, the accuracy).

In [ ]:

```
score = model.evaluate(X_test, Y_test, verbose=0)
print ("Loss: %f"%(score[0]))
print ("Accuracy: %f"%(score[1]))

```

Let's try to train a model with a hidden layer between the input and the classifier.

**Exercise**: Modify the previous architecture to include this layer with 128 neurons and train it. Take into account that the `input_shape` must be passed to the first layer of the network.

In [ ]:

```
import numpy as np
np.random.seed(SEED)
# MODEL DEFINITION
model = Sequential()
# ...
model.summary()

```

**Exercise**: Compute the number of parameters by hand and check that they match the ones given by `model.summary()`.

**Exercise**: Write the code to train the model. Check the code we used for the previous model; it should be very similar...

In [ ]:

```
# COMPILE & TRAIN
```

**Exercise**: Add a dropout layer to the model and see its effect on the training curves and accuracy. See the documentation for the `Dropout` layer in Keras.
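As a hint, `Dropout` randomly sets a fraction of its input units to zero at each training update (it is inactive at test time), which helps combat overfitting. One possible placement, sketched here as a suggestion rather than the required solution (the ReLU hidden activation is an assumption):

```
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

# Sketch: dropout goes right after the activations it should regularize.
dratio = 0.2  # fraction of hidden activations zeroed during training
model = Sequential()
model.add(Dense(128, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(dratio))
model.add(Dense(10))
model.add(Activation('softmax'))
```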

In [ ]:

```
import numpy as np
np.random.seed(SEED)
from keras.layers import Dropout
dratio = 0.2
H_DIM = 128
model = Sequential()
# ...
model.summary()

```
In [ ]:

```
model.compile(loss='categorical_crossentropy',
              optimizer=optim,
              metrics=['accuracy'])
t = time.time()
history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=verbose, validation_data=(X_test, Y_test))
print(time.time() - t, "seconds.")
score = model.evaluate(X_test, Y_test, verbose=0)
print("-" * 10)
print("Loss: %f" % score[0])
print("Accuracy: %f" % score[1])
plot_curves(history, nb_epoch)
```

Did this improve the curves? What about the accuracy?