Deep Learning in Python

Tobias Brandt

About Me

Tutorial Outline

  1. Deep Learning in Python is simple and powerful!
  2. An introduction to (Artificial) Neural Networks
  3. An introduction to Deep Learning

Requirements

  • Python 3.4 (or legacy Python 2.7)
  • Keras >= 1.0.0
  • Theano or TensorFlow
  • git clone https://github.com/snth/ctdeep.git

Deep Learning in Python is simple


In [ ]:
import keras

In [ ]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D

batch_size = 128
nb_classes = 10
nb_epoch = 15

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)

In [ ]:
# %load ..\keras\examples\mnist_cnn.py
'''Trains a simple convnet on the MNIST dataset.

Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.utils import np_utils
from keras import backend as K

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
(images_train, labels_train), (images_test, labels_test) = (X_train, y_train), (X_test, y_test)

if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

In [ ]:
model = Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

In [ ]:
print("Compiling the model ...")
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
#model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
#          verbose=1, validation_data=(X_test, Y_test))

In [ ]:
import os
for epoch in range(1, nb_epoch+1):
    weights_path = "models/mnist_cnn_{}_weights.h5".format(epoch)
    if os.path.exists(weights_path):
        print("Loading precomputed weights for epoch {} ...".format(epoch))
        model.load_weights(weights_path)
        print('Evaluating the model on the test set ...')
        score = model.evaluate(X_test, Y_test, verbose=1)
        print('Test score:', score[0])
        print('Test accuracy:', score[1])
    else:
        print("Fitting the model for epoch {} ...".format(epoch))
        model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=1,
                  validation_data=(X_test, Y_test), verbose=1)
        model.save_weights(weights_path)

In [ ]:
print('Evaluating the model on the test set ...')
score = model.evaluate(X_test, Y_test, verbose=1)
print('Test score:', score[0])
print('Test accuracy:', score[1])

So building and training Neural Networks in Python is simple!

But it is also powerful!

Neural Style Transfer: github.com/titu1994/Neural-Style-Transfer

+

=


Neural Networks

A neuron looks something like this

Symbolically we can represent the key parts we want to model as

In order to build an artificial "brain" we need to connect together many neurons in a "neural network"

We can model the response of each neuron with various activation functions
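
As a rough illustration, here are NumPy versions of three activation functions that appear by name in the Keras models of this tutorial ('sigmoid', 'relu' and 'softmax'), plus tanh for comparison. The helper names and the sample input are assumptions made for this sketch; this is not how Keras implements them internally.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: keeps positive values, zeroes out the rest."""
    return np.maximum(0.0, z)

def softmax(z):
    """Softmax over the last axis: non-negative outputs that sum to 1."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 2.0])   # made-up pre-activation values
print('sigmoid:', sigmoid(z))
print('tanh:   ', np.tanh(z))
print('relu:   ', relu(z))
print('softmax:', softmax(z))
```

Sigmoid and tanh saturate for large inputs, relu does not, and softmax turns a vector of scores into something that can be read as class probabilities.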

Training a Neural Network

Mathematically the activation of each neuron can be represented by

where $W$ and $b$ are the weights and bias respectively.
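
One common way to write this, assumed here because the input $x$ and the activation function $f$ are not named in the text, is $a = f(Wx + b)$. A minimal NumPy sketch of that expression for a small layer of neurons, with made-up sizes and random weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_activation(x, W, b, f=sigmoid):
    """Activation of a layer of neurons, assuming the form f(W x + b)."""
    return f(np.dot(W, x) + b)

rng = np.random.RandomState(1337)
x = rng.rand(4)        # 4 input values (hypothetical)
W = rng.randn(3, 4)    # 3 neurons, each with 4 weights
b = np.zeros(3)        # one bias per neuron
a = layer_activation(x, W, b)
print(a.shape, a)
```

Each row of `W` holds the weights of one neuron, so the layer produces one activation per row.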

Loss Function
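
The classifiers in this tutorial are compiled with loss='categorical_crossentropy'. As a hedged sketch of what that quantity measures, the NumPy function below computes the mean of -sum_k y_true[k] * log(y_pred[k]) over a batch. The function name, the clipping constant and the toy predictions are assumptions for illustration; the Keras backend may differ in detail.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# two samples, three classes, one-hot targets (made-up data)
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
confident = np.array([[0.05, 0.90, 0.05],
                      [0.80, 0.10, 0.10]])
unsure = np.ones((2, 3)) / 3.0

print('confident predictions:', categorical_crossentropy(y_true, confident))
print('uniform predictions:  ', categorical_crossentropy(y_true, unsure))
```

The loss is small when the predicted probability of the true class is close to 1, and equals log(number of classes) for a uniform guess.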

Neural Networks in Python

Keras

  • High level library for specifying and training neural networks
  • Can use Theano or TensorFlow as backend

Keras makes Neural Networks awesome!

Theano

  • Python library that provides efficient (low-level) tools for working with Neural Networks
  • In particular:
    • Automatic Differentiation (AD)
    • Compiled computation graphs
    • GPU accelerated computation

TensorFlow

  • Deep Learning framework by Google

The MNIST Dataset

  • 70,000 handwritten digits
    • 60,000 for training
    • 10,000 for testing
  • As 28x28 pixel images

In [ ]:
from __future__ import absolute_import, print_function, division
from ipywidgets import interact, interactive, widgets
import numpy as np
np.random.seed(1337)  # for reproducibility

Let's load some data


In [ ]:
from keras.datasets import mnist
#(images_train, labels_train), (images_test, labels_test) = mnist.load_data()
print("Data shapes:")
print('images',images_train.shape)
print('labels', labels_train.shape)

and then visualise it


In [ ]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

In [ ]:
def plot_mnist_digit(image, figsize=None):
    """ Plot a single MNIST image."""
    # pass figsize to the figure (axes have no set_figsize method);
    # figsize=None falls back to the matplotlib default
    fig = plt.figure(figsize=figsize)
    ax = fig.add_subplot(1, 1, 1)
    ax.matshow(image, cmap = matplotlib.cm.binary)
    plt.xticks(np.array([]))
    plt.yticks(np.array([]))
    plt.show()

In [ ]:
def plot_1_by_2_images(image, reconstruction, figsize=None):
    fig = plt.figure(figsize=figsize)
    ax = fig.add_subplot(1, 2, 1)
    ax.matshow(image, cmap = matplotlib.cm.binary)
    plt.xticks(np.array([]))
    plt.yticks(np.array([]))
    ax = fig.add_subplot(1, 2, 2)
    ax.matshow(reconstruction, cmap = matplotlib.cm.binary)
    plt.xticks(np.array([]))
    plt.yticks(np.array([]))
    plt.show()

In [ ]:
def plot_10_by_10_images(images, figsize=None):
    """ Plot the first 100 MNIST images in a 10 by 10 grid."""
    fig = plt.figure(figsize=figsize)
    #images = [image[3:25, 3:25] for image in images]
    #image = np.concatenate(images, axis=1)
    for x in range(10):
        for y in range(10):
            ax = fig.add_subplot(10, 10, 10*y+x+1)
            ax.matshow(images[10*y+x], cmap = matplotlib.cm.binary)
            plt.xticks(np.array([]))
            plt.yticks(np.array([]))
    plt.show()

In [ ]:
def plot_10_by_20_images(left, right, figsize=None):
    """ Plot 100 MNIST images next to their reconstructions"""
    fig = plt.figure(figsize=figsize)
    for x in range(10):
        for y in range(10):
            ax = fig.add_subplot(10, 21, 21*y+x+1)
            ax.matshow(left[10*y+x], cmap = matplotlib.cm.binary)
            plt.xticks(np.array([]))
            plt.yticks(np.array([]))
            ax = fig.add_subplot(10, 21, 21*y+11+x+1)
            ax.matshow(right[10*y+x], cmap = matplotlib.cm.binary)
            plt.xticks(np.array([]))
            plt.yticks(np.array([]))
    plt.show()

In [ ]:
plot_10_by_10_images(images_train, figsize=(8,8))

In [ ]:
def draw_image(i):
    plot_mnist_digit(images_train[i])
    print('label:', labels_train[i])

In [ ]:
interact(draw_image, i=(0, len(images_train)-1))
None

Data Preprocessing

Transform "images" to "features" ...

Most machine learning algorithms expect a flat array of numbers


In [ ]:
def to_features(X):
    return X.reshape(-1, 784).astype("float32") / 255.0

def to_images(X):
    return (X*255.0).astype('uint8').reshape(-1, 28, 28)

print('data shape:', images_train.shape, images_train.dtype)
print('features shape', to_features(images_train).shape, to_features(images_train).dtype)

Split the data into a "training" and "test" set ...


In [ ]:
#(images_train, labels_train), (images_test, labels_test) = mnist.load_data()
X_train = to_features(images_train)
X_test = to_features(images_test)
print(X_train.shape, 'training samples')
print(X_test.shape, 'test samples')

Transform the labels to a "one-hot" encoding ...


In [ ]:
# The labels need to be transformed into class indicators
from keras.utils import np_utils
y_train = np_utils.to_categorical(labels_train, nb_classes=10)
y_test = np_utils.to_categorical(labels_test, nb_classes=10)
print('labels_train:', labels_train.shape, labels_train.dtype)
print('y_train:', y_train.shape, y_train.dtype)

For example, let's inspect the first 2 labels:


In [ ]:
print('labels_train[:2]:\n', labels_train[:2][:, np.newaxis])
print('y_train[:2]\n', y_train[:2])
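
For intuition, a one-hot encoding can also be sketched in plain NumPy by indexing into an identity matrix. The function name `one_hot` and the example labels are assumptions for this illustration; it is not the np_utils implementation.

```python
import numpy as np

def one_hot(labels, nb_classes):
    """Return a (len(labels), nb_classes) matrix with a single 1.0 per row."""
    return np.eye(nb_classes, dtype='float32')[np.asarray(labels)]

example_labels = np.array([5, 0, 4])   # made-up labels for illustration
encoded = one_hot(example_labels, 10)
print(encoded)
print(encoded.argmax(axis=1))          # argmax recovers the original labels
```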

Simple Multi-Layer Perceptron (MLP)

The simplest kind of Artificial Neural Network is a Multi-Layer Perceptron (MLP) with a single hidden layer.


In [ ]:
# Neural Network Architecture Parameters
nb_input = 784
nb_hidden = 512
nb_output = 10
# Training Parameters
nb_epoch = 1
batch_size = 128

First we define the "architecture" of the network


In [ ]:
from keras.models import Sequential
from keras.layers.core import Dense, Activation

mlp = Sequential()

mlp.add(Dense(output_dim=nb_hidden, input_dim=nb_input, init='uniform'))
mlp.add(Activation('sigmoid'))

mlp.add(Dense(output_dim=nb_output, input_dim=nb_hidden, init='uniform'))
mlp.add(Activation('softmax'))

then we compile it. This takes the symbolic computational graph of the model and compiles it into an efficient implementation which can then be used to train and evaluate the model.

Note that we have to specify which loss/objective function we want to use as well as which optimisation algorithm to use. SGD stands for Stochastic Gradient Descent.
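
To make the idea behind SGD concrete, here is a minimal mini-batch sketch on a toy least-squares problem. The data, learning rate and batch size are all made up for this example, and it illustrates only the plain update rule (parameters minus learning rate times the mini-batch gradient), not the Keras optimizer itself.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(1000, 3)                 # toy inputs
true_w = np.array([2.0, -1.0, 0.5])    # weights we hope to recover
y = X.dot(true_w)                      # noiseless targets

w = np.zeros(3)
lr, batch = 0.1, 32
for epoch in range(20):
    order = rng.permutation(len(X))            # reshuffle every epoch
    for start in range(0, len(X), batch):
        idx = order[start:start + batch]
        error = X[idx].dot(w) - y[idx]
        grad = 2.0 * X[idx].T.dot(error) / len(idx)   # gradient of the batch MSE
        w -= lr * grad                                 # the SGD step
print(w)
```

Each step looks at only one mini-batch, which is what makes the method "stochastic" and cheap enough for large datasets.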


In [ ]:
mlp.compile(loss='categorical_crossentropy', optimizer='SGD',
            metrics=["accuracy"])

Next we train the model on our training data. Watch the loss (the objective function we are minimising) and the estimated accuracy of the model.


In [ ]:
mlp.fit(X_train, y_train, 
        batch_size=batch_size, nb_epoch=nb_epoch,
        verbose=1)

Once the model is trained, we can evaluate its performance on the test data.


In [ ]:
mlp.evaluate(X_test, y_test)

In [ ]:
#plot_10_by_10_images(images_test, figsize=(8,8))

In [ ]:
def draw_mlp_prediction(j):
    plot_mnist_digit(to_images(X_test)[j])
    prediction = mlp.predict_classes(X_test[j:j+1], verbose=False)[0]
    print('predict:', prediction, '\tactual:', labels_test[j])

In [ ]:
interact(draw_mlp_prediction, j=(0, len(X_test)-1))
None

Deep Learning

Why do we want Deep Neural Networks?

Universal Approximation Theorem

The theorem thus states that simple neural networks can represent 
a wide variety of interesting functions when given appropriate parameters;
however, it does not touch upon the algorithmic learnability of those parameters.

On the number of Go positions

While discussing the complexity of the game of Go, Demis Hassabis said:

There are more possible Go positions than there are atoms in the universe.

A Go board has 19 × 19 points, each of which can be empty or occupied by black or white, so there are 3^(19 × 19) ≅ 10^172 possible board positions, but "only" about 10^170 of those positions are legal.

The crucial idea is that, as a number of physical things, 10^80 is a really big number. But as a number of combinations of things, 10^80 is a rather small number. It doesn't take a universe of stuff to get up to 10^80 combinations; we can get there with, for example, a passphrase field that is 40 characters long:

a correct horse battery staple troubador
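
These magnitudes are easy to check with Python's arbitrary-precision integers. The sketch below counts the digits of 3 to the power 19 × 19, and confirms that a 40-character field reaches 10 to the power 80 combinations if we assume an alphabet of 100 symbols (that alphabet size is an assumption of this sketch, not stated in the text).

```python
from math import log10

points = 19 * 19
upper_bound = 3 ** points                 # exact integer
print('3^361 has', len(str(upper_bound)), 'digits')
print('log10(3^361) = %.1f' % (points * log10(3)))

# 40 characters over a hypothetical alphabet of 100 symbols
print('100^40 == 10^80:', 100 ** 40 == 10 ** 80)
```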

On the number of digital pictures

There is an art project to display every possible picture. Surely that would take a long time, because there must be many possible pictures. But how many?

We will assume the color model known as True Color, in which each pixel can be one of 2^24 ≅ 17 million distinct colors. The digital camera shown below left has 12 million pixels, and we'll also consider much smaller pictures: the array below middle, with 300 pixels, and the array below right with just 12 pixels; shown are some of the possible pictures:

Quiz: Which of these produces a number of pictures similar to the number of atoms in the universe?

Answer: An array of n pixels produces (17 million)^n different pictures. (17 million)^12 ≅ 10^86, so the tiny 12-pixel array produces a million times more pictures than the number of atoms in the universe!

How about the 300 pixel array? It can produce 10^2167 pictures. You may think the number of atoms in the universe is big, but that's just peanuts to the number of pictures in a 300-pixel array. And 12M pixels? 10^86696638 pictures. Fuggedaboutit!

So the number of possible pictures is really, really, really big. And the number of atoms in the universe is looking relatively small, at least as a number of combinations.
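
The exponents quoted above can be reproduced with a few lines of Python by working with logarithms instead of the (enormous) numbers themselves. The function name is an assumption for this sketch.

```python
from math import log10

colours = 2 ** 24                      # True Color: 24 bits per pixel

def log10_pictures(n_pixels):
    """log10 of the number of distinct n-pixel True Color pictures."""
    return n_pixels * log10(colours)

for n in (12, 300, 12 * 10 ** 6):
    print('%d pixels: about 10^%d pictures' % (n, int(log10_pictures(n))))
```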

==> The Curse of Dimensionality!

Feature Hierarchies

A Deeper MLP

Next we build an MLP with two hidden layers and the same total number of hidden nodes, half in each layer.
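
Splitting the hidden nodes across two layers also changes the number of trainable parameters. A small sketch that counts weights and biases for fully connected networks, using the layer sizes of this tutorial (784 inputs, 512 hidden nodes in total, 10 outputs); the helper name is an assumption:

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

nb_input, nb_hidden, nb_output = 784, 512, 10
single = count_parameters([nb_input, nb_hidden, nb_output])
double = count_parameters([nb_input, nb_hidden // 2, nb_hidden // 2, nb_output])
print('one hidden layer: ', single)
print('two hidden layers:', double)
```

So the deeper network here is actually the smaller of the two in terms of parameters.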


In [ ]:
from keras.models import Sequential
nb_layers = 2
mlp2 = Sequential()
# add hidden layers
for i in range(nb_layers):
    mlp2.add(Dense(output_dim=nb_hidden//nb_layers, input_dim=nb_input if i==0 else nb_hidden//nb_layers, init='uniform'))
    mlp2.add(Activation('sigmoid'))
# add output layer
mlp2.add(Dense(output_dim=nb_output, input_dim=nb_hidden//nb_layers, init='uniform'))
mlp2.add(Activation('softmax'))

In [ ]:
mlp2.compile(loss='categorical_crossentropy', optimizer='SGD',
             metrics=["accuracy"])

In [ ]:
mlp2.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch,
         verbose=1)

Did you notice anything about the accuracy? Let's train it some more.


In [ ]:
mlp2.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch,
         verbose=1)

In [ ]:
mlp2.evaluate(X_test, y_test)

Autoencoders


In [ ]:
from IPython.display import HTML
HTML('<iframe src="pdf/Hinton2006-science.pdf" width=800 height=400></iframe>')

In [ ]:
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout

print('nb_input =', nb_input)
print('nb_hidden =', nb_hidden)
ae = Sequential()
# encoder
ae.add(Dense(nb_hidden, input_dim=nb_input, init='uniform'))
ae.add(Activation('sigmoid'))
# decoder
ae.add(Dense(nb_input, input_dim=nb_hidden, init='uniform'))
ae.add(Activation('sigmoid'))

In [ ]:
ae.compile(loss='mse', optimizer='SGD')

In [ ]:
nb_epoch = 1
ae.fit(X_train, X_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1)

In [ ]:
plot_10_by_20_images(images_test, to_images(ae.predict(X_test)),
                     figsize=(10,5))

In [ ]:
from keras.optimizers import SGD
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

In [ ]:
ae.compile(loss='mse', optimizer=sgd)
nb_epoch = 1
ae.fit(X_train, X_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1)

In [ ]:
plot_10_by_20_images(images_test, to_images(ae.predict(X_test)),
                     figsize=(10,5))

In [ ]:
def draw_ae_prediction(j):
    X_plot = X_test[j:j+1]
    prediction = ae.predict(X_plot, verbose=False)
    plot_1_by_2_images(to_images(X_plot)[0], to_images(prediction)[0])

In [ ]:
interact(draw_ae_prediction, j=(0, len(X_test)-1))
None

A better Autoencoder


In [ ]:
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout

def make_autoencoder(nb_input=nb_input, nb_hidden=nb_hidden,
                     activation='sigmoid', init='uniform'):
    ae = Sequential()
    # encoder
    ae.add(Dense(nb_hidden, input_dim=nb_input, init=init))
    ae.add(Activation(activation))
    # decoder
    ae.add(Dense(nb_input, input_dim=nb_hidden, init=init))
    ae.add(Activation(activation))
    return ae

In [ ]:
nb_epoch = 1
ae2 = make_autoencoder(activation='sigmoid', init='glorot_uniform')
ae2.compile(loss='mse', optimizer='adam')
ae2.fit(X_train, X_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1)

In [ ]:
plot_10_by_20_images(images_test, to_images(ae2.predict(X_test)), figsize=(10,5))

In [ ]:
def draw_ae2_prediction(j):
    X_plot = X_test[j:j+1]
    prediction = ae2.predict(X_plot, verbose=False)
    plot_1_by_2_images(to_images(X_plot)[0], to_images(prediction)[0])

In [ ]:
interact(draw_ae2_prediction, j=(0, len(X_test)-1))
None

Stacked Autoencoder


In [ ]:
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout

class StackedAutoencoder(object):
    
    def __init__(self, layers, mode='autoencoder',
                 activation='sigmoid', init='uniform', final_activation='softmax',
                 dropout=0.2, optimizer='SGD', metrics=None):
        self.layers = layers
        self.mode = mode
        self.activation = activation
        self.final_activation = final_activation
        self.init = init
        self.dropout = dropout
        self.optimizer = optimizer
        self.metrics = metrics
        self._model = None
        
        self.build()
        self.compile()
    
    def _add_layer(self, model, i, is_encoder):
        if is_encoder:
            input_dim, output_dim = self.layers[i], self.layers[i+1]
            activation = self.final_activation if i==len(self.layers)-2 else self.activation
        else:
            input_dim, output_dim = self.layers[i+1], self.layers[i]
            activation = self.activation
        model.add(Dense(output_dim=output_dim,
                        input_dim=input_dim,
                        init=self.init))
        model.add(Activation(activation))
        
    def build(self):
        self.encoder = Sequential()
        self.decoder = Sequential()
        self.autoencoder = Sequential()
        for i in range(len(self.layers)-1):
            self._add_layer(self.encoder, i, True)
            self._add_layer(self.autoencoder, i, True)
            #if i<len(self.layers)-2:
            #    self.autoencoder.add(Dropout(self.dropout))

        # Note that the decoder layers are in reverse order
        for i in reversed(range(len(self.layers)-1)):
            self._add_layer(self.decoder, i, False)
            self._add_layer(self.autoencoder, i, False)
            
    def compile(self):
        print("Compiling the encoder ...")
        self.encoder.compile(loss='categorical_crossentropy', optimizer=self.optimizer, metrics=self.metrics)
        print("Compiling the decoder ...")
        self.decoder.compile(loss='mse', optimizer=self.optimizer, metrics=self.metrics)
        print("Compiling the autoencoder ...")
        return self.autoencoder.compile(loss='mse', optimizer=self.optimizer, metrics=self.metrics)
    
    def fit(self, X_train, Y_train, batch_size, nb_epoch, verbose=1):
        result = self.autoencoder.fit(X_train, Y_train,
                                      batch_size=batch_size, nb_epoch=nb_epoch,
                                      verbose=verbose)
        # copy the weights to the encoder
        for i, l in enumerate(self.encoder.layers):
            l.set_weights(self.autoencoder.layers[i].get_weights())
        for i in range(len(self.decoder.layers)):
            self.decoder.layers[-1-i].set_weights(self.autoencoder.layers[-1-i].get_weights())
        return result
    
    def pretrain(self, X_train, batch_size, nb_epoch, verbose=1):
        for i in range(len(self.layers)-1):
            # Greedily train each layer
            print("Now pretraining layer {} [{}-->{}]".format(i+1, self.layers[i], self.layers[i+1]))
            ae = Sequential()
            self._add_layer(ae, i, True)
            #ae.add(Dropout(self.dropout))
            self._add_layer(ae, i, False)
            ae.compile(loss='mse', optimizer=self.optimizer, metrics=self.metrics)
            ae.fit(X_train, X_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=verbose)
            # Then lift the training data up one layer
            print("\nTransforming data from", X_train.shape, "to", (X_train.shape[0], self.layers[i+1]))
            enc = Sequential()
            self._add_layer(enc, i, True)
            enc.compile(loss='mse', optimizer=self.optimizer, metrics=self.metrics)
            enc.layers[0].set_weights(ae.layers[0].get_weights())
            enc.layers[1].set_weights(ae.layers[1].get_weights())
            X_train = enc.predict(X_train, verbose=verbose)
            print("\nShape check:", X_train.shape, "\n")
            # Then copy the learned weights
            self.encoder.layers[2*i].set_weights(ae.layers[0].get_weights())
            self.encoder.layers[2*i+1].set_weights(ae.layers[1].get_weights())
            self.autoencoder.layers[2*i].set_weights(ae.layers[0].get_weights())
            self.autoencoder.layers[2*i+1].set_weights(ae.layers[1].get_weights())
            self.decoder.layers[-1-(2*i)].set_weights(ae.layers[-1].get_weights())
            self.decoder.layers[-1-(2*i+1)].set_weights(ae.layers[-2].get_weights())
            self.autoencoder.layers[-1-(2*i)].set_weights(ae.layers[-1].get_weights())
            self.autoencoder.layers[-1-(2*i+1)].set_weights(ae.layers[-2].get_weights())
            
    
    def evaluate(self, X_test, Y_test):
        return self.autoencoder.evaluate(X_test, Y_test)
    
    def predict(self, X, verbose=False):
        return self.autoencoder.predict(X, verbose=verbose)

    def _get_paths(self, name):
        model_path = "models/{}_model.yaml".format(name)
        weights_path = "models/{}_weights.hdf5".format(name)
        return model_path, weights_path

    def save(self, name='autoencoder'):
        model_path, weights_path = self._get_paths(name)
        # use a context manager so the file is flushed and closed
        with open(model_path, 'w') as f:
            f.write(self.autoencoder.to_yaml())
        self.autoencoder.save_weights(weights_path, overwrite=True)
    
    def load(self, name='autoencoder'):
        from keras.models import model_from_yaml
        model_path, weights_path = self._get_paths(name)
        # read the YAML text and close the file promptly
        with open(model_path) as f:
            self.autoencoder = model_from_yaml(f.read())
        self.autoencoder.load_weights(weights_path)

In [ ]:
nb_epoch = 3
sae = StackedAutoencoder(layers=[nb_input, 500, 150, 50, 10],
                         activation='sigmoid',
                         final_activation='softmax',
                         init='uniform',
                         dropout=0.25,
                         optimizer='SGD')   # replace with 'adam', 'relu', 'glorot_uniform'

In [ ]:
sae.fit(X_train, X_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1)

In [ ]:
plot_10_by_20_images(images_test, to_images(sae.predict(X_test)), figsize=(10,5))

In [ ]:
def draw_sae_prediction(j):
    X_plot = X_test[j:j+1]
    prediction = sae.predict(X_plot, verbose=False)
    plot_1_by_2_images(to_images(X_plot)[0], to_images(prediction)[0])
    print(sae.encoder.predict(X_plot, verbose=False)[0])

In [ ]:
interact(draw_sae_prediction, j=(0, len(X_test)-1))
None

In [ ]:
sae.pretrain(X_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1)

Visualising the Filters


In [ ]:
def visualise_filter(model, layer_index, filter_index):
    from keras import backend as K

    # build a loss function that maximizes the activation
    # of the nth filter on the layer considered
    # Keras >= 1.0 exposes the symbolic output as an attribute
    layer_output = model.layers[layer_index].output
    loss = K.mean(layer_output[:, filter_index])

    # compute the gradient of the input picture wrt this loss
    input_img = model.layers[0].input
    grads = K.gradients(loss, input_img)[0]

    # normalization trick: we normalize the gradient
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)

    # this function returns the loss and grads given the input picture
    iterate = K.function([input_img], [loss, grads])

    # we start from a gray image with some noise
    input_img_data = np.random.random((1,nb_input,))
    # run gradient ascent for up to 100 steps
    step = 1
    for i in range(100):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step

        #print("Current loss value:", loss_value)
        if loss_value <= 0.:
            # some filters get stuck to 0, we can skip them
            break
    print("Current loss value:", loss_value)

    # decode the resulting input image
    if loss_value>0:
        #return input_img_data[0]
        return input_img_data
    else:
        raise ValueError(loss_value)
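
The same idea can be sketched without Keras for a single sigmoid neuron, where the gradient of the activation with respect to the input is available in closed form. The weights, bias and starting image below are random stand-ins (assumptions for this illustration), not values taken from a trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(1337)
w = rng.randn(784) * 0.01      # hypothetical weights of one neuron
b = 0.0
x = rng.rand(784)              # start from a random "image"

start = sigmoid(w.dot(x) + b)
for _ in range(100):
    a = sigmoid(w.dot(x) + b)
    grad = a * (1.0 - a) * w                           # d(activation)/d(input)
    grad /= np.sqrt(np.mean(np.square(grad))) + 1e-5   # same normalisation trick
    x += grad                                          # gradient ascent, step size 1
end = sigmoid(w.dot(x) + b)
print(start, '->', end)
```

The input is pushed in the direction that increases the neuron's activation, which is why the resulting image shows what the neuron responds to most strongly.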

In [ ]:
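def draw_filter(i):
    # layer 3 is the softmax output of `mlp`; i selects the output unit (digit class)
    flt = visualise_filter(mlp, 3, i)
    #print(flt)
    plot_mnist_digit(to_images(flt)[0])

In [ ]:
# a (min, max) tuple gives an integer slider over the 10 output units
interact(draw_filter, i=(0, 9))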