Inference and Machine Learning: Berlemont Kevin

Statistical inference and MSE

We consider a system of particles with exponential decay length $\lambda$, observed only in a finite window: $$ P_{\lambda} (x) = \frac{e^{-x / \lambda}}{Z(\lambda)} \quad \text{if } 1<x<20, \qquad P_{\lambda} (x) = 0 \ \text{otherwise.}$$

The normalization constant follows from integrating the density over the window: $$ Z(\lambda) = \int_1^{20} e^{-x/\lambda}\,dx = \lambda \left( e^{-1/\lambda} - e^{-20/\lambda} \right). $$

If we observe a set of events $\{ x_1,\cdots, x_n \}$, the particles are independent, so the sample space factorizes, $ \{ x\} = \otimes_i \{ x_i \}$, and the joint density is the product of the marginals: $$ P_{\lambda} (x) = \prod_i P_{\lambda} (x_i). $$
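
As a quick sanity check (a minimal sketch; scipy.integrate.quad is assumed available), we can verify numerically that $P_\lambda$ integrates to one over $[1, 20]$:

import numpy as np
from scipy.integrate import quad

lam = 5.0
Z = lam * (np.exp(-1.0 / lam) - np.exp(-20.0 / lam))  # normalization constant
total, err = quad(lambda x: np.exp(-x / lam) / Z, 1, 20)
print(total)  # should print ~1.0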



In [23]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as ss
%matplotlib inline


def trunc_exp_decay(low, high, scale, size):
    """Inverse transform sampling for the truncated exponential: draw
    uniform values between the CDF evaluated at the two bounds, then
    map them back through the inverse CDF (ppf)."""
    rnd_cdf = np.random.uniform(ss.expon.cdf(x=low, scale=scale),
                                ss.expon.cdf(x=high, scale=scale),
                                size=size)
    return ss.expon.ppf(q=rnd_cdf, scale=scale)

Lambda = 5
Size = 600
plt.hist(trunc_exp_decay(1, 20, Lambda, Size))
plt.xlim(0, 20)
plt.title('Histogram of samples from the truncated distribution')


Out[23]:
[Figure: histogram of 600 samples from the truncated exponential with $\lambda = 5$]

In [26]:
def observations(Lambda, n):
    """Return n observations of the process"""
    sample = trunc_exp_decay(1, 20, Lambda, n)
    return sample

We now want to estimate the value of $\lambda$. We first use the maximum likelihood estimator and optimize it numerically with the scipy module.
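
Concretely, the estimator maximizes the average log-likelihood of the sample; equivalently, the code below minimizes its negative with the Nelder-Mead method: $$ \hat\lambda = \arg\max_{\lambda} \frac{1}{n} \sum_{i=1}^{n} \ln P_{\lambda}(x_i). $$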


In [27]:
import scipy.optimize as sciOpt

sample = observations(5, 100)

def myMleEstimate(myFunc, par, data):
    """Maximum likelihood estimate of the parameter ``par`` of the
    density ``myFunc`` given the sample ``data``."""

    def lnL_av(x, par):
        # average log-likelihood of the sample
        N = len(x)
        lnL = 0.
        for i in range(N):
            lnL += np.log(myFunc(par, x[i]))
        return lnL / N

    # minimize the negative average log-likelihood
    objFunc = lambda s: -lnL_av(data, s)
    par_mle = sciOpt.minimize(objFunc, par, method='nelder-mead')
    return par_mle

In [28]:
Decay = lambda s, x: np.exp(-x/s) / (s * (np.exp(-1/s) - np.exp(-20/s)))
objFunc = lambda x: myMleEstimate(Decay, 1, x)
sample = observations(5, 1000)
objFunc(sample)


Out[28]:
  status: 0
    nfev: 42
 success: True
     fun: 2.5116585595098315
       x: array([ 5.09072266])
 message: 'Optimization terminated successfully.'
     nit: 21

In [41]:
sample_size = np.arange(10, 500, 20)
Lambda_list = [5, 2, 1]
N = len(sample_size)
estimator = np.zeros((len(Lambda_list), N))

for j, Lambda in enumerate(Lambda_list):
    for i in range(N):
        estimator[j][i] = objFunc(observations(Lambda, sample_size[i])).x

fig = plt.figure()
ax = fig.add_subplot(111)
for j, Lambda in enumerate(Lambda_list):
    ax.plot(sample_size, estimator[j], 'o', label='$\lambda=$' + str(Lambda))
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])

# Put a legend to the right of the current axis
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.title('Values of the estimator for several $\lambda$ and sample sizes')
plt.show()


As we can observe, the estimator indeed converges toward the true value of $\lambda$. Its accuracy improves with the size of the sample, and one can show that the maximum likelihood estimator is consistent (and asymptotically unbiased).
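
To check this empirically, one can average the estimate over repeated samples and compare with the true value (a minimal sketch, reusing the observations and objFunc helpers defined above; the number of repetitions is arbitrary):

n_rep = 50
true_lambda = 5
estimates = np.array([objFunc(observations(true_lambda, 500)).x[0]
                      for _ in range(n_rep)])
print(estimates.mean() - true_lambda)  # empirical bias, close to 0 for large samples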

We now want to study the behavior of the MSE with respect to the sample size, for several true values of $\lambda$.


In [49]:
from sklearn.metrics import mean_squared_error

sample_size = np.arange(1, 50, 1)
N = len(sample_size)
Lambda_list = [1, 5, 10]
estimator = np.zeros((len(Lambda_list), N))
MSE_list = np.zeros((len(Lambda_list), N))

for j, Lambda in enumerate(Lambda_list):
    for i in range(N):
        estimator[j, i] = objFunc(observations(Lambda, sample_size[i])).x
        MSE_list[j, i] = mean_squared_error([Lambda], [estimator[j, i]])
    plt.plot(sample_size, MSE_list[j], 'o', label='$\lambda=$' + str(Lambda))

plt.ylim((-5, 150))
plt.legend()
plt.title('MSE with respect to the sample size for several values of $\lambda$')


Out[49]:
[Figure: MSE versus sample size for $\lambda = 1, 5, 10$]

As expected, the MSE decreases with the sample size: the estimator is consistent, so its squared error must shrink as the sample grows.

We can note that for large $\lambda$ it is harder to find the right estimate. This comes from the fact that we are dealing with a truncated distribution, not the full one: once $\lambda$ becomes comparable to the truncation bound, the estimator has difficulty detecting the true value.
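
Indeed, on the window $1 < x < 20$ the density flattens as $\lambda$ grows: $$ \lim_{\lambda \to \infty} P_{\lambda}(x) = \lim_{\lambda \to \infty} \frac{e^{-x/\lambda}}{\lambda \left( e^{-1/\lambda} - e^{-20/\lambda} \right)} = \frac{1}{19}, \qquad 1 < x < 20, $$ so the truncated distribution tends to the uniform one, which carries no information about $\lambda$.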


In [85]:
#sample_size=[10000]
sample_size=[5000]
N=1
Lambda_list = np.arange(1,20,1)
mean=10
estimator = np.zeros((len(Lambda_list),mean))
MSE_list=np.zeros(len(Lambda_list))


for j,Lambda in enumerate(Lambda_list):
 
    for k in range(mean):
     estimator[j,k] = objFunc(observations(Lambda,sample_size[0])).x
    MSE_list[j]=mean_squared_error([Lambda]*mean, estimator[j])

plt.plot(Lambda_list,MSE_list,'o')
plt.title('MSE with respect to $\lambda$')


Out[85]:
[Figure: MSE of the estimator as a function of the true value of $\lambda$]

In [90]:
plt.plot(Lambda_list, MSE_list, 'o')
Fish = []
for i, Lambda in enumerate(Lambda_list):
    Lambda = float(Lambda)  # avoid integer division inside the expression
    Fish.append((-5000 * (Lambda**2 * np.exp(38 / Lambda) - 40 * Lambda**2 * np.exp(19 / Lambda)
                          + 39 * Lambda**2 - 4 * Lambda * np.exp(38 / Lambda)
                          + 84 * Lambda * np.exp(19 / Lambda) - 80 * Lambda
                          - 361 * np.exp(19 / Lambda))
                 / (Lambda**4 * (np.exp(38 / Lambda) - 2 * np.exp(19 / Lambda) + 1))) ** (-1))

plt.plot(Lambda_list, Fish)  # TODO: revisit the Fisher information of the distribution
plt.ylim([-1, 0.1])


Out[90]:
[Figure: MSE versus $\lambda$ with the Fisher-information bound overlaid]
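
The curve added above is meant to compare the MSE with the Cramér-Rao lower bound, which for $N$ i.i.d. observations reads $$ \operatorname{MSE}(\hat\lambda) \geq \frac{1}{N \, I(\lambda)}, \qquad I(\lambda) = -\mathbb{E}\left[ \frac{\partial^2 \ln P_{\lambda}(x)}{\partial \lambda^2} \right], $$ where $I(\lambda)$ is the Fisher information computed symbolically in the next cell. The plotted bound is negative (hence the TODO in the code), so the expression used for $I(\lambda)$ still needs to be revisited.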

In [64]:
from sympy import *


x = Symbol('x')
Lambda = Symbol('Lambda')

# second derivative of the log of the normalization constant
Z = Lambda*(exp(-1/Lambda) - exp(-20/Lambda))
DZ = diff(ln(Z), Lambda, 2)
DZ

# second derivative of the log of the unnormalized density (linear in x),
# then substitute an expression for E[x]
Z2 = ln(exp(-x/Lambda))
DZ2 = diff(Z2, Lambda, 2)
DZ2
Rep = DZ2.subs(x, (20*Lambda*exp(-20/Lambda) - Lambda*exp(-1/Lambda))/Z
                  + (-20*Lambda*exp(-20/Lambda) + Lambda*exp(-1/Lambda))/(Z/Lambda))

simplify(-Rep + DZ)


Out[64]:
(Lambda**2*exp(38/Lambda) - 40*Lambda**2*exp(19/Lambda) + 39*Lambda**2 - 4*Lambda*exp(38/Lambda) + 84*Lambda*exp(19/Lambda) - 80*Lambda - 361*exp(19/Lambda))/(Lambda**4*(exp(38/Lambda) - 2*exp(19/Lambda) + 1))

In [ ]:
A=diff(ln(Lambda*exp(x*Lambda)),Lambda,2)
B=diff(ln(exp(-x/Lambda)/(Lambda*(exp(-1/Lambda)-exp(-20/Lambda)))),Lambda,2)
#simplify(A)
Simp=simplify(-B)
Simp
Rep=Simp.subs(x,1/Lambda)
simplify(Rep)

The MNIST Dataset

We will now use the MNIST dataset to train a machine learning algorithm based on a neural network.

For now we will use the sigmoid function as the transfer function.

We want to find the best estimator for our problem. For this, we use the mean squared error as the measure of performance of our model, and we introduce regularization. The total error function will then be: $$ \frac12 (Xw -y)^\top (Xw - y) + \frac12 \Gamma w^\top w$$ The optimal estimator satisfies: $$ \nabla_w \left[ \frac12 (Xw -y)^\top (Xw - y) + \frac12 \Gamma w^\top w \right] = 0$$ We expand the gradient and find: $$ X^\top X w - X^\top y + \Gamma w = 0$$ And finally: $$ \hat w = (X^\top X + \Gamma I)^{-1} X^\top y$$
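
As a small illustration, the closed-form estimator $\hat w$ can be computed directly with numpy (a minimal sketch on synthetic data; the sizes and variable names here are made up for the example):

import numpy as np

np.random.seed(0)
X = np.random.randn(200, 10)           # design matrix (hypothetical sizes)
w_true = np.random.randn(10)
y = X.dot(w_true) + 0.1 * np.random.randn(200)

Gamma = 5.0                            # regularization strength
# w_hat = (X^T X + Gamma I)^{-1} X^T y, solved without forming the inverse
w_hat = np.linalg.solve(X.T.dot(X) + Gamma * np.eye(X.shape[1]), X.T.dot(y))
print(np.linalg.norm(w_hat - w_true))  # small residual: the regularized fit is close

Note that np.linalg.solve avoids computing the inverse explicitly; still, the memory cost of building and factorizing this matrix is exactly the drawback discussed next.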

As noted at the end of part 2, this matrix method has the drawback of requiring a lot of RAM for the inversion. To obtain good results nonetheless, we will instead perform stochastic gradient descent on a neural network with the architecture shown below. In this architecture, $M$ (written $n$ in the figure) is the number of hidden neurons; to observe different behaviors we will use several values of $M$. The learning problem is ultimately the same, but we never have to invert a large matrix. We implement this method in the rest of this homework.


In [22]:
from IPython.display import Image
Image("tikz12.png")


Out[22]:
[Figure: feed-forward network architecture with one hidden layer of $M$ neurons, tikz12.png]

In [17]:
"""
mnist_loader
~~~~~~~~~~~~

A library to load the MNIST image data.  For details of the data
structures that are returned, see the doc strings for ``load_data``
and ``load_data_wrapper``.  In practice, ``load_data_wrapper`` is the
function usually called by our neural network code.
"""

#### Libraries
# Standard library
import cPickle
import gzip

# Third-party libraries
import numpy as np

def load_data():
    """Return the MNIST data as a tuple containing the training data,
    the validation data, and the test data.

    The ``training_data`` is returned as a tuple with two entries.
    The first entry contains the actual training images.  This is a
    numpy ndarray with 50,000 entries.  Each entry is, in turn, a
    numpy ndarray with 784 values, representing the 28 * 28 = 784
    pixels in a single MNIST image.

    The second entry in the ``training_data`` tuple is a numpy ndarray
    containing 50,000 entries.  Those entries are just the digit
    values (0...9) for the corresponding images contained in the first
    entry of the tuple.

    The ``validation_data`` and ``test_data`` are similar, except
    each contains only 10,000 images.

    This is a nice data format, but for use in neural networks it's
    helpful to modify the format of the ``training_data`` a little.
    That's done in the wrapper function ``load_data_wrapper()``, see
    below.
    """
    f = gzip.open('./data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = cPickle.load(f)
    f.close()
    return (training_data, validation_data, test_data)

def load_data_wrapper():
    """Return a tuple containing ``(training_data, validation_data,
    test_data)``. Based on ``load_data``, but the format is more
    convenient for use in our implementation of neural networks.

    In particular, ``training_data`` is a list containing 50,000
    2-tuples ``(x, y)``.  ``x`` is a 784-dimensional numpy.ndarray
    containing the input image.  ``y`` is a 10-dimensional
    numpy.ndarray representing the unit vector corresponding to the
    correct digit for ``x``.

    ``validation_data`` and ``test_data`` are lists containing 10,000
    2-tuples ``(x, y)``.  In each case, ``x`` is a 784-dimensional
    numpy.ndarry containing the input image, and ``y`` is the
    corresponding classification, i.e., the digit values (integers)
    corresponding to ``x``.

    Obviously, this means we're using slightly different formats for
    the training data and the validation / test data.  These formats
    turn out to be the most convenient for use in our neural network
    code."""
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return (training_data, validation_data, test_data)

def vectorized_result(j):
    """Return a 10-dimensional unit vector with a 1.0 in the jth
    position and zeroes elsewhere.  This is used to convert a digit
    (0...9) into a corresponding desired output from the neural
    network."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

In [19]:
import sys

# My library

# Third-party libraries
import matplotlib
import matplotlib.pyplot as plt
import numpy as np


def plot_10_by_10_images(images):
    """ Plot 100 MNIST images in a 10 by 10 table. Note that we crop
    the images so that they appear reasonably close together.  The
    image is post-processed to give the appearance of being continued."""
    fig = plt.figure()
    images = [image[3:25, 3:25] for image in images]
    for x in range(10):
        for y in range(10):
            ax = fig.add_subplot(10, 10, 10*y+x+1)  # subplot indices are 1-based
            ax.matshow(images[10*y+x], cmap=matplotlib.cm.binary)
            plt.xticks(np.array([]))
            plt.yticks(np.array([]))
    plt.show()

def get_images(training_set):
    """ Return a list containing the images from the MNIST data
    set. Each image is represented as a 2-d numpy array."""
    flattened_images = training_set[0]
    return [np.reshape(f, (-1, 28)) for f in flattened_images]


training_set, validation_set, test_set = load_data()
images = get_images(training_set)

The MNIST dataset is a set of handwritten digits, as shown in the next picture. Our goal is to learn to classify the digits, from 0 to 9, with a neural network. At the beginning we will use the quadratic cost function, and we will add some regularization to obtain networks that perform better.


In [20]:
plot_10_by_10_images(images)


We will plot the accuracy of the classification for several numbers of hidden neurons, with some values of the learning parameters. We will then compare the performance on the training data and the evaluation data.


In [8]:
import json
import random
import sys

# Third-party libraries
import numpy as np


#### Define the quadratic and cross-entropy cost functions

class QuadraticCost(object):

    @staticmethod
    def fn(a, y):
        """Return the cost associated with an output ``a`` and desired output
        ``y``.

        """
        return 0.5*np.linalg.norm(a-y)**2

    @staticmethod
    def delta(z, a, y):
        """Return the error delta from the output layer."""
        return (a-y) * sigmoid_prime(z)


#### Main Network class
class Network(object):

    def __init__(self, sizes, cost=QuadraticCost):
        """The list ``sizes`` contains the number of neurons in the respective
        layers of the network.  For example, if the list was [2, 3, 1]
        then it would be a three-layer network, with the first layer
        containing 2 neurons, the second layer 3 neurons, and the
        third layer 1 neuron.  The biases and weights for the network
        are initialized randomly.

        """
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.large_weight_initializer()
        self.cost=cost

    def large_weight_initializer(self):
        """Initialize the weights using a Gaussian distribution with mean 0
        and standard deviation 1.  Initialize the biases using a
        Gaussian distribution with mean 0 and standard deviation 1.



        """
        self.biases = [np.random.randn(y, 1) for y in self.sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(self.sizes[:-1], self.sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            lmbda = 0.0,
            evaluation_data=None,
            monitor_evaluation_cost=False,
            monitor_evaluation_accuracy=False,
            monitor_training_cost=False,
            monitor_training_accuracy=False):
        """Train the neural network using mini-batch stochastic gradient
        descent.  The ``training_data`` is a list of tuples ``(x, y)``
        representing the training inputs and the desired outputs.  The
        other non-optional parameters are self-explanatory, as is the
        regularization parameter ``lmbda``.  The method also accepts
        ``evaluation_data``, usually either the validation or test
        data.  We can monitor the cost and accuracy on either the
        evaluation data or the training data, by setting the
        appropriate flags.  The method returns a tuple containing four
        lists: the (per-epoch) costs on the evaluation data, the
        accuracies on the evaluation data, the costs on the training
        data, and the accuracies on the training data.  All values are
        evaluated at the end of each training epoch.  So, for example,
        if we train for 30 epochs, then the first element of the tuple
        will be a 30-element list containing the cost on the
        evaluation data at the end of each epoch. Note that the lists
        are empty if the corresponding flag is not set.

        """
        if evaluation_data: n_data = len(evaluation_data)
        n = len(training_data)
        evaluation_cost, evaluation_accuracy = [], []
        training_cost, training_accuracy = [], []
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(
                    mini_batch, eta, lmbda, len(training_data))
            print "Epoch %s training complete" % j
            if monitor_training_cost:
                cost = self.total_cost(training_data, lmbda)
                training_cost.append(cost)
                print "Cost on training data: {}".format(cost)
            if monitor_training_accuracy:
                accuracy = self.accuracy(training_data, convert=True)
                training_accuracy.append(accuracy)
                print "Accuracy on training data: {} / {}".format(
                    accuracy, n)
            if monitor_evaluation_cost:
                cost = self.total_cost(evaluation_data, lmbda, convert=True)
                evaluation_cost.append(cost)
                print "Cost on evaluation data: {}".format(cost)
            if monitor_evaluation_accuracy:
                accuracy = self.accuracy(evaluation_data)
                evaluation_accuracy.append(accuracy)
                print "Accuracy on evaluation data: {} / {}".format(
                    self.accuracy(evaluation_data), n_data)
            print
        return evaluation_cost, evaluation_accuracy, \
            training_cost, training_accuracy

    def update_mini_batch(self, mini_batch, eta, lmbda, n):
        """Update the network's weights and biases by applying gradient
        descent using backpropagation to a single mini batch.  The
        ``mini_batch`` is a list of tuples ``(x, y)``, ``eta`` is the
        learning rate, ``lmbda`` is the regularization parameter, and
        ``n`` is the total size of the training data set.

        """
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
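        # The (1-eta*(lmbda/n)) factor implements the L2 weight decay coming from
        # the regularization term; the second term is the plain gradient step.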
        self.weights = [(1-eta*(lmbda/n))*w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = (self.cost).delta(zs[-1], activations[-1], y)
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
       
        for l in xrange(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def accuracy(self, data, convert=False):
        """Return the number of inputs in ``data`` for which the neural
        network outputs the correct result. The neural network's
        output is assumed to be the index of whichever neuron in the
        final layer has the highest activation.

        The flag ``convert`` should be set to False if the data set is
        validation or test data (the usual case), and to True if the
        data set is the training data. The need for this flag arises
        due to differences in the way the results ``y`` are
        represented in the different data sets.  In particular, it
        flags whether we need to convert between the different
        representations.
       
        """
        if convert:
            results = [(np.argmax(self.feedforward(x)), np.argmax(y))
                       for (x, y) in data]
        else:
            results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in data]
        return sum(int(x == y) for (x, y) in results)

    def total_cost(self, data, lmbda, convert=False):
        """Return the total cost for the data set ``data``.  The flag
        ``convert`` should be set to False if the data set is the
        training data (the usual case), and to True if the data set is
        the validation or test data.  
        """
        cost = 0.0
        for x, y in data:
            a = self.feedforward(x)
            if convert: y = vectorized_result(y)
            cost += self.cost.fn(a, y)/len(data)
        cost += 0.5*(lmbda/len(data))*sum(
            np.linalg.norm(w)**2 for w in self.weights)
        return cost

    def save(self, filename):
        """Save the neural network to the file ``filename``."""
        data = {"sizes": self.sizes,
                "weights": [w.tolist() for w in self.weights],
                "biases": [b.tolist() for b in self.biases],
                "cost": str(self.cost.__name__)}
        f = open(filename, "w")
        json.dump(data, f)
        f.close()

#### Loading a Network
def load(filename):
    """Load a neural network from the file ``filename``.  
    """
    f = open(filename, "r")
    data = json.load(f)
    f.close()
    cost = getattr(sys.modules[__name__], data["cost"])
    net = Network(data["sizes"], cost=cost)
    net.weights = [np.array(w) for w in data["weights"]]
    net.biases = [np.array(b) for b in data["biases"]]
    return net

#### Miscellaneous functions
def vectorized_result(j):
    """Return a 10-dimensional unit vector with a 1.0 in the j'th position
    and zeroes elsewhere.  This is used to convert a digit (0...9)
    into a corresponding desired output from the neural network.

    """
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))

In [5]:
training_data, validation_data, test_data = load_data_wrapper()

In [6]:
def plot_test_accuracy(test_accuracy, num_epochs, test_accuracy_xmin):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(np.arange(test_accuracy_xmin, num_epochs), 
            [accuracy/100.0 
             for accuracy in test_accuracy[test_accuracy_xmin:num_epochs]],
            color='#2A6EA6')
    ax.set_xlim([test_accuracy_xmin, num_epochs])
    ax.grid(True)
    ax.set_xlabel('Epoch')
    ax.set_title('Accuracy (%) on the test data')
    plt.show()

    
def plot_training_accuracy(training_accuracy, num_epochs, 
                           training_accuracy_xmin, training_set_size):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(np.arange(training_accuracy_xmin, num_epochs), 
            [accuracy*100.0/training_set_size 
             for accuracy in training_accuracy[training_accuracy_xmin:num_epochs]],
            color='#2A6EA6')
    ax.set_xlim([training_accuracy_xmin, num_epochs])
    ax.grid(True)
    ax.set_xlabel('Epoch')
    ax.set_title('Accuracy (%) on the training data')
    plt.show()

In [9]:
M_list = [10, 30, 50, 100, 200]
net_list = []
data_list = []

for i, M in enumerate(M_list):
    f = open('results_totaux' + str(M), "w")
    net = Network([784, M, 10])
    net.large_weight_initializer()
    test_cost, test_accuracy, training_cost, training_accuracy = net.SGD(
        training_data[:50000], 40, 10, 0.5,
        evaluation_data=test_data, lmbda=5.0,
        monitor_evaluation_cost=True,
        monitor_evaluation_accuracy=True,
        monitor_training_cost=True,
        monitor_training_accuracy=True)
    data_list.append((test_cost, test_accuracy, training_cost, training_accuracy))
    net_list.append(net)
    net.save('nnM_' + str(M))
    json.dump([M, test_cost, test_accuracy, training_cost, training_accuracy], f)
    f.close()  # close each result file inside the loop


Epoch 0 training complete
Cost on training data: 0.458176078172
Accuracy on training data: 36812 / 50000
Cost on evaluation data: 1.44022357202
Accuracy on evaluation data: 7430 / 10000

Epoch 1 training complete
Cost on training data: 0.281221002679
Accuracy on training data: 43180 / 50000
Cost on evaluation data: 0.91991232953
Accuracy on evaluation data: 8743 / 10000

Epoch 2 training complete
Cost on training data: 0.204841836014
Accuracy on training data: 44604 / 50000
Cost on evaluation data: 0.634605375559
Accuracy on evaluation data: 8940 / 10000

Epoch 3 training complete
Cost on training data: 0.163318295699
Accuracy on training data: 45156 / 50000
Cost on evaluation data: 0.466455928416
Accuracy on evaluation data: 9059 / 10000

Epoch 4 training complete
Cost on training data: 0.140869242011
Accuracy on training data: 45360 / 50000
Cost on evaluation data: 0.369625647154
Accuracy on evaluation data: 9066 / 10000

Epoch 5 training complete
Cost on training data: 0.126317876126
Accuracy on training data: 45633 / 50000
Cost on evaluation data: 0.311990699019
Accuracy on evaluation data: 9122 / 10000

Epoch 6 training complete
Cost on training data: 0.117324127131
Accuracy on training data: 45876 / 50000
Cost on evaluation data: 0.278120691458
Accuracy on evaluation data: 9148 / 10000

Epoch 7 training complete
Cost on training data: 0.11206942944
Accuracy on training data: 45936 / 50000
Cost on evaluation data: 0.257439992898
Accuracy on evaluation data: 9189 / 10000

Epoch 8 training complete
Cost on training data: 0.113039695857
Accuracy on training data: 45744 / 50000
Cost on evaluation data: 0.249628311141
Accuracy on evaluation data: 9114 / 10000

Epoch 9 training complete
Cost on training data: 0.108640666766
Accuracy on training data: 45943 / 50000
Cost on evaluation data: 0.240284087241
Accuracy on evaluation data: 9149 / 10000

Epoch 10 training complete
Cost on training data: 0.10535788065
Accuracy on training data: 46055 / 50000
Cost on evaluation data: 0.234105426217
Accuracy on evaluation data: 9193 / 10000

Epoch 11 training complete
Cost on training data: 0.106326665297
Accuracy on training data: 45889 / 50000
Cost on evaluation data: 0.232924869443
Accuracy on evaluation data: 9133 / 10000

Epoch 12 training complete
Cost on training data: 0.103644422529
Accuracy on training data: 46055 / 50000
Cost on evaluation data: 0.227524156675
Accuracy on evaluation data: 9216 / 10000

Epoch 13 training complete
Cost on training data: 0.103176490513
Accuracy on training data: 46004 / 50000
Cost on evaluation data: 0.226044718286
Accuracy on evaluation data: 9185 / 10000

Epoch 14 training complete
Cost on training data: 0.0993016460276
Accuracy on training data: 46373 / 50000
Cost on evaluation data: 0.221781664591
Accuracy on evaluation data: 9248 / 10000

Epoch 15 training complete
Cost on training data: 0.0994180143408
Accuracy on training data: 46274 / 50000
Cost on evaluation data: 0.221236250889
Accuracy on evaluation data: 9234 / 10000

Epoch 16 training complete
Cost on training data: 0.0984523798645
Accuracy on training data: 46350 / 50000
Cost on evaluation data: 0.220711163405
Accuracy on evaluation data: 9243 / 10000

Epoch 17 training complete
Cost on training data: 0.099120895361
Accuracy on training data: 46385 / 50000
Cost on evaluation data: 0.221334252211
Accuracy on evaluation data: 9231 / 10000

Epoch 18 training complete
Cost on training data: 0.0985488332723
Accuracy on training data: 46416 / 50000
Cost on evaluation data: 0.219862613732
Accuracy on evaluation data: 9277 / 10000

Epoch 19 training complete
Cost on training data: 0.0971619135548
Accuracy on training data: 46498 / 50000
Cost on evaluation data: 0.219946938941
Accuracy on evaluation data: 9273 / 10000

Epoch 20 training complete
Cost on training data: 0.101459514544
Accuracy on training data: 46164 / 50000
Cost on evaluation data: 0.225581803115
Accuracy on evaluation data: 9206 / 10000

Epoch 21 training complete
Cost on training data: 0.0985895657601
Accuracy on training data: 46431 / 50000
Cost on evaluation data: 0.222668014647
Accuracy on evaluation data: 9254 / 10000

Epoch 22 training complete
Cost on training data: 0.0966114639604
Accuracy on training data: 46556 / 50000
Cost on evaluation data: 0.220066075447
Accuracy on evaluation data: 9268 / 10000

Epoch 23 training complete
Cost on training data: 0.097960015684
Accuracy on training data: 46516 / 50000
Cost on evaluation data: 0.221492426188
Accuracy on evaluation data: 9265 / 10000

Epoch 24 training complete
Cost on training data: 0.0970031819138
Accuracy on training data: 46544 / 50000
Cost on evaluation data: 0.220764317831
Accuracy on evaluation data: 9294 / 10000

Epoch 25 training complete
Cost on training data: 0.0984711049332
Accuracy on training data: 46397 / 50000
Cost on evaluation data: 0.222090234315
Accuracy on evaluation data: 9268 / 10000

Epoch 26 training complete
Cost on training data: 0.097115023078
Accuracy on training data: 46454 / 50000
Cost on evaluation data: 0.220743047651
Accuracy on evaluation data: 9267 / 10000

Epoch 27 training complete
Cost on training data: 0.0969448932933
Accuracy on training data: 46487 / 50000
Cost on evaluation data: 0.220900214097
Accuracy on evaluation data: 9275 / 10000

Epoch 28 training complete
Cost on training data: 0.0984500530146
Accuracy on training data: 46401 / 50000
Cost on evaluation data: 0.22242485868
Accuracy on evaluation data: 9267 / 10000

Epoch 29 training complete
Cost on training data: 0.0961689650613
Accuracy on training data: 46572 / 50000
Cost on evaluation data: 0.220621504065
Accuracy on evaluation data: 9283 / 10000

Epoch 30 training complete
Cost on training data: 0.0957654938188
Accuracy on training data: 46563 / 50000
Cost on evaluation data: 0.219198218476
Accuracy on evaluation data: 9309 / 10000

Epoch 31 training complete
Cost on training data: 0.095581043104
Accuracy on training data: 46598 / 50000
Cost on evaluation data: 0.21926999935
Accuracy on evaluation data: 9299 / 10000

Epoch 32 training complete
Cost on training data: 0.0964234662673
Accuracy on training data: 46609 / 50000
Cost on evaluation data: 0.220722961849
Accuracy on evaluation data: 9291 / 10000

Epoch 33 training complete
Cost on training data: 0.0963648257377
Accuracy on training data: 46562 / 50000
Cost on evaluation data: 0.220408780274
Accuracy on evaluation data: 9304 / 10000

Epoch 34 training complete
Cost on training data: 0.0973188677453
Accuracy on training data: 46562 / 50000
Cost on evaluation data: 0.221372816174
Accuracy on evaluation data: 9277 / 10000

Epoch 35 training complete
Cost on training data: 0.0971380123591
Accuracy on training data: 46593 / 50000
Cost on evaluation data: 0.220826604777
Accuracy on evaluation data: 9300 / 10000

Epoch 36 training complete
Cost on training data: 0.0986762771196
Accuracy on training data: 46406 / 50000
Cost on evaluation data: 0.222855498689
Accuracy on evaluation data: 9242 / 10000

Epoch 37 training complete
Cost on training data: 0.0973346131584
Accuracy on training data: 46456 / 50000
Cost on evaluation data: 0.220982695377
Accuracy on evaluation data: 9263 / 10000

Epoch 38 training complete
Cost on training data: 0.0950144979779
Accuracy on training data: 46626 / 50000
Cost on evaluation data: 0.219333546448
Accuracy on evaluation data: 9290 / 10000

Epoch 39 training complete
Cost on training data: 0.0966498271659
Accuracy on training data: 46527 / 50000
Cost on evaluation data: 0.220543735769
Accuracy on evaluation data: 9289 / 10000

Epoch 0 training complete
Cost on training data: 0.861972407897
Accuracy on training data: 41561 / 50000
Cost on evaluation data: 3.74341092706
Accuracy on evaluation data: 8422 / 10000

Epoch 1 training complete
Cost on training data: 0.532568164198
Accuracy on training data: 44874 / 50000
Cost on evaluation data: 2.30560909986
Accuracy on evaluation data: 9034 / 10000

Epoch 2 training complete
Cost on training data: 0.347967897447
Accuracy on training data: 45978 / 50000
Cost on evaluation data: 1.45271513159
Accuracy on evaluation data: 9195 / 10000

Epoch 3 training complete
Cost on training data: 0.241111285404
Accuracy on training data: 46417 / 50000
Cost on evaluation data: 0.94451101609
Accuracy on evaluation data: 9276 / 10000

Epoch 4 training complete
Cost on training data: 0.175132117103
Accuracy on training data: 46810 / 50000
Cost on evaluation data: 0.639558383129
Accuracy on evaluation data: 9351 / 10000

Epoch 5 training complete
Cost on training data: 0.136952208313
Accuracy on training data: 47030 / 50000
Cost on evaluation data: 0.459762452294
Accuracy on evaluation data: 9395 / 10000

Epoch 6 training complete
Cost on training data: 0.114002041574
Accuracy on training data: 47159 / 50000
Cost on evaluation data: 0.352571951238
Accuracy on evaluation data: 9428 / 10000

Epoch 7 training complete
Cost on training data: 0.100347902067
Accuracy on training data: 47307 / 50000
Cost on evaluation data: 0.289615736281
Accuracy on evaluation data: 9439 / 10000

Epoch 8 training complete
Cost on training data: 0.0922315066948
Accuracy on training data: 47370 / 50000
Cost on evaluation data: 0.252520602816
Accuracy on evaluation data: 9460 / 10000

Epoch 9 training complete
Cost on training data: 0.0863678740554
Accuracy on training data: 47467 / 50000
Cost on evaluation data: 0.231190881878
Accuracy on evaluation data: 9463 / 10000

Epoch 10 training complete
Cost on training data: 0.0831204583087
Accuracy on training data: 47564 / 50000
Cost on evaluation data: 0.21745725461
Accuracy on evaluation data: 9500 / 10000

Epoch 11 training complete
Cost on training data: 0.0801729061994
Accuracy on training data: 47626 / 50000
Cost on evaluation data: 0.208382499936
Accuracy on evaluation data: 9529 / 10000

Epoch 12 training complete
Cost on training data: 0.0811791181724
Accuracy on training data: 47561 / 50000
Cost on evaluation data: 0.206550459741
Accuracy on evaluation data: 9488 / 10000

Epoch 13 training complete
Cost on training data: 0.0791646781201
Accuracy on training data: 47690 / 50000
Cost on evaluation data: 0.203173837859
Accuracy on evaluation data: 9513 / 10000

Epoch 14 training complete
Cost on training data: 0.0776566746861
Accuracy on training data: 47720 / 50000
Cost on evaluation data: 0.200740071551
Accuracy on evaluation data: 9527 / 10000

Epoch 15 training complete
Cost on training data: 0.0772951164629
Accuracy on training data: 47712 / 50000
Cost on evaluation data: 0.200193221567
Accuracy on evaluation data: 9520 / 10000

Epoch 16 training complete
Cost on training data: 0.0772995107332
Accuracy on training data: 47766 / 50000
Cost on evaluation data: 0.199810866321
Accuracy on evaluation data: 9528 / 10000

Epoch 17 training complete
Cost on training data: 0.0775021217657
Accuracy on training data: 47726 / 50000
Cost on evaluation data: 0.200070967512
Accuracy on evaluation data: 9534 / 10000

Epoch 18 training complete
Cost on training data: 0.0771945401466
Accuracy on training data: 47757 / 50000
Cost on evaluation data: 0.199993684278
Accuracy on evaluation data: 9550 / 10000

Epoch 19 training complete
Cost on training data: 0.0759150843321
Accuracy on training data: 47828 / 50000
Cost on evaluation data: 0.198601386966
Accuracy on evaluation data: 9569 / 10000

Epoch 20 training complete
Cost on training data: 0.0764031169294
Accuracy on training data: 47861 / 50000
Cost on evaluation data: 0.199598023093
Accuracy on evaluation data: 9563 / 10000

Epoch 21 training complete
Cost on training data: 0.075715100415
Accuracy on training data: 47910 / 50000
Cost on evaluation data: 0.199202631425
Accuracy on evaluation data: 9566 / 10000

Epoch 22 training complete
Cost on training data: 0.075320141174
Accuracy on training data: 47905 / 50000
Cost on evaluation data: 0.198714047496
Accuracy on evaluation data: 9570 / 10000

Epoch 23 training complete
Cost on training data: 0.0753163217794
Accuracy on training data: 47932 / 50000
Cost on evaluation data: 0.198910555187
Accuracy on evaluation data: 9585 / 10000

Epoch 24 training complete
Cost on training data: 0.0755254211847
Accuracy on training data: 47929 / 50000
Cost on evaluation data: 0.199658312355
Accuracy on evaluation data: 9568 / 10000

Epoch 25 training complete
Cost on training data: 0.0750626026167
Accuracy on training data: 47920 / 50000
Cost on evaluation data: 0.199156690688
Accuracy on evaluation data: 9563 / 10000

Epoch 26 training complete
Cost on training data: 0.0765099822539
Accuracy on training data: 47877 / 50000
Cost on evaluation data: 0.201439129371
Accuracy on evaluation data: 9534 / 10000

Epoch 27 training complete
Cost on training data: 0.0763954026594
Accuracy on training data: 47910 / 50000
Cost on evaluation data: 0.200725914178
Accuracy on evaluation data: 9562 / 10000

Epoch 28 training complete
Cost on training data: 0.0752290225153
Accuracy on training data: 47917 / 50000
Cost on evaluation data: 0.199159421223
Accuracy on evaluation data: 9572 / 10000

Epoch 29 training complete
Cost on training data: 0.0752334968892
Accuracy on training data: 47967 / 50000
Cost on evaluation data: 0.199907549312
Accuracy on evaluation data: 9566 / 10000

Epoch 30 training complete
Cost on training data: 0.0751209064036
Accuracy on training data: 47953 / 50000
Cost on evaluation data: 0.199873006303
Accuracy on evaluation data: 9580 / 10000

Epoch 31 training complete
Cost on training data: 0.0749644480154
Accuracy on training data: 47925 / 50000
Cost on evaluation data: 0.198845619785
Accuracy on evaluation data: 9583 / 10000

Epoch 32 training complete
Cost on training data: 0.0754221961682
Accuracy on training data: 47955 / 50000
Cost on evaluation data: 0.199869296726
Accuracy on evaluation data: 9566 / 10000

Epoch 33 training complete
Cost on training data: 0.0750306021252
Accuracy on training data: 47964 / 50000
Cost on evaluation data: 0.199907190653
Accuracy on evaluation data: 9592 / 10000

Epoch 34 training complete
Cost on training data: 0.0752249631154
Accuracy on training data: 47970 / 50000
Cost on evaluation data: 0.19989491641
Accuracy on evaluation data: 9558 / 10000

Epoch 35 training complete
Cost on training data: 0.0753670418496
Accuracy on training data: 47926 / 50000
Cost on evaluation data: 0.200089312469
Accuracy on evaluation data: 9552 / 10000

Epoch 36 training complete
Cost on training data: 0.0742837506319
Accuracy on training data: 47965 / 50000
Cost on evaluation data: 0.199312727186
Accuracy on evaluation data: 9582 / 10000

Epoch 37 training complete
Cost on training data: 0.0762079994807
Accuracy on training data: 47913 / 50000
Cost on evaluation data: 0.200660374682
Accuracy on evaluation data: 9566 / 10000

Epoch 38 training complete
Cost on training data: 0.0754397993785
Accuracy on training data: 47957 / 50000
Cost on evaluation data: 0.200308081015
Accuracy on evaluation data: 9565 / 10000

Epoch 39 training complete
Cost on training data: 0.0741362406536
Accuracy on training data: 48011 / 50000
Cost on evaluation data: 0.198877400116
Accuracy on evaluation data: 9583 / 10000

Epoch 0 training complete
Cost on training data: 1.42975414854
Accuracy on training data: 32417 / 50000
Cost on evaluation data: 6.24144134801
Accuracy on evaluation data: 6647 / 10000

Epoch 1 training complete
Cost on training data: 0.910308561491
Accuracy on training data: 36094 / 50000
Cost on evaluation data: 3.84473197607
Accuracy on evaluation data: 7307 / 10000

Epoch 2 training complete
Cost on training data: 0.56849787671
Accuracy on training data: 41534 / 50000
Cost on evaluation data: 2.37330326207
Accuracy on evaluation data: 8371 / 10000

Epoch 3 training complete
Cost on training data: 0.345551017571
Accuracy on training data: 46634 / 50000
Cost on evaluation data: 1.47402678624
Accuracy on evaluation data: 9344 / 10000

Epoch 4 training complete
Cost on training data: 0.237854178947
Accuracy on training data: 47031 / 50000
Cost on evaluation data: 0.959674395369
Accuracy on evaluation data: 9395 / 10000

Epoch 5 training complete
Cost on training data: 0.172009707577
Accuracy on training data: 47322 / 50000
Cost on evaluation data: 0.650655939764
Accuracy on evaluation data: 9461 / 10000

Epoch 6 training complete
Cost on training data: 0.132925623166
Accuracy on training data: 47493 / 50000
Cost on evaluation data: 0.466808263836
Accuracy on evaluation data: 9509 / 10000

Epoch 7 training complete
Cost on training data: 0.110514565095
Accuracy on training data: 47619 / 50000
Cost on evaluation data: 0.359148303473
Accuracy on evaluation data: 9508 / 10000

Epoch 8 training complete
Cost on training data: 0.0958548429406
Accuracy on training data: 47687 / 50000
Cost on evaluation data: 0.294018448783
Accuracy on evaluation data: 9533 / 10000

Epoch 9 training complete
Cost on training data: 0.0880799484647
Accuracy on training data: 47787 / 50000
Cost on evaluation data: 0.256567128188
Accuracy on evaluation data: 9554 / 10000

Epoch 10 training complete
Cost on training data: 0.0826109115797
Accuracy on training data: 47843 / 50000
Cost on evaluation data: 0.233368397029
Accuracy on evaluation data: 9554 / 10000

Epoch 11 training complete
Cost on training data: 0.079311146049
Accuracy on training data: 47903 / 50000
Cost on evaluation data: 0.220708421984
Accuracy on evaluation data: 9559 / 10000

Epoch 12 training complete
Cost on training data: 0.0771986025677
Accuracy on training data: 47960 / 50000
Cost on evaluation data: 0.211757232305
Accuracy on evaluation data: 9572 / 10000

Epoch 13 training complete
Cost on training data: 0.0753410207516
Accuracy on training data: 47970 / 50000
Cost on evaluation data: 0.207046731324
Accuracy on evaluation data: 9577 / 10000

Epoch 14 training complete
Cost on training data: 0.0745022850736
Accuracy on training data: 48028 / 50000
Cost on evaluation data: 0.20420411345
Accuracy on evaluation data: 9570 / 10000

Epoch 15 training complete
Cost on training data: 0.075408520212
Accuracy on training data: 47953 / 50000
Cost on evaluation data: 0.204352501768
Accuracy on evaluation data: 9566 / 10000

Epoch 16 training complete
Cost on training data: 0.075624367312
Accuracy on training data: 48035 / 50000
Cost on evaluation data: 0.204304837207
Accuracy on evaluation data: 9581 / 10000

Epoch 17 training complete
Cost on training data: 0.072595432028
Accuracy on training data: 48116 / 50000
Cost on evaluation data: 0.200904940659
Accuracy on evaluation data: 9611 / 10000

Epoch 18 training complete
Cost on training data: 0.0730388720584
Accuracy on training data: 48153 / 50000
Cost on evaluation data: 0.201183508378
Accuracy on evaluation data: 9618 / 10000

Epoch 19 training complete
Cost on training data: 0.0720846596894
Accuracy on training data: 48206 / 50000
Cost on evaluation data: 0.200106576148
Accuracy on evaluation data: 9618 / 10000

Epoch 20 training complete
Cost on training data: 0.0725805231683
Accuracy on training data: 48189 / 50000
Cost on evaluation data: 0.200327609924
Accuracy on evaluation data: 9623 / 10000

Epoch 21 training complete
Cost on training data: 0.0719251543502
Accuracy on training data: 48207 / 50000
Cost on evaluation data: 0.200731208768
Accuracy on evaluation data: 9616 / 10000

Epoch 22 training complete
Cost on training data: 0.0712784529354
Accuracy on training data: 48186 / 50000
Cost on evaluation data: 0.200004553185
Accuracy on evaluation data: 9612 / 10000

Epoch 23 training complete
Cost on training data: 0.0735629009285
Accuracy on training data: 48173 / 50000
Cost on evaluation data: 0.201596061373
Accuracy on evaluation data: 9612 / 10000

Epoch 24 training complete
Cost on training data: 0.0712134892791
Accuracy on training data: 48196 / 50000
Cost on evaluation data: 0.199991505183
Accuracy on evaluation data: 9629 / 10000

Epoch 25 training complete
Cost on training data: 0.0706783613569
Accuracy on training data: 48258 / 50000
Cost on evaluation data: 0.199617815737
Accuracy on evaluation data: 9633 / 10000

Epoch 26 training complete
Cost on training data: 0.0732257162245
Accuracy on training data: 48183 / 50000
Cost on evaluation data: 0.202294906206
Accuracy on evaluation data: 9608 / 10000

Epoch 27 training complete
Cost on training data: 0.0717474778501
Accuracy on training data: 48210 / 50000
Cost on evaluation data: 0.200933006453
Accuracy on evaluation data: 9616 / 10000

Epoch 28 training complete
Cost on training data: 0.0723750189725
Accuracy on training data: 48178 / 50000
Cost on evaluation data: 0.201218037601
Accuracy on evaluation data: 9604 / 10000

Epoch 29 training complete
Cost on training data: 0.0708755072604
Accuracy on training data: 48252 / 50000
Cost on evaluation data: 0.200079627289
Accuracy on evaluation data: 9614 / 10000

Epoch 30 training complete
Cost on training data: 0.0705442536435
Accuracy on training data: 48279 / 50000
Cost on evaluation data: 0.199519019952
Accuracy on evaluation data: 9633 / 10000

Epoch 31 training complete
Cost on training data: 0.0701940372347
Accuracy on training data: 48281 / 50000
Cost on evaluation data: 0.199399160677
Accuracy on evaluation data: 9630 / 10000

Epoch 32 training complete
Cost on training data: 0.0701449924563
Accuracy on training data: 48308 / 50000
Cost on evaluation data: 0.199441479284
Accuracy on evaluation data: 9644 / 10000

Epoch 33 training complete
Cost on training data: 0.0720386431053
Accuracy on training data: 48211 / 50000
Cost on evaluation data: 0.20223108616
Accuracy on evaluation data: 9603 / 10000

Epoch 34 training complete
Cost on training data: 0.0709264962312
Accuracy on training data: 48252 / 50000
Cost on evaluation data: 0.200349037123
Accuracy on evaluation data: 9617 / 10000

Epoch 35 training complete
Cost on training data: 0.0712024018311
Accuracy on training data: 48294 / 50000
Cost on evaluation data: 0.200689912517
Accuracy on evaluation data: 9622 / 10000

Epoch 36 training complete
Cost on training data: 0.0701742957932
Accuracy on training data: 48312 / 50000
Cost on evaluation data: 0.19970797337
Accuracy on evaluation data: 9628 / 10000

Epoch 37 training complete
Cost on training data: 0.0701860649665
Accuracy on training data: 48315 / 50000
Cost on evaluation data: 0.200010118823
Accuracy on evaluation data: 9630 / 10000

Epoch 38 training complete
Cost on training data: 0.0706488623752
Accuracy on training data: 48361 / 50000
Cost on evaluation data: 0.200078220573
Accuracy on evaluation data: 9644 / 10000

Epoch 39 training complete
Cost on training data: 0.070427577259
Accuracy on training data: 48324 / 50000
Cost on evaluation data: 0.199919749971
Accuracy on evaluation data: 9640 / 10000

Epoch 0 training complete
Cost on training data: 2.63082244986
Accuracy on training data: 32464 / 50000
Cost on evaluation data: 12.2513770989
Accuracy on evaluation data: 6454 / 10000

Epoch 1 training complete
Cost on training data: 1.58514642792
Accuracy on training data: 41236 / 50000
Cost on evaluation data: 7.42992072926
Accuracy on evaluation data: 8236 / 10000

Epoch 2 training complete
Cost on training data: 0.969289074322
Accuracy on training data: 45935 / 50000
Cost on evaluation data: 4.53501921371
Accuracy on evaluation data: 9150 / 10000

Epoch 3 training complete
Cost on training data: 0.606697493418
Accuracy on training data: 46870 / 50000
Cost on evaluation data: 2.80041310355
Accuracy on evaluation data: 9361 / 10000

Epoch 4 training complete
Cost on training data: 0.395439380806
Accuracy on training data: 47268 / 50000
Cost on evaluation data: 1.7632826142
Accuracy on evaluation data: 9437 / 10000

Epoch 5 training complete
Cost on training data: 0.266628750447
Accuracy on training data: 47609 / 50000
Cost on evaluation data: 1.13851632599
Accuracy on evaluation data: 9500 / 10000

Epoch 6 training complete
Cost on training data: 0.188730778588
Accuracy on training data: 47777 / 50000
Cost on evaluation data: 0.763274659086
Accuracy on evaluation data: 9523 / 10000

Epoch 7 training complete
Cost on training data: 0.142198868502
Accuracy on training data: 47968 / 50000
Cost on evaluation data: 0.53907604704
Accuracy on evaluation data: 9562 / 10000

Epoch 8 training complete
Cost on training data: 0.115330619524
Accuracy on training data: 47998 / 50000
Cost on evaluation data: 0.406323012551
Accuracy on evaluation data: 9562 / 10000

Epoch 9 training complete
Cost on training data: 0.0971880357973
Accuracy on training data: 48121 / 50000
Cost on evaluation data: 0.324442897345
Accuracy on evaluation data: 9593 / 10000

Epoch 10 training complete
Cost on training data: 0.0885962988981
Accuracy on training data: 48099 / 50000
Cost on evaluation data: 0.278440845143
Accuracy on evaluation data: 9590 / 10000

Epoch 11 training complete
Cost on training data: 0.0809856295121
Accuracy on training data: 48236 / 50000
Cost on evaluation data: 0.248255848371
Accuracy on evaluation data: 9609 / 10000

Epoch 12 training complete
Cost on training data: 0.0775452027918
Accuracy on training data: 48218 / 50000
Cost on evaluation data: 0.231598473601
Accuracy on evaluation data: 9600 / 10000

Epoch 13 training complete
Cost on training data: 0.077244848003
Accuracy on training data: 48159 / 50000
Cost on evaluation data: 0.22315586015
Accuracy on evaluation data: 9587 / 10000

Epoch 14 training complete
Cost on training data: 0.0738529871425
Accuracy on training data: 48254 / 50000
Cost on evaluation data: 0.214774175084
Accuracy on evaluation data: 9621 / 10000

Epoch 15 training complete
Cost on training data: 0.0726153160714
Accuracy on training data: 48342 / 50000
Cost on evaluation data: 0.211553016072
Accuracy on evaluation data: 9635 / 10000

Epoch 16 training complete
Cost on training data: 0.0719727225417
Accuracy on training data: 48294 / 50000
Cost on evaluation data: 0.209514502505
Accuracy on evaluation data: 9611 / 10000

Epoch 17 training complete
Cost on training data: 0.073803451792
Accuracy on training data: 48273 / 50000
Cost on evaluation data: 0.210487752919
Accuracy on evaluation data: 9603 / 10000

Epoch 18 training complete
Cost on training data: 0.0710905917012
Accuracy on training data: 48329 / 50000
Cost on evaluation data: 0.206396692066
Accuracy on evaluation data: 9624 / 10000

Epoch 19 training complete
Cost on training data: 0.0696140295526
Accuracy on training data: 48429 / 50000
Cost on evaluation data: 0.204686285756
Accuracy on evaluation data: 9643 / 10000

Epoch 20 training complete
Cost on training data: 0.070000267812
Accuracy on training data: 48366 / 50000
Cost on evaluation data: 0.205466192185
Accuracy on evaluation data: 9616 / 10000

Epoch 21 training complete
Cost on training data: 0.0690897146766
Accuracy on training data: 48453 / 50000
Cost on evaluation data: 0.204200032141
Accuracy on evaluation data: 9640 / 10000

Epoch 22 training complete
Cost on training data: 0.0693821300692
Accuracy on training data: 48435 / 50000
Cost on evaluation data: 0.204425696398
Accuracy on evaluation data: 9643 / 10000

Epoch 23 training complete
Cost on training data: 0.0693532724089
Accuracy on training data: 48392 / 50000
Cost on evaluation data: 0.204748074206
Accuracy on evaluation data: 9639 / 10000

Epoch 24 training complete
Cost on training data: 0.0703227041671
Accuracy on training data: 48413 / 50000
Cost on evaluation data: 0.20587772223
Accuracy on evaluation data: 9641 / 10000

Epoch 25 training complete
Cost on training data: 0.0687386332413
Accuracy on training data: 48478 / 50000
Cost on evaluation data: 0.204163597377
Accuracy on evaluation data: 9664 / 10000

Epoch 26 training complete
Cost on training data: 0.0684178219363
Accuracy on training data: 48545 / 50000
Cost on evaluation data: 0.203875362307
Accuracy on evaluation data: 9659 / 10000

Epoch 27 training complete
Cost on training data: 0.0694916746048
Accuracy on training data: 48476 / 50000
Cost on evaluation data: 0.204922591077
Accuracy on evaluation data: 9647 / 10000

Epoch 28 training complete
Cost on training data: 0.0695106454
Accuracy on training data: 48479 / 50000
Cost on evaluation data: 0.205078463502
Accuracy on evaluation data: 9646 / 10000

Epoch 29 training complete
Cost on training data: 0.0696386635033
Accuracy on training data: 48447 / 50000
Cost on evaluation data: 0.205554437146
Accuracy on evaluation data: 9639 / 10000

Epoch 30 training complete
Cost on training data: 0.0709283825484
Accuracy on training data: 48372 / 50000
Cost on evaluation data: 0.206791436247
Accuracy on evaluation data: 9634 / 10000

Epoch 31 training complete
Cost on training data: 0.0683367659826
Accuracy on training data: 48509 / 50000
Cost on evaluation data: 0.204377327604
Accuracy on evaluation data: 9655 / 10000

Epoch 32 training complete
Cost on training data: 0.0691175101122
Accuracy on training data: 48484 / 50000
Cost on evaluation data: 0.205084275755
Accuracy on evaluation data: 9662 / 10000

Epoch 33 training complete
Cost on training data: 0.0696022705428
Accuracy on training data: 48512 / 50000
Cost on evaluation data: 0.205342921393
Accuracy on evaluation data: 9642 / 10000

Epoch 34 training complete
Cost on training data: 0.0695198298452
Accuracy on training data: 48464 / 50000
Cost on evaluation data: 0.205704241859
Accuracy on evaluation data: 9644 / 10000

Epoch 35 training complete
Cost on training data: 0.0695102937703
Accuracy on training data: 48460 / 50000
Cost on evaluation data: 0.205579249348
Accuracy on evaluation data: 9640 / 10000

Epoch 36 training complete
Cost on training data: 0.0678409800366
Accuracy on training data: 48564 / 50000
Cost on evaluation data: 0.203939197127
Accuracy on evaluation data: 9666 / 10000

Epoch 37 training complete
Cost on training data: 0.07027444492
Accuracy on training data: 48403 / 50000
Cost on evaluation data: 0.20611673117
Accuracy on evaluation data: 9630 / 10000

Epoch 38 training complete
Cost on training data: 0.0683722959656
Accuracy on training data: 48525 / 50000
Cost on evaluation data: 0.20429278246
Accuracy on evaluation data: 9667 / 10000

Epoch 39 training complete
Cost on training data: 0.0672118882159
Accuracy on training data: 48589 / 50000
Cost on evaluation data: 0.203621631122
Accuracy on evaluation data: 9684 / 10000

Epoch 0 training complete
Cost on training data: 5.1745183444
Accuracy on training data: 19876 / 50000
Cost on evaluation data: 24.4962297911
Accuracy on evaluation data: 4013 / 10000

Epoch 1 training complete
Cost on training data: 3.2241509914
Accuracy on training data: 23347 / 50000
Cost on evaluation data: 14.9456529872
Accuracy on evaluation data: 4676 / 10000

Epoch 2 training complete
Cost on training data: 2.02046348238
Accuracy on training data: 28178 / 50000
Cost on evaluation data: 9.14068778368
Accuracy on evaluation data: 5604 / 10000

Epoch 3 training complete
Cost on training data: 1.19130022056
Accuracy on training data: 42306 / 50000
Cost on evaluation data: 5.52689870669
Accuracy on evaluation data: 8476 / 10000

Epoch 4 training complete
Cost on training data: 0.729916111587
Accuracy on training data: 46917 / 50000
Cost on evaluation data: 3.39362372267
Accuracy on evaluation data: 9387 / 10000

Epoch 5 training complete
Cost on training data: 0.465595575924
Accuracy on training data: 47512 / 50000
Cost on evaluation data: 2.12536149515
Accuracy on evaluation data: 9502 / 10000

Epoch 6 training complete
Cost on training data: 0.311912589875
Accuracy on training data: 47653 / 50000
Cost on evaluation data: 1.36524429501
Accuracy on evaluation data: 9518 / 10000

Epoch 7 training complete
Cost on training data: 0.215396081156
Accuracy on training data: 47934 / 50000
Cost on evaluation data: 0.905468450821
Accuracy on evaluation data: 9576 / 10000

Epoch 8 training complete
Cost on training data: 0.161355518182
Accuracy on training data: 47801 / 50000
Cost on evaluation data: 0.632160392799
Accuracy on evaluation data: 9558 / 10000

Epoch 9 training complete
Cost on training data: 0.125276845264
Accuracy on training data: 48135 / 50000
Cost on evaluation data: 0.464389298513
Accuracy on evaluation data: 9605 / 10000

Epoch 10 training complete
Cost on training data: 0.10533316191
Accuracy on training data: 48110 / 50000
Cost on evaluation data: 0.365618088807
Accuracy on evaluation data: 9599 / 10000

Epoch 11 training complete
Cost on training data: 0.0917651526037
Accuracy on training data: 48215 / 50000
Cost on evaluation data: 0.303947391525
Accuracy on evaluation data: 9617 / 10000

Epoch 12 training complete
Cost on training data: 0.0852197971391
Accuracy on training data: 48280 / 50000
Cost on evaluation data: 0.26840490972
Accuracy on evaluation data: 9640 / 10000

Epoch 13 training complete
Cost on training data: 0.077714863899
Accuracy on training data: 48343 / 50000
Cost on evaluation data: 0.243547312752
Accuracy on evaluation data: 9657 / 10000

Epoch 14 training complete
Cost on training data: 0.077986384013
Accuracy on training data: 48222 / 50000
Cost on evaluation data: 0.233206511761
Accuracy on evaluation data: 9623 / 10000

Epoch 15 training complete
Cost on training data: 0.0765077739791
Accuracy on training data: 48238 / 50000
Cost on evaluation data: 0.225538011033
Accuracy on evaluation data: 9623 / 10000

Epoch 16 training complete
Cost on training data: 0.0720232096791
Accuracy on training data: 48355 / 50000
Cost on evaluation data: 0.217142007655
Accuracy on evaluation data: 9655 / 10000

Epoch 17 training complete
Cost on training data: 0.0724238659631
Accuracy on training data: 48364 / 50000
Cost on evaluation data: 0.215584048724
Accuracy on evaluation data: 9640 / 10000

Epoch 18 training complete
Cost on training data: 0.0727829469583
Accuracy on training data: 48294 / 50000
Cost on evaluation data: 0.214672021961
Accuracy on evaluation data: 9626 / 10000

Epoch 19 training complete
Cost on training data: 0.0721098829158
Accuracy on training data: 48333 / 50000
Cost on evaluation data: 0.212864200707
Accuracy on evaluation data: 9632 / 10000

Epoch 20 training complete
Cost on training data: 0.0718628830548
Accuracy on training data: 48373 / 50000
Cost on evaluation data: 0.21176831328
Accuracy on evaluation data: 9649 / 10000

Epoch 21 training complete
Cost on training data: 0.0697850690867
Accuracy on training data: 48468 / 50000
Cost on evaluation data: 0.209565195629
Accuracy on evaluation data: 9658 / 10000

Epoch 22 training complete
Cost on training data: 0.0693635253792
Accuracy on training data: 48462 / 50000
Cost on evaluation data: 0.208820237969
Accuracy on evaluation data: 9653 / 10000

Epoch 23 training complete
Cost on training data: 0.0710427147179
Accuracy on training data: 48375 / 50000
Cost on evaluation data: 0.209834853814
Accuracy on evaluation data: 9642 / 10000

Epoch 24 training complete
Cost on training data: 0.0699777310838
Accuracy on training data: 48485 / 50000
Cost on evaluation data: 0.209673306335
Accuracy on evaluation data: 9679 / 10000

Epoch 25 training complete
Cost on training data: 0.0710727337154
Accuracy on training data: 48402 / 50000
Cost on evaluation data: 0.210611216577
Accuracy on evaluation data: 9644 / 10000

Epoch 26 training complete
Cost on training data: 0.0710299440324
Accuracy on training data: 48471 / 50000
Cost on evaluation data: 0.210735582866
Accuracy on evaluation data: 9650 / 10000

Epoch 27 training complete
Cost on training data: 0.0689616618556
Accuracy on training data: 48550 / 50000
Cost on evaluation data: 0.2083193359
Accuracy on evaluation data: 9675 / 10000

Epoch 28 training complete
Cost on training data: 0.0689251115695
Accuracy on training data: 48483 / 50000
Cost on evaluation data: 0.208043187883
Accuracy on evaluation data: 9659 / 10000

Epoch 29 training complete
Cost on training data: 0.0687200613751
Accuracy on training data: 48558 / 50000
Cost on evaluation data: 0.208025431277
Accuracy on evaluation data: 9662 / 10000

Epoch 30 training complete
Cost on training data: 0.068028539343
Accuracy on training data: 48539 / 50000
Cost on evaluation data: 0.20735543297
Accuracy on evaluation data: 9668 / 10000

Epoch 31 training complete
Cost on training data: 0.0709934526561
Accuracy on training data: 48414 / 50000
Cost on evaluation data: 0.210130297509
Accuracy on evaluation data: 9645 / 10000

Epoch 32 training complete
Cost on training data: 0.0688941788057
Accuracy on training data: 48522 / 50000
Cost on evaluation data: 0.20823681365
Accuracy on evaluation data: 9661 / 10000

Epoch 33 training complete
Cost on training data: 0.0719136066014
Accuracy on training data: 48408 / 50000
Cost on evaluation data: 0.210221671956
Accuracy on evaluation data: 9649 / 10000

Epoch 34 training complete
Cost on training data: 0.071661726771
Accuracy on training data: 48401 / 50000
Cost on evaluation data: 0.210944931909
Accuracy on evaluation data: 9648 / 10000

Epoch 35 training complete
Cost on training data: 0.0684698107141
Accuracy on training data: 48589 / 50000
Cost on evaluation data: 0.207418832033
Accuracy on evaluation data: 9680 / 10000

Epoch 36 training complete
Cost on training data: 0.0676313829221
Accuracy on training data: 48591 / 50000
Cost on evaluation data: 0.207123482779
Accuracy on evaluation data: 9683 / 10000

Epoch 37 training complete
Cost on training data: 0.0700040520312
Accuracy on training data: 48486 / 50000
Cost on evaluation data: 0.209088529232
Accuracy on evaluation data: 9647 / 10000

Epoch 38 training complete
Cost on training data: 0.0697589467638
Accuracy on training data: 48575 / 50000
Cost on evaluation data: 0.209088396759
Accuracy on evaluation data: 9675 / 10000

Epoch 39 training complete
Cost on training data: 0.0686697889581
Accuracy on training data: 48550 / 50000
Cost on evaluation data: 0.207777292798
Accuracy on evaluation data: 9668 / 10000


In [2]:
def plot_overlay_train(data, num_epochs, xmin,
                       training_set_size, M_list):
    """Overlay the training-accuracy curves for several hidden-layer sizes M."""
    fig = plt.figure()
    ax = fig.add_subplot(111)
    for i, M in enumerate(M_list):
        # training accuracies are counts out of training_set_size
        training_accuracy = data[i][3]
        ax.plot(np.arange(xmin, num_epochs),
                [accuracy * 100.0 / training_set_size
                 for accuracy in training_accuracy],
                label="Accuracy on the training data, M=" + str(M))
    ax.grid(True)
    ax.set_xlim([xmin, num_epochs])
    ax.set_xlabel('Epoch')
    ax.set_ylim([90, 100])
    # Shrink current axis by 20% to make room for the legend
    box = ax.get_position()
    ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
    # Put the legend to the right of the current axis
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
    plt.show()

def plot_overlay_test(data, num_epochs, xmin,
                      training_set_size, M_list):
    """Overlay the test-accuracy curves for several hidden-layer sizes M."""
    fig = plt.figure()
    ax = fig.add_subplot(111)
    for i, M in enumerate(M_list):
        # test accuracies are counts out of 10000, so /100 gives a percentage
        test_accuracy = data[i][1]
        ax.plot(np.arange(xmin, num_epochs),
                [accuracy / 100.0 for accuracy in test_accuracy],
                label="Accuracy on the test data, M=" + str(M))
    ax.grid(True)
    ax.set_xlim([xmin, num_epochs])
    ax.set_xlabel('Epoch')
    ax.set_ylim([90, 100])
    # Shrink current axis by 20% to make room for the legend
    box = ax.get_position()
    ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
    # Put the legend to the right of the current axis
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
    plt.show()

In [15]:
import json

M_list = [10, 30, 50, 100, 200]
data_list = []

for i, M in enumerate(M_list):
    with open('results_totaux' + str(M)) as json_data:
        d = json.loads(json_data.read())
        data_list.append(d[1:])

plot_overlay_train(data_list, 40, 0, 50000, M_list)

plot_overlay_test(data_list, 40, 0, 50000, M_list)
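For reference, here is a minimal sketch of how the 'results_totaux' files read above could have been produced. The Network/SGD interface and hyper-parameters are assumed from the earlier cells of this notebook, and the exact values used for the saved runs are not known; only the file layout (M first, then the four monitored lists, consistent with the d[1:] slicing above) is taken from the loading code.

import json

# Hypothetical reconstruction: one training run per hidden-layer size M,
# saving M followed by the four lists returned by SGD, so that d[1:]
# recovers (eval_cost, eval_accuracy, training_cost, training_accuracy).
for M in [10, 30, 50, 100, 200]:
    net = Network([784, M, 10])
    results = net.SGD(training_data, 40, 10, 0.1, lmbda=5.0,
                      evaluation_data=test_data,
                      monitor_evaluation_cost=True,
                      monitor_evaluation_accuracy=True,
                      monitor_training_cost=True,
                      monitor_training_accuracy=True)
    with open('results_totaux' + str(M), 'w') as f:
        json.dump([M] + list(results), f)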


What can we deduce from these results?

As we can see, increasing the number of hidden neurons $M$ leads to a higher accuracy of the system. On the first plot we notice that increasing $M$ at first improves the learning on the training data, but beyond $M=100$ the accuracy is essentially the same across the networks.

On the test data we observe globally the same behavior, but the results become similar as soon as we reach $M=50$. We can obtain better accuracy by increasing the size of the hidden layer, but the gains saturate: enlarging the hidden layer alone is not the way to reach 100 % accuracy.

Improving the results with deep learning

We are now going to improve on the network we have presented. To push the results further we would use a convolutional architecture, designed to recognize local features in the images, but a full treatment is beyond the scope of this homework.

One way to improve the classification of digits without changing the architecture of the neural network is to work with another cost function. We will see what happens with the cross-entropy cost function, defined by: $$ C = - \frac{1}{n} \sum_x [y \ln a + (1-y) \ln (1-a)]$$ where $a$ is the output of the network and $y$ the desired output. The cross-entropy comes from the field of information theory, where it is sometimes called the binary cross-entropy. To introduce this quantity briefly, we can say that it measures how much we are "surprised" when learning the true value of $y$. It is equal to the Kullback-Leibler divergence up to an additive constant, which is appropriate to the course while highlighting the link between statistical physics and machine learning.
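A one-line computation (standard, not specific to this notebook) shows why this cost pairs so well with sigmoid output neurons: the output error simplifies to $a - y$, which is exactly what the delta method of the class below returns. Writing $a = \sigma(z)$ and using $\sigma'(z) = a(1-a)$: $$ \frac{\partial C}{\partial z} = \frac{\partial C}{\partial a}\,\sigma'(z) = \frac{a-y}{a(1-a)}\; a(1-a) = a - y $$ The gradient is therefore no longer multiplied by the small factor $\sigma'(z)$, which avoids the slow-down of the quadratic cost when the output neurons are saturated.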

Now we just have to add a class for this cost function and we can use the same program to see the difference with the quadratic cost function.


In [16]:
class CrossEntropyCost(object):

    @staticmethod
    def fn(a, y):
        """Return the cost associated with an output ``a`` and desired output
        ``y``.  Note that np.nan_to_num is used to ensure numerical
        stability.  In particular, if both ``a`` and ``y`` have a 1.0
        in the same slot, then the expression (1-y)*np.log(1-a)
        returns nan.  The np.nan_to_num ensures that that is converted
        to the correct value (0.0).

        """
        return np.sum(np.nan_to_num(-y*np.log(a)-(1-y)*np.log(1-a)))

    @staticmethod
    def delta(z, a, y):
        """Return the error delta from the output layer.  Note that the
        parameter ``z`` is not used by the method.  It is included in
        the method's parameters in order to make the interface
        consistent with the delta method for other cost classes.

        """
        return (a-y)
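As a quick illustration (with made-up values, not taken from the runs above), we can check that the np.nan_to_num guard does its job when the output and the target are both exactly 1 in the same slot:

import numpy as np

a = np.array([0.9, 0.1, 1.0])  # network outputs
y = np.array([1.0, 0.0, 1.0])  # targets; the last slot gives 0*log(0) = nan
print(CrossEntropyCost.fn(a, y))  # finite (~0.21): the nan term is zeroed out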

In [27]:
net = Network([784, 100, 10], cost=CrossEntropyCost)
net.large_weight_initializer()
test_cost, test_accuracy, training_cost, training_accuracy = net.SGD(
    training_data, 60, 10, 0.1, lmbda=5.0,
    evaluation_data=validation_data,
    monitor_evaluation_accuracy=True)


plot_test_accuracy(test_accuracy, 60, 0)


As we can see, we obtain better results: we are now able to reach an accuracy of $98$ %. This highlights the fact that for machine learning problems, the choice of the cost function matters.

If we worked on the other parameters (architecture, transfer function, weight initialization, ...), we could still improve the classification. We now briefly introduce the results that can be obtained with convolutional nets. For this we will use a Python implementation: Theano.

For optimization we will still use backpropagation and stochastic gradient descent as in the previous part.


In [99]:
import network3
from network3 import Network
from network3 import ConvPoolLayer, FullyConnectedLayer, SoftmaxLayer

training_data, validation_data, test_data = network3.load_data_shared()
mini_batch_size = 10
net = Network([FullyConnectedLayer(n_in=784, n_out=100),
               SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)
net.SGD(training_data, 60, mini_batch_size, 0.1, validation_data, test_data)


---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-99-0ee1fadadc40> in <module>()
      3 from network3 import ConvPoolLayer, FullyConnectedLayer, SoftmaxLayer
      4 
----> 5 training_data, validation_data, test_data = network3.load_data_shared()
      6 mini_batch_size = 10
      7 net = Network([FullyConnectedLayer(n_in=784, n_out=100),SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)

/media/kevin/Boulot1/Dropbox/Notebook_Jupyter/Algorithm_Physic/network3.py in load_data_shared(filename)
     63 #### Load the MNIST data
     64 def load_data_shared(filename="mnist.pkl.gz"):
---> 65     f = gzip.open(filename, 'rb')
     66     training_data, validation_data, test_data = cPickle.load(f)
     67     f.close()

/usr/lib/python2.7/gzip.pyc in open(filename, mode, compresslevel)
     32 
     33     """
---> 34     return GzipFile(filename, mode, compresslevel)
     35 
     36 class GzipFile(io.BufferedIOBase):

/usr/lib/python2.7/gzip.pyc in __init__(self, filename, mode, compresslevel, fileobj, mtime)
     92             mode += 'b'
     93         if fileobj is None:
---> 94             fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
     95         if filename is None:
     96             # Issue #13781: os.fdopen() creates a fileobj with a bogus name

IOError: [Errno 2] No such file or directory: '../data/mnist.pkl.gz'
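The run above fails only because the MNIST data file is not at the expected path. Once mnist.pkl.gz is in place, a convolutional layer can be prepended to the fully connected network. The following is a minimal sketch using the ConvPoolLayer of network3; the filter and image shapes are the usual MNIST choices, assumed here rather than taken from an actual run of this notebook.

# Hedged sketch: one convolutional-pooling layer in front of the previous net.
# 20 feature maps with 5x5 local receptive fields on 28x28 images, followed by
# 2x2 max-pooling, so each map ends up 12x12 (hence n_in = 20*12*12 below).
net = Network([
    ConvPoolLayer(image_shape=(mini_batch_size, 1, 28, 28),
                  filter_shape=(20, 1, 5, 5),
                  poolsize=(2, 2)),
    FullyConnectedLayer(n_in=20*12*12, n_out=100),
    SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)
net.SGD(training_data, 60, mini_batch_size, 0.1, validation_data, test_data)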
