My feedforward neural network implementation, plus the cost function described in the Coursera Stanford Machine Learning course, applied to handwritten digit recognition

You should start the IPython notebook server with the "--pylab inline" flag (i.e. run: ipython notebook --pylab inline).


In [1]:
import sys
sys.path.append('../src')
import csv

In [2]:
from numpy import loadtxt
from numpy.random import uniform
from sklearn.decomposition import PCA
from sklearn.cross_validation import train_test_split
from matplotlib.cm import binary_r
from FeedforwardNeuNet import sigmoid, NnLayer, FeedforwardNeuNet
from CostFunc import courseraML_CostFunc, courseraML_CostFuncGrad

Loading and preprocessing data:


In [3]:
train_targetsAndInputs=loadtxt('../demoDataSet/kaggle_digit_train.csv',delimiter=',',skiprows=1)
test_inputsOrig=loadtxt('../demoDataSet/kaggle_digit_test.csv',delimiter=',',skiprows=1)
test_rf_benchmark=loadtxt('../demoDataSet/kaggle_digit_rf_benchmark.csv',delimiter=',',skiprows=1)

y = train_targetsAndInputs[:,0][:, None] # make each target a row in 2D array
identityArr = identity(10)
targets = select([y == 0, y == 1, y == 2, y == 3, y == 4, y == 5, y == 6, y == 7, y == 8, y == 9], [identityArr[0], identityArr[1], identityArr[2], identityArr[3], identityArr[4], identityArr[5], identityArr[6], identityArr[7], identityArr[8], identityArr[9]])
traingSetNormalized=0 # flag recorded later in the configuration CSV
PCAonTrainSet=0 # flag recorded later in the configuration CSV
inputs=train_targetsAndInputs[:,1:]
pcaReducedDimTo=inputs[0].size # input dimension; overwritten if PCA is applied
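
As a side note, the same one-hot target matrix could also be obtained by indexing an identity matrix with the integer labels. A minimal sketch, not what the cell above runs (the names labels and targets_alt are mine; identity comes from the --pylab namespace, as above):

# hypothetical alternative to the select(...) call above:
# row k of identity(10) is the one-hot vector for class k, so indexing by the
# integer labels yields the same (N, 10) one-hot target matrix
labels = train_targetsAndInputs[:, 0].astype(int)
targets_alt = identity(10)[labels]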

In [4]:
imshow(reshape(train_targetsAndInputs[-1,1:], (28, 28)), binary_r)
print('target: {0}'.format(train_targetsAndInputs[-1,0]))


target: 9.0

Normalizing training set and test set:


In [5]:
traingSetNormalized=1
inputs=(inputs-mean(inputs))/std(inputs) #normalizing inputs
test_inputs=(test_inputsOrig-mean(test_inputsOrig))/std(test_inputsOrig) # normalize test_inputsOrig
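
The cell above standardizes with a single global mean and standard deviation computed over all pixels. A per-feature (per-pixel) standardization is another common choice; a hypothetical sketch, not what this notebook uses (raw and inputs_perPixel are illustrative names):

# hypothetical per-pixel (column-wise) standardization of the raw training inputs;
# the small constant guards against division by zero for pixels that are constant across images
raw = train_targetsAndInputs[:, 1:]
inputs_perPixel = (raw - mean(raw, axis=0)) / (std(raw, axis=0) + 1e-8)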

Do NOT use the PCA code here to reduce the input dimension: it distorted the data and resulted in slow NN training (see the note after the cell below):


In [ ]:
# If I use PCA, then it takes forever to train my nn
PCAonTrainSet=1
pcaReducedDimTo=400
pca = PCA(pcaReducedDimTo)
pca.fit(inputs)
print(pca.explained_variance_ratio_)
pca_inputs=pca.transform(inputs)
pca.fit(test_inputs)
pca_test_inputs=pca.transform(test_inputs)
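
Incidentally, if PCA were used, the usual practice is to fit it on the training inputs only and reuse that same fit to transform the test inputs (the skipped cell above re-fits on the test set). A hypothetical sketch of that variant:

# hypothetical corrected usage: fit PCA on the training inputs only, then reuse the fit for the test inputs
pca = PCA(pcaReducedDimTo)
pca_inputs = pca.fit_transform(inputs)
pca_test_inputs = pca.transform(test_inputs)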

Create and train my feedforward neural network:

Split training data into training set and CV set:


In [6]:
nnInputs=pca_inputs if PCAonTrainSet else inputs
nnInputs_train, nnInputs_cv, targets_train, targets_cv = train_test_split(nnInputs, targets, test_size=0.3, random_state=0)

Model selection/Setting hyperparameters:

(1) number of layers:

I initialize the weights using the "effective strategy for initializing weights" described in the Coursera Stanford Machine Learning course (homework assignment 4, ex4.pdf, p. 7).
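
Concretely, each weight matrix in the cells below is drawn uniformly from [-eps, eps] with eps = sqrt(6)/sqrt((L_in + 1) + (L_out + 1)), where L_in and L_out are the unit counts (excluding bias) of the layers the weights connect. A minimal sketch of that pattern as a helper (the helper name is mine, not part of FeedforwardNeuNet):

# sketch of the initialization used below: uniform in [-eps, eps] with
# eps = sqrt(6) / sqrt((n_in + 1) + (n_out + 1)); the extra column holds the bias weights
def initForwardWeight(n_in, n_out):
    eps = 6 ** 0.5 / (n_in + 1 + n_out + 1) ** 0.5
    return uniform(-eps, eps, (n_out, n_in + 1))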


In [7]:
errRate_train, errRate_cv=[],[]

In [138]:
maxIter=20
start, end=3,6 # inclusive
for numOfLyExcOutputLy in xrange(start, end+1):
    layersExOutputLy=[]
    dimReducedTo=pcaReducedDimTo
    listOfNumOfUnits=range(10, dimReducedTo, dimReducedTo/numOfLyExcOutputLy)
    for n in reversed(listOfNumOfUnits):
        layersExOutputLy.append(NnLayer(sigmoid, dimReducedTo, 1, n)) # input layer and each hidden layer; no need to create an output layer
        initRange=6**0.5/(dimReducedTo+1+n+1)**0.5
        layersExOutputLy[-1].updateForwardWeight(uniform(-initRange, initRange, (n,dimReducedTo+1)))
        dimReducedTo=n
    nn = FeedforwardNeuNet(layersExOutputLy, 1, 0.01, 0)
    nn.train(nnInputs_train, targets_train, courseraML_CostFunc, courseraML_CostFuncGrad, maxIter)
    predictionsOnTrainSet=argmax(asarray(nn.forwardPropogateAllInput(nnInputs_train)),1)
    numPredictErr=count_nonzero(logical_not(argmax(targets_train,1)==predictionsOnTrainSet))
    errRate_train.append(numPredictErr*1.0/len(predictionsOnTrainSet))
    predictionsOnCVSet=argmax(asarray(nn.forwardPropogateAllInput(nnInputs_cv)),1)
    numPredictErr=count_nonzero(logical_not(argmax(targets_cv,1)==predictionsOnCVSet))
    errRate_cv.append(numPredictErr*1.0/len(predictionsOnCVSet))


Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.982238
         Iterations: 20
         Function evaluations: 35
         Gradient evaluations: 35
Warning: Maximum number of iterations has been exceeded.
         Current function value: 1.588392
         Iterations: 20
         Function evaluations: 39
         Gradient evaluations: 39
Warning: Maximum number of iterations has been exceeded.
         Current function value: 2.105373
         Iterations: 20
         Function evaluations: 39
         Gradient evaluations: 39
Warning: Maximum number of iterations has been exceeded.
         Current function value: 3.278530
         Iterations: 20
         Function evaluations: 33
         Gradient evaluations: 33

In [139]:
fig, ax = plt.subplots()
ax.plot(range(start,end+1), errRate_cv, lw=2, label = 'cross-validation error')
ax.plot(range(start,end+1), errRate_train, lw=2, label = 'training error')
ax.legend(loc=0)
ax.set_xlabel('num of layers')
ax.set_ylabel('prediction error rate')


Out[139]:
<matplotlib.text.Text at 0xd9abdd0>

It seems that using more layers is not necessarily better, so I choose a 3-layer setup.

(2) number of units in 1st hidden layer


In [20]:
errRate_train, errRate_cv=[],[]

In [21]:
maxIter=20
start, end=50, 500 # inclusive
for fstHidUnt in xrange(start,end+1,100):
    layersExOutputLy=[]    
    listOfNumOfUnits=[10, 40, fstHidUnt]
    dimReducedTo=pcaReducedDimTo
    for n in reversed(listOfNumOfUnits):
        layersExOutputLy.append(NnLayer(sigmoid, dimReducedTo, 1, n)) # input layer and each hidden layer; no need to create an output layer
        initRange=6**0.5/(dimReducedTo+1+n+1)**0.5
        layersExOutputLy[-1].updateForwardWeight(uniform(-initRange, initRange, (n,dimReducedTo+1)))
        dimReducedTo=n
    nn = FeedforwardNeuNet(layersExOutputLy, 1, 0.01, 0)
    nn.train(nnInputs_train, targets_train, courseraML_CostFunc, courseraML_CostFuncGrad, maxIter)
    predictionsOnTrainSet=argmax(asarray(nn.forwardPropogateAllInput(nnInputs_train)),1)
    numPredictErr=count_nonzero(logical_not(argmax(targets_train,1)==predictionsOnTrainSet))
    errRate_train.append(numPredictErr*1.0/len(predictionsOnTrainSet))
    predictionsOnCVSet=argmax(asarray(nn.forwardPropogateAllInput(nnInputs_cv)),1)
    numPredictErr=count_nonzero(logical_not(argmax(targets_cv,1)==predictionsOnCVSet))
    errRate_cv.append(numPredictErr*1.0/len(predictionsOnCVSet))


Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.964153
         Iterations: 20
         Function evaluations: 39
         Gradient evaluations: 39
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.843281
         Iterations: 20
         Function evaluations: 37
         Gradient evaluations: 37
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.906120
         Iterations: 20
         Function evaluations: 36
         Gradient evaluations: 36
Warning: Maximum number of iterations has been exceeded.
         Current function value: 1.365604
         Iterations: 20
         Function evaluations: 34
         Gradient evaluations: 34
Warning: Maximum number of iterations has been exceeded.
         Current function value: 1.853325
         Iterations: 20
         Function evaluations: 32
         Gradient evaluations: 32

In [25]:
fig, ax = plt.subplots()
ax.plot(range(start,end+1,100), errRate_cv, lw=2, label = 'cross-validation error')
ax.plot(range(start,end+1,100), errRate_train, lw=2, label = 'training error')
ax.legend(loc=0)
ax.set_xlabel('num of units in 1st hidden layer')
ax.set_ylabel('prediction error rate')


Out[25]:
<matplotlib.text.Text at 0x4d55c850>

In [24]:
(errRate_cv, errRate_train)


Out[24]:
([0.15126984126984128,
  0.11746031746031746,
  0.11698412698412698,
  0.1984126984126984,
  0.40190476190476193],
 [0.1501360544217687,
  0.11918367346938775,
  0.12360544217687075,
  0.20517006802721088,
  0.40187074829931974])

Based on the error rate on the CV set, I set the number of units in the 1st hidden layer to 200.

(3) number of units in 2nd hidden layer


In [30]:
errRate_train, errRate_cv=[],[]

In [31]:
maxIter=20
start, end=50, 150 # inclusive
for fstHidUnt in xrange(start, end+1, 20):
    layersExOutputLy=[]    
    listOfNumOfUnits=[10, fstHidUnt, 200]
    dimReducedTo=pcaReducedDimTo
    for n in reversed(listOfNumOfUnits):
        layersExOutputLy.append(NnLayer(sigmoid, dimReducedTo, 1, n)) # input layer and each hidden layer; no need to create an output layer
        initRange=6**0.5/(dimReducedTo+1+n+1)**0.5
        layersExOutputLy[-1].updateForwardWeight(uniform(-initRange, initRange, (n,dimReducedTo+1)))
        dimReducedTo=n
    nn = FeedforwardNeuNet(layersExOutputLy, 1, 0.01, 0)
    nn.train(nnInputs_train, targets_train, courseraML_CostFunc, courseraML_CostFuncGrad, maxIter)
    predictionsOnTrainSet=argmax(asarray(nn.forwardPropogateAllInput(nnInputs_train)),1)
    numPredictErr=count_nonzero(logical_not(argmax(targets_train,1)==predictionsOnTrainSet))
    errRate_train.append(numPredictErr*1.0/len(predictionsOnTrainSet))
    predictionsOnCVSet=argmax(asarray(nn.forwardPropogateAllInput(nnInputs_cv)),1)
    numPredictErr=count_nonzero(logical_not(argmax(targets_cv,1)==predictionsOnCVSet))
    errRate_cv.append(numPredictErr*1.0/len(predictionsOnCVSet))


Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.986339
         Iterations: 20
         Function evaluations: 35
         Gradient evaluations: 35
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.653009
         Iterations: 20
         Function evaluations: 34
         Gradient evaluations: 34
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.776063
         Iterations: 20
         Function evaluations: 33
         Gradient evaluations: 33
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.819521
         Iterations: 20
         Function evaluations: 33
         Gradient evaluations: 33
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.802499
         Iterations: 20
         Function evaluations: 32
         Gradient evaluations: 32
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.905642
         Iterations: 20
         Function evaluations: 30
         Gradient evaluations: 30

In [35]:
fig, ax = plt.subplots()
ax.plot(range(start,end+1,20), errRate_cv, lw=2, label = 'cross-validation error')
ax.plot(range(start,end+1,20), errRate_train, lw=2, label = 'training error')
ax.legend(loc=0)
ax.set_xlabel('num of units in 2nd hidden layer')
ax.set_ylabel('prediction error rate')


Out[35]:
<matplotlib.text.Text at 0x6a0ba10>

In [33]:
(errRate_cv, errRate_train)


Out[33]:
([0.1411111111111111,
  0.09579365079365079,
  0.10896825396825396,
  0.11428571428571428,
  0.11761904761904762,
  0.12785714285714286],
 [0.14680272108843537,
  0.0977891156462585,
  0.11214285714285714,
  0.12074829931972789,
  0.12445578231292517,
  0.1311904761904762])

Based on the error rate on the CV set, I set the number of units in the 2nd hidden layer to 70.

Build the neural network using the (possibly optimal) hyperparameters observed above:


In [5]:
maxIter=150
layersExOutputLy=[]
listOfNumOfUnits=[10, 70, 200]
dimReducedTo=pcaReducedDimTo
for n in reversed(listOfNumOfUnits):
    # Only the input layer and each hidden layer are created. There is no need to create an output layer
    # because, in my implementation, each layer applies the activation function to its weighted inputs
    # (including the weighted bias) and passes the result to the next layer. In other words, nn[:] (or
    # nn.outputs, when given multiple inputs) serves as the output layer, and all it does is store the
    # output values of the last hidden layer.
    layersExOutputLy.append(NnLayer(sigmoid, dimReducedTo, 1, n))
    initRange=6**0.5/(dimReducedTo+1+n+1)**0.5
    layersExOutputLy[-1].updateForwardWeight(uniform(-initRange, initRange, (n,dimReducedTo+1)))
    dimReducedTo=n
nn = FeedforwardNeuNet(layersExOutputLy, 1, 0.01, 0)

In [37]:
nn.train(nnInputs_train, targets_train, courseraML_CostFunc, courseraML_CostFuncGrad, maxIter)


Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.049476
         Iterations: 150
         Function evaluations: 916
         Gradient evaluations: 895

In [38]:
predictionsOnTrainSet=argmax(asarray(nn.forwardPropogateAllInput(nnInputs_train)),1)
numPredictErr=count_nonzero(logical_not(argmax(targets_train,1)==predictionsOnTrainSet))
errRate_train=numPredictErr*1.0/len(predictionsOnTrainSet)
predictionsOnCVSet=argmax(asarray(nn.forwardPropogateAllInput(nnInputs_cv)),1)
numPredictErr=count_nonzero(logical_not(argmax(targets_cv,1)==predictionsOnCVSet))
errRate_cv=numPredictErr*1.0/len(predictionsOnCVSet)
(errRate_train, errRate_cv)


Out[38]:
(3.401360544217687e-05, 0.026507936507936508)

Special note about the importance of model selection using cross validation:

Before using cross validation for model selection, I trained my nn with casually picked hyperparameters:

  • 3 layers (input layer + 2 hidden layers + an output layer that does nothing but store the output values of the previous hidden layer)
  • dimensions of each layer (from input to output): 785 (1 bias unit included), 41 (1 bias unit included), 21 (1 bias unit included), 10 (see the construction sketch after this list)
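
For reference, a sketch of how this preliminary network could be built with the same API and construction pattern used in the earlier cells (the constructor arguments other than the layer sizes are assumed to match those cells):

# rebuild the casually-picked architecture: 784 inputs -> 40 -> 20 -> 10 (bias units are added by NnLayer)
layersExOutputLy = []
dimReducedTo = 784
for n in [40, 20, 10]:
    layersExOutputLy.append(NnLayer(sigmoid, dimReducedTo, 1, n))
    initRange = 6 ** 0.5 / (dimReducedTo + 1 + n + 1) ** 0.5
    layersExOutputLy[-1].updateForwardWeight(uniform(-initRange, initRange, (n, dimReducedTo + 1)))
    dimReducedTo = n
nn = FeedforwardNeuNet(layersExOutputLy, 1, 0.01, 0)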

I used the following code snippet to visualize the impact of gradually increasing the maximum number of iterations passed to my nn.train() function:


In [ ]:
errRate_train,errRate_cv=[],[]
start, end=80,150 # inclusive, this segment of code takes pretty long to finish
for maxI in range(start,end+1,10):
    initRange=6**0.5/(dimReducedTo+1+28+1)**0.5
    nn.layersExOutputLy[0].updateForwardWeight(uniform(-initRange, initRange, (40,dimReducedTo+1))) # treat a layer as a column vector
    initRange=6**0.5/(28+1+20+1)**0.5
    nn.layersExOutputLy[1].updateForwardWeight(uniform(-initRange, initRange, (20,41)))
    initRange=6**0.5/(20+1+10)**0.5
    nn.layersExOutputLy[2].updateForwardWeight(uniform(-initRange, initRange, (10,21)))
    nn.train(nnInputs_train, targets_train, courseraML_CostFunc, courseraML_CostFuncGrad, maxI)
    predictionsOnTrainSet=argmax(asarray(nn.forwardPropogateAllInput(nnInputs_train)),1)
    numPredictErr=count_nonzero(logical_not(argmax(targets_train,1)==predictionsOnTrainSet))
    errRate_train.append(numPredictErr*1.0/len(predictionsOnTrainSet))
    predictionsOnCVSet=argmax(asarray(nn.forwardPropogateAllInput(nnInputs_cv)),1)
    numPredictErr=count_nonzero(logical_not(argmax(targets_cv,1)==predictionsOnCVSet))
    errRate_cv.append(numPredictErr*1.0/len(predictionsOnCVSet))

In [80]:
fig, ax = plt.subplots()
ax.plot(range(start,end+1,10), errRate_cv, lw=2, label = 'cross-validation error')
ax.plot(range(start,end+1,10), errRate_train, lw=2, label = 'training error')
ax.legend(loc=0)
ax.set_xlabel('max iteration of train()')
ax.set_ylabel('prediction error rate')


Out[80]:
<matplotlib.text.Text at 0x3070e750>

In [81]:
(errRate_train,errRate_cv)


Out[81]:
([0.030068027210884352,
  0.02554421768707483,
  0.028673469387755102,
  0.01554421768707483,
  0.017551020408163264,
  0.00901360544217687,
  0.01108843537414966,
  0.00925170068027211],
 [0.06134920634920635,
  0.057301587301587305,
  0.05619047619047619,
  0.05150793650793651,
  0.05380952380952381,
  0.04880952380952381,
  0.05373015873015873,
  0.053253968253968255])

The value of the cost function at the optimized weights, for each maximum-iteration setting:

  • 80: 0.259619, 90: 0.239125,
  • 100: 0.247281, 110: 0.173178,
  • 120: 0.186852, 130: 0.131729,
  • 140: 0.145243, 150: 0.144389

It's clear that the cost function converges at around 130 iterations, and this preliminary model suffers from higher error rates on both the training set and the CV set. In fact, the preliminary model's error rates are roughly:

  • ~30000% of the cross-validated model's error rate on the training set
  • ~200% of the cross-validated model's error rate on the CV set (a quick arithmetic check follows this list)
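
As a quick arithmetic check, those percentages follow directly from the error rates reported above (the preliminary model's run at maxIter=150 vs. the cross-validated model):

# ratios of the preliminary model's error rates (150 iterations) to the cross-validated model's
print(0.00925170068027211 / 3.401360544217687e-05)   # ~272x on the training set (roughly the ~30000% above)
print(0.053253968253968255 / 0.026507936507936508)   # ~2.0x on the CV set (the ~200% above)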

Conclusion: Cross validation is truly an amazing tool for model selection!

Save optimized weights and corresponding configurations:


In [28]:
with open('Kaggle_digits_recognition_configuration.csv', 'ab') as csvfile:
    csvWriter=csv.writer(csvfile)
    csvWriter.writerow(['numbOfLayers','numOfUnitsExcBias','fming_cg_maxIter','traingSetNormalized','PCAonTrainSet','reducedDimAfterPCA'])

In [40]:
whichConfig=4
dimReducedTo=len(nn.layersExOutputLy[0])-1
for i,ly in enumerate(nn.layersExOutputLy):
    save('./Kaggle_digits_recognition_optWeight{0}_{1}'.format(whichConfig,i), nn.layersExOutputLy[i]._NnLayer__forwardWeight)
with open('Kaggle_digits_recognition_configuration.csv', 'ab') as csvfile:
    csvWriter = csv.writer(csvfile)
    data=[len(nn.layersExOutputLy)]
    for ly in nn.layersExOutputLy:
        data.append(len(ly)-1)
    data.append(maxIter)
    data.append(traingSetNormalized)
    data.append(PCAonTrainSet)
    data.append(dimReducedTo)
    csvWriter.writerow(data)
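
For completeness, a sketch of how a saved configuration could be restored later (assuming the .npy file names written by the cell above; load is numpy's, available via --pylab, and updateForwardWeight is the same method used during construction):

# restore the saved weight matrices into a network that was rebuilt with matching layer sizes
for i, ly in enumerate(nn.layersExOutputLy):
    ly.updateForwardWeight(load('./Kaggle_digits_recognition_optWeight{0}_{1}.npy'.format(whichConfig, i)))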

Human perception vs. my neural network's prediction:


In [53]:
nn_testInputs=pca_test_inputs if PCAonTrainSet else test_inputs
whichExample=109
imshow(reshape(nn_testInputs[whichExample], (28, 28)), binary_r)
print('My neural network\'s prediction: {0}'.format(argmax(nn.forwardPropogateOneInput(nn_testInputs[whichExample]))))
print('Testing example:')


My neural network's prediction: 8
Testing example:

Make predictions on the test set:

Each row is: image index, prediction.

In [43]:
indexAndPredictions=array((arange(1,len(nn_testInputs)+1),argmax(asarray(nn.forwardPropogateAllInput(nn_testInputs)),1))).T

Number of predictions that differ between mine and the random forest benchmark provided by Kaggle:

In [49]:
count_nonzero(logical_not(int64(test_rf_benchmark)[:,1]==indexAndPredictions[:,1]))


Out[49]:
777

Image indices where my predictions differ from the random forest benchmark:

In [45]:
logical_not(int64(test_rf_benchmark)[:,1]==indexAndPredictions[:,1]).nonzero()[0] # image index of difference


Out[45]:
array([   81,   101,   109,   138,   139,   157,   172,   214,   246,
         249,   275,   287,   305,   383,   428,   494,   509,   538,
         647,   651,   653,   668,   728,   912,   927,  1036,  1083,
        1096,  1122,  1131,  1183,  1244,  1249,  1271,  1383,  1392,
        1402,  1456,  1487,  1494,  1503,  1566,  1588,  1686,  1888,
        1900,  1907,  1953,  1958,  2007,  2085,  2087,  2112,  2120,
        2126,  2145,  2208,  2367,  2369,  2386,  2407,  2409,  2416,
        2445,  2485,  2499,  2518,  2548,  2594,  2600,  2637,  2663,
        2667,  2727,  2828,  2832,  2835,  2845,  2870,  2885,  2908,
        2963,  3080,  3081,  3097,  3126,  3174,  3220,  3231,  3254,
        3255,  3256,  3279,  3288,  3294,  3327,  3361,  3367,  3373,
        3378,  3391,  3424,  3448,  3463,  3490,  3529,  3622,  3627,
        3721,  3723,  3735,  3774,  3794,  3803,  3857,  3865,  3934,
        4080,  4167,  4187,  4251,  4263,  4279,  4374,  4390,  4448,
        4474,  4480,  4509,  4605,  4610,  4723,  4748,  4765,  4774,
        4780,  4806,  4824,  4839,  4878,  4927,  4993,  4994,  5038,
        5214,  5215,  5222,  5272,  5276,  5350,  5420,  5447,  5457,
        5470,  5513,  5520,  5557,  5561,  5571,  5602,  5618,  5678,
        5684,  5715,  5777,  5830,  5901,  5945,  6000,  6040,  6070,
        6148,  6155,  6169,  6170,  6212,  6215,  6244,  6284,  6327,
        6400,  6403,  6448,  6458,  6474,  6550,  6577,  6677,  6687,
        6713,  6738,  6789,  6798,  6801,  6904,  6906,  7026,  7128,
        7153,  7178,  7179,  7242,  7256,  7264,  7277,  7339,  7341,
        7370,  7372,  7437,  7511,  7556,  7586,  7608,  7819,  7825,
        7837,  7842,  7882,  7911,  7978,  8084,  8119,  8197,  8247,
        8255,  8316,  8324,  8337,  8355,  8369,  8481,  8488,  8514,
        8521,  8578,  8605,  8710,  8736,  8866,  8899,  8926,  8950,
        8956,  9158,  9275,  9277,  9335,  9368,  9382,  9393,  9473,
        9514,  9518,  9577,  9597,  9645,  9682,  9730,  9737,  9814,
        9881,  9886,  9903,  9910,  9928,  9981, 10040, 10067, 10128,
       10130, 10252, 10301, 10320, 10338, 10362, 10380, 10396, 10398,
       10423, 10429, 10480, 10500, 10505, 10590, 10644, 10690, 10775,
       10861, 10870, 10872, 11013, 11062, 11106, 11118, 11123, 11130,
       11139, 11146, 11166, 11198, 11199, 11250, 11266, 11287, 11299,
       11361, 11380, 11409, 11463, 11557, 11587, 11627, 11649, 11654,
       11670, 11695, 11743, 11746, 11752, 11756, 11759, 11762, 11827,
       11861, 11862, 11976, 12074, 12086, 12090, 12092, 12108, 12133,
       12165, 12208, 12251, 12297, 12309, 12355, 12419, 12432, 12438,
       12465, 12467, 12507, 12527, 12557, 12620, 12635, 12655, 12694,
       12725, 12754, 12838, 12839, 12889, 12896, 13074, 13108, 13115,
       13157, 13242, 13268, 13315, 13320, 13331, 13340, 13363, 13380,
       13468, 13522, 13559, 13576, 13594, 13596, 13616, 13650, 13816,
       13870, 13888, 13891, 13905, 13925, 14066, 14144, 14196, 14249,
       14256, 14258, 14279, 14339, 14371, 14393, 14449, 14465, 14551,
       14558, 14569, 14579, 14719, 14737, 14742, 14759, 14826, 14840,
       14888, 14918, 14927, 15005, 15025, 15034, 15057, 15076, 15096,
       15190, 15221, 15281, 15301, 15326, 15438, 15440, 15509, 15514,
       15669, 15748, 15763, 15779, 15833, 15981, 16024, 16028, 16136,
       16173, 16195, 16323, 16365, 16424, 16434, 16447, 16452, 16475,
       16577, 16627, 16662, 16687, 16765, 16789, 16856, 16858, 16869,
       16876, 16905, 16919, 16934, 16957, 16964, 17008, 17012, 17013,
       17068, 17077, 17085, 17093, 17160, 17220, 17228, 17270, 17281,
       17288, 17338, 17368, 17406, 17473, 17478, 17522, 17583, 17593,
       17600, 17678, 17748, 17766, 17825, 17834, 17836, 17844, 17931,
       17941, 17946, 18032, 18116, 18126, 18177, 18180, 18182, 18223,
       18278, 18312, 18390, 18430, 18458, 18471, 18568, 18592, 18647,
       18669, 18690, 18703, 18805, 18852, 18928, 19054, 19095, 19154,
       19244, 19342, 19351, 19402, 19407, 19491, 19543, 19546, 19582,
       19629, 19649, 19673, 19674, 19718, 19772, 19791, 19807, 19833,
       19843, 19847, 19896, 20043, 20060, 20067, 20070, 20071, 20112,
       20149, 20262, 20299, 20376, 20411, 20455, 20480, 20483, 20509,
       20537, 20583, 20592, 20619, 20644, 20665, 20668, 20685, 20747,
       20751, 20752, 20774, 20801, 20817, 20851, 20877, 20882, 20928,
       20943, 20946, 20949, 20950, 20988, 20997, 21009, 21013, 21036,
       21108, 21114, 21117, 21141, 21174, 21271, 21277, 21362, 21368,
       21388, 21405, 21439, 21456, 21459, 21484, 21517, 21539, 21541,
       21544, 21546, 21568, 21578, 21591, 21619, 21705, 21726, 21762,
       21846, 21878, 21960, 22015, 22091, 22114, 22219, 22238, 22261,
       22376, 22450, 22483, 22529, 22555, 22653, 22721, 22725, 22758,
       22838, 22873, 22883, 22908, 22967, 22995, 23017, 23035, 23082,
       23090, 23146, 23156, 23171, 23188, 23251, 23274, 23293, 23334,
       23364, 23428, 23453, 23583, 23609, 23647, 23659, 23671, 23675,
       23739, 23788, 23819, 23835, 23915, 23989, 24015, 24062, 24079,
       24180, 24193, 24208, 24284, 24329, 24349, 24417, 24453, 24460,
       24476, 24480, 24493, 24540, 24553, 24564, 24737, 24738, 24753,
       24766, 24783, 24795, 24823, 24840, 24867, 24885, 24971, 24974,
       24991, 25013, 25034, 25047, 25054, 25103, 25107, 25153, 25263,
       25267, 25297, 25323, 25353, 25368, 25429, 25621, 25709, 25715,
       25725, 25755, 25793, 25816, 25855, 25879, 25889, 25931, 25939,
       25940, 25952, 25964, 26036, 26047, 26055, 26079, 26134, 26210,
       26216, 26290, 26292, 26317, 26336, 26364, 26395, 26496, 26530,
       26548, 26650, 26688, 26691, 26698, 26764, 26782, 26808, 26810,
       26814, 26846, 26868, 26877, 26907, 26908, 26926, 26941, 26950,
       26964, 27015, 27045, 27070, 27078, 27092, 27103, 27178, 27225,
       27238, 27259, 27289, 27336, 27343, 27406, 27472, 27491, 27516,
       27638, 27658, 27665, 27669, 27716, 27724, 27789, 27823, 27869,
       27918, 27937, 27954])

Save the predictions to a CSV file for Kaggle submission:

In [46]:
savetxt('./Kaggle_digits_recognition_nnPrediction{0}.csv'.format(whichConfig),indexAndPredictions,'%d',delimiter=',',comments='',header='ImageId,Label')

A special thanks

I really appreciate Jake VanderPlas for his awesome introduction to scikit-learn and his work at https://github.com/jakevdp/sklearn_pycon2013.