As a baseline for the results that follow, we train a fairly standard feed-forward neural net on the MNIST dataset. We include dropout layers, which randomly set a fraction of their inputs to 0 during training; see https://keras.io/layers/core/ for details. The hidden layers use the relu activation, and the output (class) layer uses the sigmoid activation.
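The cell below is a minimal sketch of such a network. Only the relu and sigmoid activations and the dropout layers come from the description above; the hidden layer sizes, the dropout rate, and the training schedule are assumptions.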
In [41]:
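# Sketch of the baseline net described above; layer sizes (512), dropout
# rate (0.2), and training schedule are assumptions.
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.
x_test = x_test.reshape(-1, 784).astype('float32') / 255.
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = Sequential([
    Dense(512, activation='relu', input_shape=(784,)),
    Dropout(0.2),                     # randomly zeroes a fraction of the inputs
    Dense(512, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='sigmoid'),  # sigmoid output layer, as described above
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, batch_size=128,
          validation_data=(x_test, y_test))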
Out[41]:
With this architecture we reach an accuracy on the test set of 0.9969, which is quite impressive for a standard feed-forward neural net. This result would place our net among the ten best MNIST classification results listed at https://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html.
The basic idea behind stacked autoencoders is to reach better results faster than with standard training. A stacked autoencoder is a stack of autoencoders that are pretrained one after another; the resulting weights are then used to initialize further training. A seminal work on stacked autoencoders is Hinton et al., https://www.cs.toronto.edu/~hinton/science.pdf. In our experiments we plug a classifier onto the last decoding layer to force the encoder to learn a data representation that is good for the classification problem, not just for reconstructing the input. For MNIST we use the standard dimensionality reduction described in https://blog.keras.io/building-autoencoders-in-keras.html: 784-128-64-32.
For our reconstruction experiments we only need the weights of the encoder layers pretrained in the stack described above.
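A greedy layer-wise pretraining loop might look like the following sketch. Only the 784-128-64-32 layer sizes and the classifier plugged onto the decoding layer come from the text; the losses, optimizer, activations of the decoding layers, and epoch counts are assumptions.

In [ ]:
from keras.models import Model
from keras.layers import Input, Dense

def pretrain_layer(data, labels, in_dim, code_dim):
    """Train one autoencoder (in_dim -> code_dim -> in_dim) together with a
    classifier on its decoding layer; return the encoder, the decoding
    layer, and the encoded data for the next stage."""
    inp = Input(shape=(in_dim,))
    code = Dense(code_dim, activation='relu')(inp)
    recon = Dense(in_dim, activation='relu', name='recon')(code)
    out = Dense(10, activation='sigmoid', name='clf')(recon)
    ae = Model(inp, [recon, out])
    ae.compile(optimizer='adam',
               loss={'recon': 'mse', 'clf': 'categorical_crossentropy'})
    ae.fit(data, {'recon': data, 'clf': labels}, epochs=10, batch_size=256)
    encoder = Model(inp, code)
    return encoder, ae.get_layer('recon'), encoder.predict(data)

encoders, decoders, codes = [], [], x_train
for in_dim, code_dim in [(784, 128), (128, 64), (64, 32)]:
    enc, dec, codes = pretrain_layer(codes, y_train, in_dim, code_dim)
    encoders.append(enc)
    decoders.append(dec)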
In [51]:
Image('images/sae_first_layer.png')
Out[51]:
The test accuracy is very high at 0.9955, which suggests that this is a good way of compressing the data.
In [54]:
Image('images/sae_second_layer.png')
Out[54]:
Test accuracy after the second encoding layer: 0.9955
Test accuracy after the third encoding layer: 0.9958
In many applications it can be useful to improve the quality of the data. Especially with image data, it can happen that the signal gets corrupted, for example due to network errors; in such situations an autoencoder can help. We applied our SAE to the test data and got satisfying results, with a loss of 0.1125 on the test set. One nice property of images is that you can inspect the results directly: in the figure below, the first row shows the original pictures and the second row shows the same pictures after passing through the whole autoencoder network, i.e. 784-128-64-32 (encoding) followed by 64-128-784 (decoding). Netflix, for example, uses such autoencoders to deliver its image data seamlessly.
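The reconstruction pass could be sketched as follows, chaining the pretrained encoding and decoding layers (the encoders and decoders lists from the pretraining sketch above) into one model: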
In [16]:
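# Chain the pretrained encoding and decoding layers into the full
# 784-128-64-32-64-128-784 network and compare test digits with their
# reconstructions ("encoders"/"decoders" come from the pretraining sketch).
import matplotlib.pyplot as plt
from keras.models import Model
from keras.layers import Input

inp = Input(shape=(784,))
h = inp
for enc in encoders:            # 784 -> 128 -> 64 -> 32
    h = enc.layers[1](h)
for dec in reversed(decoders):  # 32 -> 64 -> 128 -> 784
    h = dec(h)
autoencoder = Model(inp, h)

def show_rows(top, bottom, n=10):
    """Plot two rows of 28x28 digits for visual comparison."""
    plt.figure(figsize=(20, 4))
    for i in range(n):
        for row, imgs in enumerate([top, bottom]):
            ax = plt.subplot(2, n, row * n + i + 1)
            plt.imshow(imgs[i].reshape(28, 28), cmap='gray')
            ax.axis('off')
    plt.show()

show_rows(x_test, autoencoder.predict(x_test))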
If you corrupt your input pictures with a Gaussian random signal, you get something like the digits shown in the figure below. A well-trained autoencoder like our SAE is able to reconstruct images from such noisy inputs.
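Such a corruption step might look like this sketch; the noise level of 0.5 is an assumption: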
In [18]:
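# Corrupt the test digits with additive Gaussian noise (the noise level is
# an assumption), then show originals vs. noisy versions.
import numpy as np

noise_factor = 0.5
x_test_noisy = np.clip(
    x_test + noise_factor * np.random.normal(size=x_test.shape), 0., 1.)
show_rows(x_test, x_test_noisy)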
First row: original digits; second row: digits reconstructed from the noisy versions in the previous figure.
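The denoising step is then just a forward pass through the trained network; a sketch reusing the autoencoder model and the show_rows helper from above: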
In [24]:
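# Pass the noisy digits through the autoencoder and compare the originals
# (first row) with the denoised reconstructions (second row).
denoised = autoencoder.predict(x_test_noisy)
show_rows(x_test, denoised)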
If you feed the noisy data into our standard neural net classifier, you get a test accuracy of 0.9120; if you run the denoising step first, the classification accuracy improves significantly, to 0.9875.
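This comparison can be sketched by evaluating the baseline classifier on both versions of the test data (reusing model, x_test_noisy, and denoised from above):

In [ ]:
# Evaluate the baseline classifier on noisy vs. denoised test digits.
noisy_loss, noisy_acc = model.evaluate(x_test_noisy, y_test, verbose=0)
den_loss, den_acc = model.evaluate(denoised, y_test, verbose=0)
print('accuracy on noisy digits:   ', noisy_acc)
print('accuracy on denoised digits:', den_acc)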
The next experiment plugs a classifier directly onto the encoding part of the autoencoder: 784-128-64-32-10. The resulting test accuracy is 0.9958, which means that although most of the dimensions are dropped, this classifier performs nearly as well as the standard neural net from above. It might even be possible to improve this result by adding some regularization or dropout.
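A sketch of this setup; the 784-128-64-32-10 layer sizes follow the text, while the weight transfer details and the fine-tuning schedule are assumptions:

In [ ]:
from keras.models import Sequential
from keras.layers import Dense

sae_clf = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(10, activation='sigmoid'),
])
# Initialize the hidden layers with the pretrained encoder weights
# ("encoders" from the pretraining sketch above), then fine-tune.
for layer, enc in zip(sae_clf.layers[:3], encoders):
    layer.set_weights(enc.layers[1].get_weights())
sae_clf.compile(optimizer='adam', loss='categorical_crossentropy',
                metrics=['accuracy'])
sae_clf.fit(x_train, y_train, epochs=10, batch_size=128,
            validation_data=(x_test, y_test))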
In [3]:
Image('images/sae_classifier.png')
Out[3]:
To get a good visual impression of what an autoencoder is doing, we distill the data down to only two dimensions, in the hope of getting a picture of our problem; ideally one in which an unsupervised technique like clustering could classify most of the points correctly.
Adding an additional two-dimensional classifying autoencoder, 32-2-32-10, to our previous SAE setup yields two interesting results.
For this two-dimensional code layer we use the sigmoid activation instead of relu. This barely changes the classification result, but the information is spread more uniformly over the [0,1]x[0,1] square. It gives us a better view without making much difference to the machine.
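A sketch of this setup, reusing the 32-dimensional codes from the pretraining loop; the losses and epoch counts are again assumptions: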
In [172]:
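# Sketch of the 32-2-32-10 visualization autoencoder, stacked on the 32-d
# codes from the pretraining loop. The 2-d layer uses sigmoid so the
# points land in the [0,1]x[0,1] square.
inp = Input(shape=(32,))
code2 = Dense(2, activation='sigmoid')(inp)
recon = Dense(32, activation='relu', name='recon')(code2)
out = Dense(10, activation='sigmoid', name='clf')(recon)
viz = Model(inp, [recon, out])
viz.compile(optimizer='adam',
            loss={'recon': 'mse', 'clf': 'categorical_crossentropy'})
viz.fit(codes, {'recon': codes, 'clf': y_train}, epochs=10, batch_size=256)

# Scatter the 2-d representation, colored by digit class.
pts = Model(inp, code2).predict(codes)
plt.figure(figsize=(8, 8))
plt.scatter(pts[:, 0], pts[:, 1], c=y_train.argmax(axis=1), s=2, cmap='tab10')
plt.colorbar()
plt.show()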
Another idea is to encode the data directly into a two-dimensional layer with a single 784-2-784-10 autoencoder. Interestingly, this yields exactly the same test-set accuracy, 0.9687, as the more gradual, stacked compression described above (784-128-64-...). But the distribution of the data in the unit square is not as uniform, so we conclude that a stacked autoencoder reduces dimensionality in a more uniform way than a non-stacked autoencoder.
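The corresponding sketch for the direct 784-2-784-10 autoencoder, under the same assumptions as above: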
In [179]:
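# Sketch of the direct 784-2-784-10 autoencoder for comparison; training
# details are again assumptions.
inp = Input(shape=(784,))
code2 = Dense(2, activation='sigmoid')(inp)
recon = Dense(784, activation='sigmoid', name='recon')(code2)
out = Dense(10, activation='sigmoid', name='clf')(recon)
direct = Model(inp, [recon, out])
direct.compile(optimizer='adam',
               loss={'recon': 'mse', 'clf': 'categorical_crossentropy'})
direct.fit(x_train, {'recon': x_train, 'clf': y_train},
           epochs=10, batch_size=256)

# Scatter the 2-d representation for comparison with the stacked version.
pts = Model(inp, code2).predict(x_train)
plt.figure(figsize=(8, 8))
plt.scatter(pts[:, 0], pts[:, 1], c=y_train.argmax(axis=1), s=2, cmap='tab10')
plt.colorbar()
plt.show()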