Logistic regression is a linear classifier. It is usually one of the first and easiest-to-learn machines in deep learning studies. Despite its simplicity and lack of hierarchy, logistic regressors are powerful machines, since much real-world data is linear (or at best piecewise linear). This notebook is a tutorial for implementing logistic regression on the MNIST dataset of digit recognition using the yann toolbox. It briefly goes over some theory of logistic regression, but for an in-depth study, refer to the book or course materials.
Logistic regression is typically modelled for a dataset $ D = \{ (x_i, y_i) \mid x_i \in \mathbb{R}^d,\ y_i \in \{1, 2, 3, \dots, c\} \} $ as,
$$ \hat{y} = \phi\left(w_0 + w_1x_1 + w_2x_2 + \dots + w_dx_d \right), $$or, more compactly,
$$ \hat{y} = \phi\left(\mathbf{w}^T \mathbf{x}\right), $$where $\mathbf{x}$ is augmented with $x_0 = 1$ so that $w_0$ acts as the bias, and
$$ \phi(\tau) = \frac{1}{1+e^{-\tau}}. $$If $\phi(\tau) > 0.5$, the sample $x$ is classified as positive; otherwise it is classified as negative.
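As a quick illustration (this sketch is not part of the yann toolbox and the helper names are made up), the prediction rule above can be written in plain NumPy as:

import numpy as np

def sigmoid(tau):
    # the logistic function phi(tau) = 1 / (1 + exp(-tau))
    return 1.0 / (1.0 + np.exp(-tau))

def predict(w, x):
    # prepend x_0 = 1 so that w[0] plays the role of the bias w_0
    x = np.append(1.0, x)
    return 1 if sigmoid(np.dot(w, x)) > 0.5 else 0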
In the above equations, the only unknown we need to estimate is $\mathbf{w}$, the parameter vector, which also contains the bias $w_0$. $\mathbf{w}$ can be estimated using the gradient descent optimization technique, where we start with random values for $\mathbf{w}$ and update them using the gradient of the negative log likelihood error function. To learn more about gradient descent and other optimization techniques for logistic regression, you can check the book or lecture materials on optimization techniques.
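For the two-class case, for instance, with labels $y_i \in \{0, 1\}$, the negative log likelihood error and the corresponding gradient descent update (with learning rate $\eta$) take the familiar form,
$$ E(\mathbf{w}) = -\sum_{i} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right], $$
$$ \mathbf{w} \leftarrow \mathbf{w} - \eta \nabla_{\mathbf{w}} E(\mathbf{w}) = \mathbf{w} - \eta \sum_{i} \left(\hat{y}_i - y_i\right) \mathbf{x}_i. $$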
In this notebook, we assume you finished the yann setup before starting this tutorial. If you haven't done it already, you can follow the Installation Guide for yann setup. To install quickly without many dependencies, run the following command:
pip install git+git://github.com/ragavvenkatesan/yann.git
If there was an error while installing skdata, you might want to install numpy and scipy independently first and then re-run the above command. Note that this installer does not enable a lot of options of the toolbox, for which you need to go through the complete install described at the Installation Guide page.
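For instance, one possible sequence would be:

pip install numpy scipy
pip install git+git://github.com/ragavvenkatesan/yann.git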
The easiest way to get going with Yann is to follow this quick start guide. If you are not satisfied and want a more detailed introduction to the toolbox, you may refer to the Tutorials and the Structure of the Yann network. This tutorial was also presented in CSE591 at ASU and the video of the presentation is available.
In [1]:
from IPython.display import YouTubeVideo
YouTubeVideo("0NFvfg8CItQ",theme="light", color="red")
Out[1]:
Verify that the installation of Theano is indeed version 0.9 or greater by doing the following in a Python shell:
In [ ]:
import theano
theano.__version__
If the version is not at least 0.9, you can upgrade it by doing the following:
pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git
In this tutorial, we will learn both the toolbox and how to implement logistic regression simultaneously. Hopefully, this will give a nice introduction to the various features and API commands of the toolbox, making further tutorials easier.
The start and the end of the Yann toolbox is the network module. The yann.network.network object is where all the magic happens. Everything is manipulated through the network object. Run the following code to import the network module and create a network object.
In [ ]:
from yann.network import network
net = network()
Voila! We have thus created a new network. The network doesn’t have any layers or modules in it yet. This can be verified by probing into the net.layers property of the net object. net.layers is a dictionary in which each key is the id of a layer and each value is a yann.layers object.
In [ ]:
net.layers
This produces an output which is essentially an empty dictionary {} because we did not add any layers to the network. Let’s add some layers! We can begin with an input
layer, which is where any neural network begins.
Before we do that, we need some data to train the network. The toolbox comes with a port to skdata, through which the MNIST dataset of handwritten characters can be downloaded and built.
To cook an MNIST dataset for yann, run the following code:
In [ ]:
from yann.special.datasets import cook_mnist
data = cook_mnist()
Running this code will print a statement to the following effect: >>Dataset xxxxx is created. The five digits marked xxxxx in the statement are the codeword for the dataset. The actual dataset is now located at _datasets/_dataset_xxxxx/ relative to the directory from which this code was called. The MNIST dataset is imported, converted to a format consumable by yann and stored at this location. Refer to the Tutorials on how to convert your own dataset for yann. You can check the location of the dataset using the data.dataset_location() function.
In [ ]:
data.dataset_location()
So what is in this dataset that was created? Every dataset contains three sub-directories: train, test and valid. Each of these in turn contains .pkl files. The files are just dumps of data with two variables: x containing the data and y containing the labels. Each file corresponds to a batch of data, which may still be broken down into many minibatches during training. An MNIST dataset cooked using the default cook method will contain only one batch file in each directory. The dataset is created with a minibatch size of 500: there are 100 train minibatches in the one batch, and 20 minibatches each in the test and valid batches.
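As a quick sanity check, here is a minimal sketch (assuming only the train/test/valid layout described above) that lists the batch files in each split:

import os

root = data.dataset_location()
for split in ['train', 'test', 'valid']:
    # each split directory contains the .pkl batch file(s)
    print(os.listdir(os.path.join(root, split)))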
The first layer that we need to add to our network is an input layer. Every input layer requires a dataset to be associated with it. Let us create this layer with the MNIST dataset we just cooked.
In [ ]:
dataset_params = { "dataset": data.dataset_location(), "n_classes" : 10 }
net.add_layer(type = "input", dataset_init_args = dataset_params)
This piece of code creates and adds a new datastream
module to the net
. Modules are similar to layers in yann. Modules support the network. This command also automatically wires up the newly added input
layer with this (the last created) datastream
. Confirm this by checking net.datastream
.
In [ ]:
net.datastream
net.datastream, as can be seen, is also a dictionary similar to net.layers.
Let us now build a classifier
layer. The default classifier that yann is set up with is the logistic regression classifier. Refer to the Toolbox Documentation or Tutorials for other types of layers. Let us create this classifier
layer for now.
In [ ]:
net.add_layer(type = "classifier" , num_classes = 10)
net.add_layer(type = "objective")
The objective layer creates the loss function from the classifier that can be used as a learning metric; by default this is a negative log likelihood loss, which we want to minimize. It also provides a scope for other modules such as the optimizer module. Refer to Structure of the Yann network and Toolbox Documentation for more details on modules.
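For reference, and only as a sketch of the standard form (not a statement about yann's exact internal normalization), the negative log likelihood loss for a $c$-class problem over $N$ samples can be written as,
$$ E = -\frac{1}{N}\sum_{i=1}^{N} \log \hat{p}\left(y_i \mid \mathbf{x}_i\right), $$where $\hat{p}(y_i \mid \mathbf{x}_i)$ is the probability that the classifier layer assigns to the correct class of sample $i$.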
Now that our network is created and constructed, we can check the layers in our network with net.layers.
In [ ]:
net.layers
The keys of the dictionary such as '1', '0' and '2' are the id of each layer. We could have created a layer with a custom id by supplying an id argument to the add_layer method, as sketched below.
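For illustration only (we do not actually run this here, since our classifier layer is already added, and the id value is made up), such a call might look like:

net.add_layer(type = "classifier" , num_classes = 10, id = "softmax")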
To get a better idea of how the network looks, you can use the pretty_print method in yann.
In [ ]:
net.pretty_print()
net.pretty_print
typically prints all the details of the network and its layers. Some of the properties can be accessed individually for every layer. For instance, we can acquire a particular layer's properties as follows:
In [ ]:
print(net.layers['1'].output_shape)
print(net.layers['1'].activation)
print(net.layers['1'].active)
print(net.layers['1'].destination)
print(net.layers['1'].origin)
# more available options can be found using the following:
dir(net.layers['1'])
Most of these probes should be obvious; here are some interesting ones. The origin and destination properties provide lists of the ids of the layers that feed into and out of this layer.
Now our network is finally ready to be trained. Before training, we need to build an optimizer and other tools, but for now let us use the default ones. Once all of this is done, yann requires that the network be cooked. For more details on cooking, refer to Structure of the Yann network. For now, let us imagine that cooking a network finalizes the wiring and architecture, sets up the cache, prepares the first batch of data, prepares the modules, and in general readies the network for training using back propagation.
In [ ]:
net.cook()
Cooking will take a few seconds and might print what it is doing along the way. Once cooked, we may notice, for instance, that the network now has an optimizer module.
In [ ]:
net.optimizer
To train the model that we have just cooked, we can use the train
function that becomes available to us once the network is cooked.
In [ ]:
net.train()
This will print progress for each epoch and show the accuracy after each epoch on a validation set that is independent of the training set. By default, training runs for 40 epochs: 20 at a higher learning rate and 20 more at a fine-tuning learning rate. The learning rate and the negative log likelihood loss are printed after each epoch.
Every layer also has a layer.output object. The output can be probed directly by using the layer_activity method, as long as the layer is directly or indirectly associated with a datastream module through an input layer and the network has been cooked. This is necessary because the output object is typically a theano computation graph; layer_activity evaluates this graph for the currently loaded minibatch. As a trial, let us observe the activity of the input layer. The layer activity is just a numpy array of numbers, so to save screen space we print its shape instead of the whole array.
In [ ]:
print(net.layer_activity(id='1').shape)
print(net.layers['1'].output_shape)
The second line of code will verify the output we produced in the first line. An interesting layer output is the output of the objective
layer, which will give us the current negative log likelihood of the network, the one that we are trying to minimize.
In [ ]:
net.layer_activity(id = '2')
Once we are done training, we can run the network feedforward on the testing set to produce a generalization performance result.
In [ ]:
net.test()
Congratulations, you now know how to use the yann toolbox successfully. A full-fledged version of the logistic regression code that we implemented here can be found here. That piece of code also contains in-line commentary briefly discussing other options that could be supplied to some of the function calls we made here, which explains the process better. Hope you liked this quick start guide to the Yann toolbox and have fun!