Basics about Theano

First, let's do the standard imports:


In [70]:
import time
import numpy as np
#import matplotlib.pyplot as plt
import theano
# By convention, the tensor submodule is loaded as T
import theano.tensor as T

The following are all Theano-defined types:


In [71]:
A = T.matrix('A')
b = T.scalar('b')
v = T.vector('v')

print A.type
print b.type
print v.type


TensorType(float32, matrix)
TensorType(float32, scalar)
TensorType(float32, vector)

All of these variables are symbolic, meaning they don't hold any values yet. Theano variables can be defined through simple relations, such as:


In [72]:
a = T.scalar('a')
c = a**2

print a.type
print c.type


TensorType(float32, scalar)
TensorType(float32, scalar)

Note that c is also a symbolic scalar here.
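As a quick illustration (a minimal sketch reusing the A, b and v variables defined above), symbolic variables can also be combined into larger expressions, and Theano infers the type of the result:


In [ ]:
# combine the earlier matrix, vector and scalar into one expression
u = T.dot(A, v) * b
print u.type  # a matrix-vector product scaled by a scalar is a vector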

We can also define a function:


In [73]:
f = theano.function([a],a**2)
print f


<theano.compile.function_module.Function object at 0x0000000029FEBC88>

Theano functions are compiled objects; printing one only shows its repr. To see actual output we must call the function on concrete input. For example:


In [74]:
print f(2)


4.0

A shared variable is also a Theano type:


In [75]:
shared_var = theano.shared(np.array([[1, 2], [3, 4]], 
                                    dtype=theano.config.floatX))
print 'variable type:'
print shared_var.type
print '\nvariable value:'
print shared_var.get_value()

shared_var.set_value(np.array([[4, 5], [6, 7]], 
                                    dtype=theano.config.floatX))
print '\nvalues changed:'
print shared_var.get_value()


variable type:
TensorType(float32, matrix)

variable value:
[[ 1.  2.]
 [ 3.  4.]]

values changed:
[[ 4.  5.]
 [ 6.  7.]]

Shared variables hold an actual value, but are still treated as symbolic (they can appear in expressions, be passed to functions, etc.).

Shared variables are perfect for use as state variables or parameters. For example, in a CNN each layer has a parameter matrix W; we need to store its value so we can run tests against thousands of images, yet we also need to update it during training.
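To sketch how such an update might look (a minimal example with hypothetical names, not the author's code), theano.function accepts an updates argument that assigns a new value to a shared variable on every call:


In [ ]:
# hedged sketch: subtract a scaled 'delta' from a shared parameter
# each time the compiled function is called
W_demo = theano.shared(np.zeros((2, 2), dtype=theano.config.floatX),
                       name='W_demo')
delta = T.matrix('delta')
step = theano.function([delta], W_demo,
                       updates=[(W_demo, W_demo - 0.1 * delta)])
step(np.ones((2, 2), dtype=theano.config.floatX))
print W_demo.get_value()  # every entry is now -0.1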

As a side note, since shared variables carry their own values, they don't need to be explicitly specified as inputs to a function:


In [76]:
bias = T.matrix('bias')
shared_squared = shared_var**2 + bias

bias_value = np.array([[1,1],[1,1]], 
                  dtype=theano.config.floatX)

f1 = theano.function([bias],shared_squared)
print f1(bias_value)
print '\n'
print shared_squared.eval({bias:bias_value})


[[ 17.  26.]
 [ 37.  50.]]


[[ 17.  26.]
 [ 37.  50.]]

The example above defines an expression that squares shared_var and adds a bias. When evaluating it, we only provide a value for bias, because the shared variable already carries its own value.
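Because the shared variable carries state, changing it with set_value alters the result of the already-compiled function. A quick check, reusing f1 and bias_value from above:


In [ ]:
# no recompilation needed: f1 picks up the new state of shared_var
shared_var.set_value(np.array([[1, 2], [3, 4]],
                              dtype=theano.config.floatX))
print f1(bias_value)  # now [[2. 5.], [10. 17.]]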

Gradients

To calculate gradients we can use the T.grad() function, which returns a tensor variable. We first define some variables:


In [77]:
def square(a):
    return a**2

a = T.scalar('a')
b = square(a)
c = square(b)

Then we set up two ways to evaluate the gradient:


In [78]:
grad = T.grad(c,a)
f_grad = theano.function([a],grad)

The TensorVariable grad is the gradient of c with respect to a. The function f_grad takes a as input and returns grad, so the two should be equivalent; however, they are evaluated with different syntax:


In [79]:
print grad.eval({a:10})
print f_grad(10)


4000.0
4000.0
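Both forms return 4000, which matches the analytic answer: c = (a**2)**2 = a**4, so dc/da = 4*a**3 = 4000 at a = 10. One caveat worth knowing: T.grad requires the cost to be a scalar, so to differentiate with respect to a vector, first reduce the expression to a scalar. A minimal sketch:


In [ ]:
# T.grad needs a scalar cost; summing the squared entries provides one
w = T.vector('w')
cost = T.sum(w**2)
g = T.grad(cost, w)  # the gradient is the vector 2*w
f_g = theano.function([w], g)
print f_g(np.array([1, 2, 3], dtype=theano.config.floatX))  # [2. 4. 6.]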

MLP Demo with Theano


In [80]:
class layer(object):
    def __init__(self, W_init, b_init, activation):
        # W_init has shape (n_output, n_input); b_init must have
        # n_output entries, either as a vector or as a column
        [n_output, n_input] = W_init.shape
        assert b_init.shape == (n_output, 1) or b_init.shape == (n_output,)
        # store parameters as shared variables so they persist between
        # function calls and can be updated during training
        self.W = theano.shared(value=W_init.astype(theano.config.floatX),
                               name='W',
                               borrow=True)
        # reshape b into a column and mark its second dimension as
        # broadcastable so it can be added across a batch of outputs
        self.b = theano.shared(value=b_init.reshape(n_output, 1).astype(theano.config.floatX),
                               name='b',
                               borrow=True,
                               broadcastable=(False, True))
        self.activation = activation
        self.params = [self.W, self.b]

    def output(self, x):
        # affine transform, optionally followed by a nonlinearity
        lin_output = T.dot(self.W, x) + self.b
        return (lin_output if self.activation is None
                else self.activation(lin_output))

t1 = time.time()
W_init = np.ones([3, 3])
b_init = np.array([1, 3, 2.5]).transpose()
activation = None  # try T.nnet.sigmoid for a nonlinear layer
L = layer(W_init, b_init, activation)
x = T.vector('x')
out = L.output(x)
# note: x is 1-D, so T.dot(W, x) is a vector; adding the (3, 1) column
# bias broadcasts the result into a 3x3 matrix, as the output shows
print out.eval({x: np.array([1.0, 2, 3.5]).astype(theano.config.floatX)})
t2 = time.time()
print 'time:', t2 - t1


[[ 7.5  7.5  7.5]
 [ 9.5  9.5  9.5]
 [ 9.   9.   9. ]]
time: 1.14300012589
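A complete MLP would stack several such layers and fit their parameters by gradient descent. A hedged sketch of a single training step for the layer above (hypothetical target y, squared-error cost, fixed learning rate):


In [ ]:
# one gradient-descent step on W and b, reusing x and out from above
y = T.vector('y')                   # hypothetical target
cost = T.sum((out - y)**2)          # squared-error cost (a scalar)
grads = T.grad(cost, L.params)      # gradients w.r.t. W and b
learning_rate = 0.01
train_step = theano.function(
    [x, y], cost,
    updates=[(p, p - learning_rate * g)
             for p, g in zip(L.params, grads)])

x_val = np.array([1.0, 2, 3.5], dtype=theano.config.floatX)
y_val = np.zeros(3, dtype=theano.config.floatX)
print 'cost before:', train_step(x_val, y_val)
print 'cost after :', train_step(x_val, y_val)  # should be smaller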

Plotting the Flowchart (Theano Graph)

The code snippet below plots a flowchart showing what happens inside our MLP layer. Note that the input out is the output of the layer defined above.


In [81]:
from IPython.display import SVG
SVG(theano.printing.pydotprint(out, return_image=True,
                               compact = True, 
                               var_with_name_simple = True,
                               format='svg'))


Out[81]:
[SVG flowchart: W and x feed a dot node; its result is dimshuffled into a row and added (Elemwise{add,no_inplace}) to the broadcastable column b, producing a TensorType(float32, matrix).]
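If pydot/graphviz is not available, theano.printing.debugprint prints the same graph in text form:


In [ ]:
# text-only alternative to the SVG flowchart above
theano.printing.debugprint(out)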
