First, let's do the standard imports:
In [70]:
import time
import numpy as np
#import matplotlib.pyplot as plt
import theano
# By convention, the tensor submodule is loaded as T
import theano.tensor as T
Each of the following is a symbolic variable with a Theano-defined type:
In [71]:
A = T.matrix('A')
b = T.scalar('b')
v = T.vector('v')
print(A.type)
print(b.type)
print(v.type)
These variables are symbolic: they have a type and a name, but no value. New Theano variables can be built from existing ones with ordinary expressions, for example:
In [72]:
a = T.scalar('a')
c = a**2
print(a.type)
print(c.type)
Note that c is also a symbolic scalar here.
We can also compile a Theano function from a symbolic expression:
In [73]:
f = theano.function([a], a**2)
print(f)
Printing f does not show a value: a compiled Theano function is a callable object, and we have to call it with a concrete input to see its output. For example:
In [74]:
print(f(2))
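To make the difference between building an expression and evaluating it concrete, here is a toy sketch in plain Python. It is not how Theano is implemented: the classes Node, Var and Pow and the helper compile_fn are hypothetical names invented for illustration. Writing c = a ** 2 only records a graph node; a value appears only when the "compiled" function is called with a concrete input.

```python
class Node:
    """Base class for a toy symbolic expression (hypothetical, not Theano's API)."""
    def __pow__(self, exponent):
        # Building an expression returns a new node instead of a number
        return Pow(self, exponent)


class Var(Node):
    """A named symbolic input: it has a name but no value."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        # Look up the concrete value bound to this variable
        return env[self.name]


class Pow(Node):
    """Symbolic node representing base ** exponent."""
    def __init__(self, base, exponent):
        self.base = base
        self.exponent = exponent

    def evaluate(self, env):
        return self.base.evaluate(env) ** self.exponent


def compile_fn(inputs, output):
    """Return a Python callable that binds positional values to the input variables."""
    def fn(*values):
        env = {var.name: val for var, val in zip(inputs, values)}
        return output.evaluate(env)
    return fn


a = Var('a')
c = a ** 2                 # builds a Pow node; nothing is computed yet
print(type(c).__name__)    # Pow
f = compile_fn([a], c)
print(f(2))                # 4
```

Theano does much more when compiling (graph optimization, code generation), but the split between describing a computation and running it is the same idea.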
Shared variables are another kind of Theano variable:
In [75]:
shared_var = theano.shared(np.array([[1, 2], [3, 4]],
                                    dtype=theano.config.floatX))
print('variable type:')
print(shared_var.type)
print('\nvariable value:')
print(shared_var.get_value())
# Overwrite the stored value in place
shared_var.set_value(np.array([[4, 5], [6, 7]],
                              dtype=theano.config.floatX))
print('\nvalues changed:')
print(shared_var.get_value())
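Unlike the symbolic variables above, a shared variable holds a concrete value, which persists between function calls and can be changed with set_value(). It is still treated symbolically, though: it can be used in expressions and compiled functions like any other Theano variable.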
Shared variables are well suited for state variables and model parameters. For example, in a CNN each layer has a weight matrix W: we need to keep its value so that we can test the network on thousands of images, yet we also need to update it during training.
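The pattern described above, a parameter that keeps its value between calls but is overwritten during training, can be sketched with plain NumPy. In this sketch a dict entry stands in for the shared variable, and the least-squares problem, the helper train_step and the learning rate are hypothetical choices for illustration. In Theano itself, such updates are usually expressed through the updates argument of theano.function.

```python
import numpy as np

# Hypothetical toy problem: learn W so that W @ x approximates y
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))               # 100 training examples as columns
W_true = np.array([[1.0, -2.0], [0.5, 3.0]])
Y = W_true @ X

# Stand-in for a shared variable: persistent state that survives between steps
params = {'W': np.zeros((2, 2))}


def train_step(X, Y, lr=0.1):
    """One gradient-descent step on squared error; overwrites params['W']."""
    W = params['W']
    err = W @ X - Y                          # shape (2, n_examples)
    grad = 2.0 * err @ X.T / X.shape[1]      # gradient of the mean squared error per example
    params['W'] = W - lr * grad              # analogous to a Theano update
    return float(np.mean(err ** 2))


for _ in range(200):
    loss = train_step(X, Y)
print('final loss:', loss)

# The learned value persists and can be reused on new inputs ("testing")
x_test = np.array([[1.0], [1.0]])
print(params['W'] @ x_test)
```

Because the parameter lives outside train_step, every call sees the value left by the previous one, which is exactly the role a shared variable plays in a training loop.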
Because a shared variable already carries its value, it does not need to (and cannot) be passed as an explicit input to a function; Theano uses its current value implicitly:
In [76]:
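bias = T.matrix('bias')
# Element-wise square of the shared matrix plus a symbolic bias
shared_squared = shared_var**2 + bias
bias_value = np.array([[1, 1], [1, 1]],
                      dtype=theano.config.floatX)
# shared_var is not listed as an input; its current value is used implicitly
f1 = theano.function([bias], shared_squared)
print(f1(bias_value))
print()
print(shared_squared.eval({bias: bias_value}))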
The example above builds an expression that squares shared_var element-wise and adds bias. Whether we evaluate it through the compiled function f1 or with .eval(), we only supply a value for bias; Theano reads the current value of shared_var by itself.
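As a sanity check, the same computation can be done directly in NumPy. This assumes the cells above were run in order, so shared_var holds [[4, 5], [6, 7]] after the set_value() call; both f1(bias_value) and the .eval() call should then produce the same matrix (with the dtype given by floatX).

```python
import numpy as np

shared_value = np.array([[4, 5], [6, 7]], dtype=np.float64)  # value after set_value()
bias_value = np.ones((2, 2))

# ** is element-wise (not a matrix product), then the bias is added
expected = shared_value ** 2 + bias_value
print(expected)
# [[17. 26.]
#  [37. 50.]]
```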
To compute a gradient we can use T.grad(), which returns a new symbolic tensor variable. First we define a small chain of expressions:
In [77]:
def square(a):
    return a**2

a = T.scalar('a')
b = square(a)
c = square(b)
Next we build the symbolic gradient of c with respect to a and also compile it into a function, which gives us two ways to evaluate it:
In [78]:
grad = T.grad(c,a)
f_grad = theano.function([a],grad)
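The TensorVariable grad represents the gradient of c with respect to a. The function f_grad takes a as input and returns grad, so the two compute the same value; they are only evaluated differently: grad.eval() takes a dictionary mapping input variables to values, while f_grad is called like an ordinary Python function: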
In [79]:
print(grad.eval({a: 10}))
print(f_grad(10))
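Both calls should print 4000.0. By the chain rule, with b = a^2 and c = b^2 = a^4, dc/da = (dc/db)(db/da) = 2b · 2a = 4a^3, which is 4000 at a = 10. The plain-Python snippet below checks that value with a central finite difference; the helper names and the step size h are arbitrary choices, and this is only a numerical cross-check, not how T.grad works (Theano builds the gradient symbolically from the graph).

```python
def square(x):
    return x ** 2


def c_of(a):
    # c = square(square(a)) = a**4, mirroring the Theano graph above
    return square(square(a))


def analytic_grad(a):
    # Chain rule: dc/da = (2 * b) * (2 * a) with b = a**2, i.e. 4 * a**3
    b = square(a)
    return (2 * b) * (2 * a)


def central_difference(f, x, h=1e-4):
    # Numerical derivative: (f(x + h) - f(x - h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)


a = 10.0
print(analytic_grad(a))              # 4000.0
print(central_difference(c_of, a))   # approximately 4000
```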
In [80]:
class Layer(object):
    def __init__(self, W_init, b_init, activation):
        # W_init has shape (n_output, n_input)
        n_output, n_input = W_init.shape
        assert b_init.shape in ((n_output, 1), (n_output,))
        self.W = theano.shared(value=W_init.astype(theano.config.floatX),
                               name='W',
                               borrow=True)
        # Store b as a column that broadcasts along the second (example) axis
        self.b = theano.shared(value=b_init.reshape(n_output, 1).astype(theano.config.floatX),
                               name='b',
                               borrow=True,
                               broadcastable=(False, True))
        self.activation = activation
        self.params = [self.W, self.b]

    def output(self, x):
        # x is a matrix whose columns are examples: shape (n_input, n_examples)
        lin_output = T.dot(self.W, x) + self.b
        return lin_output if self.activation is None else self.activation(lin_output)

t1 = time.time()
W_init = np.ones([3, 3])
b_init = np.array([1, 3, 2.5])
activation = None  # or e.g. T.nnet.sigmoid
L = Layer(W_init, b_init, activation)
# Use a matrix so each column is one example; a 1-D vector would broadcast
# against the (3, 1) bias and give a (3, 3) result instead
x = T.matrix('x')
out = L.output(x)
print(out.eval({x: np.array([[1.0], [2.0], [3.5]]).astype(theano.config.floatX)}))
t2 = time.time()
print('time:', t2 - t1)
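The layer computes W x + b, with b stored as a column of shape (n_output, 1) that broadcasts across the columns of x, so each column of x is one example. The NumPy sketch below reproduces that forward pass with the same W_init and b_init; the function name dense_forward and the sigmoid helper are hypothetical. It also shows the shape pitfall: a column input of shape (3, 1) gives a (3, 1) result, whereas a 1-D input of shape (3,) broadcasts against the (3, 1) bias into a (3, 3) matrix.

```python
import numpy as np


def dense_forward(W, b, x, activation=None):
    """Compute activation(W @ x + b) for a batch x whose columns are examples.

    b is reshaped to a column (n_output, 1) so it broadcasts across columns,
    mirroring the broadcastable=(False, True) bias in the Theano layer.
    """
    n_output = W.shape[0]
    lin = W @ x + b.reshape(n_output, 1)
    return lin if activation is None else activation(lin)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


W_init = np.ones((3, 3))
b_init = np.array([1, 3, 2.5])

x_col = np.array([[1.0], [2.0], [3.5]])   # one example as a column
print(dense_forward(W_init, b_init, x_col))
# [[7.5]
#  [9.5]
#  [9. ]]

# Pitfall: a 1-D input broadcasts against the column bias
x_flat = np.array([1.0, 2.0, 3.5])
print((W_init @ x_flat + b_init.reshape(3, 1)).shape)   # (3, 3)

# With a nonlinearity the shape is unchanged
print(dense_forward(W_init, b_init, x_col, activation=sigmoid).shape)   # (3, 1)
```

Keeping examples as columns is what makes the (n_output, 1) bias the natural choice here; a batch of several examples is simply a wider x.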
The code snippet below plots a graph of the computation inside our MLP layer. Its input, out, is the layer's output expression defined above.
In [81]:
from IPython.display import SVG
SVG(theano.printing.pydotprint(out, return_image=True,
                               compact=True,
                               var_with_name_simple=True,
                               format='svg'))