Introduction to Theano

To execute a cell: Ctrl-Enter.

The code can be executed with the default CPU configuration of Theano, floatX=float64,device=cpu, or with the GPU configuration floatX=float32,device=cuda.


In [ ]:
import os
#os.environ['THEANO_FLAGS'] = 'floatX=float64,device=cpu,mode=FAST_RUN'
os.environ['THEANO_FLAGS'] = 'floatX=float32,device=cuda,mode=FAST_RUN'

In [ ]:
import numpy as np
import theano
import theano.tensor as T
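
As a quick sanity check, we can print the configuration that is actually in effect (THEANO_FLAGS must be set before theano is imported for the flags to be picked up):


In [ ]:
# Inspect the active Theano configuration
print(theano.config.floatX)
print(theano.config.device)
print(theano.config.mode)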

Theano concepts

Symbolic input

The symbolic inputs that you operate on are Variables, and what you get from applying various Ops to these inputs are also Variables. A Variable is the main data structure you work with. A Type in Theano represents a set of constraints on potential data objects. These constraints allow Theano to tailor C code to handle them and to statically optimize the computation graph. The Type of both x and y below is matrix. The complete list of types is given in the Theano documentation.


In [ ]:
x = T.matrix('x')
y = T.matrix('y')
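
Matrices are just one of the available tensor types. As a small illustration (not exhaustive), other constructors create Variables of different ranks and dtypes, and the Type of a Variable can be inspected through its type attribute:


In [ ]:
# A few other symbolic tensor constructors
v = T.vector('v')    # 1-d tensor of dtype floatX
n = T.iscalar('n')   # 0-d tensor of dtype int32
t = T.tensor3('t')   # 3-d tensor of dtype floatX
print(x.type)        # e.g. TensorType(float32, matrix) when floatX=float32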

Operation

An Op defines a certain computation on some types of inputs, producing some types of outputs. From a list of input Variables and an Op, you can build an Apply node representing the application of the Op to the inputs.

An Apply node is a type of internal node used to represent a computation graph. It represents the application of an Op on one or more inputs, where each input is a Variable. By convention, each Op is responsible for knowing how to build an Apply node from a list of inputs.


In [ ]:
z = x + y
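
We can inspect the Apply node that this addition created: z.owner holds the node, and it exposes the Op and the input Variables.


In [ ]:
# The Apply node behind z = x + y
print(z.owner)         # the Apply node, e.g. Elemwise{add,no_inplace}(x, y)
print(z.owner.op)      # the Op that was applied
print(z.owner.inputs)  # the input Variables, here [x, y]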

theano.function

theano.function is the interface for compiling graphs into callable objects. When theano.function is called, the computation graph is optimized and Theano generates efficient C code (with calls to CUDA if the GPU flag is set). This is completely transparent to the user, apart from the choice of compilation mode. The mode argument controls which optimizations are applied to the graph and how the optimized graph is evaluated. The available modes are:

  • FAST_COMPILE: Apply just a few graph optimizations and use only Python implementations, so the GPU is disabled.

  • FAST_RUN: Apply all optimizations and use C implementations where possible. (DEFAULT)

  • DebugMode: Verify the correctness of all optimizations, and compare C and Python implementations. This mode can take much longer than the other modes, but can identify several kinds of problems.

The default is typically FAST_RUN, but it can be changed globally through theano.config.mode or per function with the mode argument, as illustrated below.
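
As a small illustration, the sketch below recompiles the same graph z with FAST_COMPILE (all names come from the cells above). The next cell then compiles it with the default mode; there, allow_input_downcast=True lets float64 NumPy inputs be downcast to float32 inputs when floatX=float32.


In [ ]:
# Compile the same graph with a lighter optimization mode (Python implementations only)
f_fast_compile = theano.function([x, y], z, mode='FAST_COMPILE')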


In [ ]:
# theano.function([inputs], [outputs])
f = theano.function([x, y], z, allow_input_downcast=True)

a = np.random.randn(1, 3) # float64 
b = np.random.randn(1, 3) # float64
f(a,b)
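
For quick experimentation there is also Variable.eval, which compiles a small function on the fly; it is slower than a pre-compiled function but convenient for debugging. Note that eval does not downcast inputs, so we cast explicitly here:


In [ ]:
# One-off evaluation without an explicit theano.function
z.eval({x: a.astype(theano.config.floatX),
        y: b.astype(theano.config.floatX)})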

Shared variable

A Shared Variable is a hybrid symbolic and non-symbolic variable whose value may be shared between multiple functions. Shared variables can be used in symbolic expressions but they also have an internal value that defines the value taken by this symbolic variable in all the functions that use it. The value can be accessed and modified by the .get_value() and .set_value() methods.


In [ ]:
a = theano.shared(np.ones(3, dtype=theano.config.floatX), name='a')
print('Before ', a.get_value())
#a.set_value(1)                        # TypeError: a 1-d array is expected, not a scalar
#a.set_value(np.array([[1,2],[3,4]]))  # TypeError: the dtype and the number of dimensions must match
#a.set_value(np.array([1,2,3]))        # TypeError when floatX=float32: int64 is not silently downcast
a.set_value(np.array([1,2,3], dtype=theano.config.floatX))
print('After ', a.get_value())

Shared variables and functions

Shared variables can be used to represent the internal state of a function. To modify this internal state, theano.function accepts an updates argument: an iterable of pairs (shared_variable, new_expression), given as a list, a tuple, or a dict.

Note in the following that state is an implicit input of the function accumulator.


In [ ]:
state = theano.shared(0)
inc = T.iscalar('inc')
accumulator = theano.function([inc], state, updates=[(state, state+inc)])

The function is evaluated first, and then the updates are applied; each call therefore returns the value of state before the update.


In [ ]:
print('First call to accumulator: {}'.format(accumulator(1)))
print('Second call to accumulator: {}'.format(accumulator(10)))
print('Third call to accumulator: {}'.format(accumulator(100)))
print('Fourth call to accumulator: {}'.format(accumulator(100)))
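
Because state is a shared variable, it can be reset at any time with set_value:


In [ ]:
# Reset the accumulator's internal state
state.set_value(0)
print(accumulator(5))       # prints 0, the value before the update
print(state.get_value())    # prints 5, the value after the update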

It may happen that you have expressed some formula using a shared variable, but you do not want to use its value. In this case, you can use the givens parameter of theano.function, which replaces a particular node in the graph for the purpose of one particular function. The givens parameter can be used to replace any symbolic variable, not just a shared variable; you can replace constants and expressions in general. Be careful, though, not to make the expressions introduced by a givens substitution co-dependent: the order of substitution is not defined, so the substitutions have to work in any order.

In practice, a good way of thinking about givens is as a mechanism that lets you replace any part of your formula with a different expression that evaluates to a tensor of the same shape and dtype. [reference]

In the following, we create a function that takes a scalar foo, temporarily replaces the variable state with it, and returns its value.


In [ ]:
foo = T.scalar(dtype=state.dtype)
accumulator2 = theano.function([foo], state, givens=[(state, foo)])
print(accumulator2(1))
print(state.get_value())  # old state still there, but we didn't use it

A regression toy example

Build a simple model

The following is a simple linear transformation (Wx + b) followed by a nonlinearity (T.nnet.sigmoid). Note that in this example we use allow_input_downcast=True so that x_val, a float64 array, can be downcast to the float32 input x when floatX=float32. Without this parameter, x_val would have to be cast explicitly: x_val = np.random.rand(4).astype(np.float32).


In [ ]:
x = T.vector('x')
W = theano.shared(np.random.randn(3, 4).astype(theano.config.floatX), name = 'W')
b = theano.shared(np.ones(3, dtype=theano.config.floatX), name = 'b')

dot = T.dot(W, x)
out = T.nnet.sigmoid(dot + b)

predict = theano.function([x], out, allow_input_downcast=True)
x_val = np.random.rand(4)
print(predict(x_val))

In order to train the model, we define a cost function that will evaluate how far the model is from the target.


In [ ]:
y = T.vector('y')
C = ((out - y) ** 2).mean()
C.name = 'C'
error = theano.function([out, y], C, allow_input_downcast=True)

y_val = np.random.uniform(size=3).astype(theano.config.floatX)
print(error([0.66981461, 0.60965314, 0.76731602], y_val))  # an arbitrary prediction vector in place of out

Automatic differentiation

Now that the graph is defined, we can compute the gradient of the cost C with respect to the parameters (W, b). theano.grad can only differentiate a scalar expression, e.g. the cost C.


In [ ]:
# theano.grad(exp, [Variable])
dC_dW, dC_db = theano.grad(C, [W, b])
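
dC_dW and dC_db are themselves symbolic expressions. As a quick sanity check, we can compile them and verify that the gradients have the same shapes as the parameters:


In [ ]:
# Compile the gradients to inspect their shapes (illustrative check)
get_grads = theano.function([x, y], [dC_dW, dC_db], allow_input_downcast=True)
g_W, g_b = get_grads(x_val, y_val)
print(g_W.shape, g_b.shape)  # (3, 4) and (3,), matching W and b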

Now that we can compute the gradients, we define the gradient descent update rule.


In [ ]:
eta_val = np.array(0.1, dtype=theano.config.floatX)
eta = theano.shared(eta_val, name='eta')
upd_W = W - eta * dC_dW
upd_b = b - eta * dC_db

Finally, we compile the expressions and the update rules.


In [ ]:
train = theano.function([x, y], C, updates=[(W, upd_W), (b, upd_b)], allow_input_downcast=True)
#print(b.get_value())
#print(W.get_value())

We iterate the gradient descent update rule in order to minimize the cost.


In [ ]:
for i in range(50):
    C_val = train(x_val, y_val)
    print('Cost {} at iteration {}'.format(C_val, i))
print(b.get_value())
print(W.get_value())
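
As a quick check, we can recompute the cost of the trained model with the error function compiled earlier. It should be typically slightly below the last value printed by the loop, since train returns the cost computed before each update is applied.


In [ ]:
# Cost of the trained model
print(error(predict(x_val), y_val))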

Visualization and debugging

Graph visualization

Comparing out with predict


In [ ]:
from theano.printing import pydotprint
from IPython.display import Image, SVG

In [ ]:
Image(pydotprint(out, format='png', compact=False, return_image=True))  # compact=False also shows unnamed intermediate variables

In [ ]:
Image(pydotprint(out, format='png', return_image=True))  # the default compact=True hides unnamed intermediate variables

In [ ]:
Image(pydotprint(predict, format='png', return_image=True))  # the optimized graph of the compiled function

In [ ]:
Image(pydotprint([upd_W, upd_b], format='png', return_image=True), width=1000)

In [ ]:
Image(pydotprint(train, format='png', return_image=True), width=1000)

In [ ]:
from theano.printing import debugprint
debugprint(out)
debugprint(predict)
