In [ ]:
import os
#os.environ['THEANO_FLAGS'] = 'floatX=float64,device=cpu,mode=FAST_RUN'
os.environ['THEANO_FLAGS'] = 'floatX=float32,device=cuda,mode=FAST_RUN'
In [ ]:
import numpy as np
import theano
import theano.tensor as T
The symbolic inputs that you operate on are Variables, and what you get from applying various Ops to these inputs are also Variables. A Variable is the main data structure you work with. A Type in Theano represents a set of constraints on potential data objects. These constraints allow Theano to tailor C code to handle them and to statically optimize the computation graph. The Type of both x and y below is matrix; the Theano documentation provides the complete list of types.
In [ ]:
x = T.matrix('x')
y = T.matrix('y')
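Every Variable carries its Type. As a small sketch (using the x and y defined above), you can inspect it directly:
In [ ]:
# Inspect the Type attached to each symbolic Variable (illustrative sketch)
print(x.type)            # TensorType(float32, matrix) when floatX=float32
print(x.dtype, x.ndim)   # dtype and number of dimensions implied by the Type
print(x.type == y.type)  # both were created with T.matrix, so the Types match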
An Op defines a certain computation on some types of inputs, producing some types of outputs. From a list of input Variables and an Op, you can build an Apply node representing the application of the Op to the inputs.
An Apply node is a type of internal node used to represent a computation graph. It represents the application of an Op on one or more inputs, where each input is a Variable. By convention, each Op is responsible for knowing how to build an Apply node from a list of inputs.
In [ ]:
z = x + y
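To make the Apply node concrete, here is a small sketch (using the z defined above): the Apply node that produced z is reachable as z.owner, and it carries the Op and the input Variables.
In [ ]:
# Inspect the Apply node that produced z (illustrative sketch)
apply_node = z.owner
print(apply_node.op)       # the Op that was applied (an elementwise add)
print(apply_node.inputs)   # the input Variables, here [x, y]
print(apply_node.outputs)  # the output Variables; z is apply_node.outputs[0]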
theano.function is the interface for compiling graphs into callable objects. When theano.function is executed, the computation graph is optimized and Theano generates efficient C code (with calls to CUDA if the GPU flag is set). This is completely transparent to the user, apart from the different compilation modes.
The mode argument controls the sort of optimizations that will be applied to the graph, and the way the optimized graph will be evaluated. These modes are:
FAST_COMPILE: Apply just a few graph optimizations and only use Python implementations, so the GPU is disabled.
FAST_RUN: Apply all optimizations and use C implementations where possible. (DEFAULT)
DebugMode: Verify the correctness of all optimizations, and compare C and Python implementations. This mode can take much longer than the other modes, but can identify several kinds of problems.
The default is typically FAST_RUN, but this can be changed via theano.config.mode.
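The mode can also be selected per function through the mode argument of theano.function. A minimal sketch (the name f_fast_compile is just illustrative):
In [ ]:
# Compile the same graph with a lighter optimization mode (illustrative sketch)
f_fast_compile = theano.function([x, y], z, mode='FAST_COMPILE', allow_input_downcast=True)
print(f_fast_compile(np.ones((2, 2)), np.ones((2, 2))))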
In [ ]:
# theano.function([inputs], [outputs])
f = theano.function([x, y], z, allow_input_downcast=True)
a = np.random.randn(1, 3) # float64
b = np.random.randn(1, 3) # float64
f(a, b)
A Shared Variable is a hybrid symbolic and non-symbolic variable whose value may be shared between multiple functions. Shared variables can be used in symbolic expressions but they also have an internal value that defines the value taken by this symbolic variable in all the functions that use it. The value can be accessed and modified by the .get_value() and .set_value() methods.
In [ ]:
a = theano.shared(np.ones(3, dtype=theano.config.floatX), name = 'a')
print('Before ', a.get_value())
#a.set_value(1) # Type error, must be a numpy array of shape (3,)
#a.set_value(np.array([[1,2],[3,4]])) # Type error, must be a numpy array of shape (3,)
#a.set_value(np.array([1,2,3]))
a.set_value(np.array([1,2,3],dtype=theano.config.floatX))
print('After ', a.get_value())
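Since a shared variable is also a symbolic Variable, it can appear in expressions and compiled functions without being passed as an explicit input. A small sketch (the name double_a is just illustrative):
In [ ]:
# A shared variable used directly in a compiled function (illustrative sketch)
double_a = theano.function([], 2 * a)
print(double_a())  # uses the current value of a; no explicit input is needed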
Shared variables can be used to represent the internal state of a function. In order to modify this internal state, theano.function has an argument called updates, which takes an iterable of (shared_variable, new_expression) pairs as a list, tuple, or dict.
Note in the following that state is an implicit input of the function accumulator.
In [ ]:
state = theano.shared(0)
inc = T.iscalar('inc')
accumulator = theano.function([inc], state, updates=[(state, state+inc)])
The function is evaluated first, and then the update mechanism is executed, so each call returns the value of state before the update.
In [ ]:
print('First call to accumulator: {}'.format(accumulator(1)))
print('Second call to accumulator: {}'.format(accumulator(10)))
print('Third call to accumulator: {}'.format(accumulator(100)))
print('Fourth call to accumulator: {}'.format(accumulator(100)))
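The internal state can be reset at any time with set_value. A short sketch:
In [ ]:
# Reset the shared state and check that the accumulator starts over (illustrative sketch)
state.set_value(0)
print(accumulator(5))      # returns the old state, i.e. 0
print(state.get_value())   # the update has set it to 5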
It may happen that you have expressed some formula using a shared variable but do not want to use its value. In this case, you can use the givens parameter of theano.function, which replaces a particular node in the graph for the purpose of one particular function. The givens parameter can be used to replace any symbolic variable, not just a shared variable: you can replace constants and, more generally, expressions. Be careful, though, not to let the expressions introduced by a givens substitution be co-dependent; the order of substitution is not defined, so the substitutions have to work in any order.
In practice, a good way of thinking about givens is as a mechanism that lets you replace any part of your formula with a different expression that evaluates to a tensor of the same shape and dtype. [reference]
In the following, we create a function that takes a scalar foo, temporarily replaces the variable state with it, and returns its value.
In [ ]:
foo = T.scalar(dtype=state.dtype)
accumulator2 = theano.function([foo], state, givens=[(state, foo)])
print(accumulator2(1))
print(state.get_value()) # old state still there, but we didn't use it
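givens also accepts a dict, and the replacement can be an expression rather than a plain variable, as long as its type matches the variable being replaced. A quick sketch (the name accumulator3 is just illustrative):
In [ ]:
# givens as a dict, replacing state by an expression built from foo (illustrative sketch)
accumulator3 = theano.function([foo], state, givens={state: foo * 2})
print(accumulator3(3))     # state is replaced by foo * 2, so this prints 6
print(state.get_value())   # the real state is untouched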
The following is a simple linear transformation (Wx + b) followed by a nonlinearity (T.nnet.sigmoid). Note that in this example we use allow_input_downcast=True in order to avoid an error associated with downcasting x_val from float64 to float32. Without this parameter, x_val would have to be cast explicitly: x_val = np.random.rand(4).astype(np.float32).
In [ ]:
x = T.vector('x')
W = theano.shared(np.random.randn(3, 4).astype(theano.config.floatX), name = 'W')
b = theano.shared(np.ones(3, dtype=theano.config.floatX), name = 'b')
dot = T.dot(W, x)
out = T.nnet.sigmoid(dot + b)
predict = theano.function([x], out, allow_input_downcast=True)
x_val = np.random.rand(4)
print(predict(x_val))
In order to train the model, we define a cost function that will evaluate how far the model is from the target.
In [ ]:
y = T.vector('y')
C = ((out - y) ** 2).mean()
C.name = 'C'
error = theano.function([out, y], C, allow_input_downcast=True)
y_val = np.random.uniform(size=3).astype(theano.config.floatX)
print(error([ 0.66981461, 0.60965314, 0.76731602], y_val))
Now that the graph is defined, we can compute the gradient of the cost C w.r.t. some parameters (W, b). The gradient can only be taken of a scalar expression, e.g., the cost C.
In [ ]:
# theano.grad(exp, [Variable])
dC_dW, dC_db = theano.grad(C, [W, b])
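Note that dC_dW and dC_db are themselves symbolic Variables. To obtain numerical gradients they must be compiled into a function, just like any other expression. A minimal sketch (the name grad_fn is just illustrative):
In [ ]:
# Compile a function that returns the numerical gradients (illustrative sketch)
grad_fn = theano.function([x, y], [dC_dW, dC_db], allow_input_downcast=True)
gW, gb = grad_fn(x_val, y_val)
print(gW.shape, gb.shape)  # (3, 4) and (3,), matching the shapes of W and b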
Now that we can compute the gradients, we define the gradient descent update rule: each parameter takes a small step against its gradient, e.g. W <- W - eta * dC/dW.
In [ ]:
eta_val = np.array(0.1, dtype=theano.config.floatX)
eta = theano.shared(eta_val, name='eta')
upd_W = W - eta * dC_dW
upd_b = b - eta * dC_db
Finally, we compile the expressions and the update rules.
In [ ]:
train = theano.function([x, y], C, updates=[(W, upd_W), (b, upd_b)], allow_input_downcast=True)
#print(b.get_value())
#print(W.get_value())
We iterate the gradient descent update rule in order to minimize the cost.
In [ ]:
for i in range(50):
    C_val = train(x_val, y_val)
    print('Cost {} at iteration {}'.format(C_val, i))
print(b.get_value())
print(W.get_value())
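As a quick sanity check (assuming the cells above have been run in order), the cost on the training pair should now be much lower than before training:
In [ ]:
# Recompute the cost with the trained parameters (illustrative sketch)
print(error(predict(x_val), y_val))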
In [ ]:
from theano.printing import pydotprint
from IPython.display import Image, SVG
In [ ]:
Image(pydotprint(out, format='png', compact=False, return_image=True))
In [ ]:
Image(pydotprint(out, format='png', return_image=True))
In [ ]:
Image(pydotprint(predict, format='png', return_image=True))
In [ ]:
Image(pydotprint([upd_W, upd_b], format='png', return_image=True), width=1000)
In [ ]:
Image(pydotprint(train, format='png', return_image=True), width=1000)
In [ ]:
from theano.printing import debugprint
debugprint(out)
debugprint(predict)