Note that the optional watermark extension is a small IPython notebook plugin that I developed to make the code reproducible. You can just skip the following line(s).
In [1]:
%load_ext watermark
%watermark -a '' -u -d -v -p numpy,matplotlib,theano,keras
The use of watermark is optional. You can install this IPython extension via pip install watermark. For more information, please see: https://github.com/rasbt/watermark.
In [2]:
from IPython.display import Image
%matplotlib inline
A machine learning library with a Python interface.
Focuses on tensors (multi-dimensional arrays) as the core data structure.
Builds computations via symbolic manipulation: expressions are defined first and compiled into functions later.
Depending on your system setup, it is typically sufficient to install Theano via
pip install Theano
For more help with the installation, please see: http://deeplearning.net/software/theano/install.html
Introducing the TensorType variables. For a complete list, see http://deeplearning.net/software/theano/library/tensor/basic.html#all-fully-typed-constructors
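As a quick preview, here is a minimal sketch of a few of those constructors (all of the names below come from theano.tensor; the variables themselves are only illustrative and are not used later):
from theano import tensor as T
s = T.scalar('s')     # 0-d tensor, dtype = theano.config.floatX
v = T.vector('v')     # 1-d tensor
m = T.matrix('m')     # 2-d tensor
t3 = T.tensor3('t3')  # 3-d tensor
ds = T.dscalar('ds')  # float64 scalar
fm = T.fmatrix('fm')  # float32 matrix
print(s.type(), v.type(), m.type(), t3.type(), ds.type(), fm.type())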
In [3]:
import theano
from theano import tensor as T
import numpy as np
In [4]:
# define expression
# which can be visualized as a graph
x1 = T.scalar()
w1 = T.scalar()
w0 = T.scalar()
z1 = w1 * x1 + w0
# compile
net_input = theano.function(inputs=[w1, x1, w0], outputs=z1)
# execute
answer = net_input(2.0, 1.0, 0.5)
print(answer)
answer
Out[4]:
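The comment in the cell above notes that the expression can be visualized as a graph. As a minimal sketch, the symbolic expression z1 can be inspected with theano.pp (pretty-print); theano.printing.pydotprint can additionally render the compiled graph to an image if pydot/graphviz are installed:
# pretty-print the symbolic expression defined above
print(theano.pp(z1))
# optional: render the compiled function's graph to an image (requires pydot/graphviz)
# theano.printing.pydotprint(net_input, outfile='net_input_graph.png')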
In [5]:
# define
b = T.scalar('b')
x = T.vector('x')
W = T.matrix('W')
y = x.dot(W.transpose())
z = W.dot(x) + b
# similar to python function
# theano function can return multiple outputs
f = theano.function(inputs = [x, W, b], outputs = [y, z])
output_y, output_z = f([1, 2], [[3, 4], [5, 6]], 1)
# output_y, output_z = f([[1, 2]], [[3, 4]], 1) # won't work as x is a vector not matrix
# output_y, output_z = f([1, 2], [3, 4], 1) # won't work as W is a matrix not vector
# output_y, output_z = f([1, 2], [[3, 4]], [1]) # won't work as b is a scalar not a vector/matrix
print(output_y)
print(output_z)
In [6]:
# quadratic polynomial root example
# ax^2 + bx + c = 0
a = T.scalar('a')
b = T.scalar('b')
c = T.scalar('c')
core = b*b - 4*a*c  # discriminant
root_p = (-b + T.sqrt(core)) / (2*a)  # T.sqrt: Theano's symbolic square root
root_m = (-b - T.sqrt(core)) / (2*a)
# compile
f = theano.function(inputs = [a, b, c], outputs = [root_p, root_m])
In [7]:
# run
polys = [[1, 2, 1],
         [1, -7, 12],
         [1, 0, 1]]
for poly in polys:
    a, b, c = poly
    root1, root2 = f(a, b, c)
    print(root1, root2)
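Note that the third polynomial above, x^2 + 1 = 0, has a negative discriminant, so the square root evaluates to nan and the printed "roots" are not real numbers. A minimal sketch of a guard against this case (using the polys and f defined above):
for poly in polys:
    a, b, c = poly
    if b*b - 4*a*c < 0:
        print(poly, 'has no real roots')
    else:
        print(poly, 'roots:', f(a, b, c))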
Steps for using Theano: (1) define symbolic variables and expressions, (2) compile them into a function, (3) execute the function on concrete data.
Each variable has a specific type (dtype).
In [8]:
# default configuration
print(theano.config.floatX)
In [9]:
# we can change it like this
theano.config.floatX = 'float32'
In [10]:
print(theano.config.floatX)
To change the float type globally, execute
export THEANO_FLAGS=floatX=float32
in your bash shell. Or run a Python script as
THEANO_FLAGS=floatX=float32 python your_script.py
Running Theano on GPU(s). For prerequisites, please see: http://deeplearning.net/software/theano/tutorial/using_gpu.html
Note that float32 is recommended for GPUs; float64 on GPUs is currently still relatively slow.
In [11]:
print(theano.config.device)
You can run a Python script on CPU (e.g. for prototyping and debug) via:
THEANO_FLAGS=device=cpu,floatX=float64 python your_script.py
or GPU (e.g. for real computation) via:
THEANO_FLAGS=device=gpu,floatX=float32 python your_script.py
It may also be convenient to create a .theanorc file in your home directory to make those configurations permanent. For example, to always use float32, execute
echo -e "\n[global]\nfloatX=float32\n" >> ~/.theanorc
Or, create a .theanorc file manually with the following contents:
[global]
floatX = float32
device = gpu
In [12]:
import numpy as np
# define
# T.matrix uses theano.config.floatX as its dtype;
# if you run Theano in 64-bit (float64) mode, use dmatrix instead of fmatrix
x = T.matrix(name='x')  # 2D tensor (matrix) with an arbitrary number of rows and columns
x_sum = T.sum(x, axis=0)
# compile
calc_sum = theano.function(inputs=[x], outputs=x_sum)
# execute (Python list)
ary = [[1, 2, 3], [1, 2, 3]]
print('Column sum:', calc_sum(ary))
# execute (NumPy array)
ary = np.array(ary, dtype=theano.config.floatX)
print('Column sum:', calc_sum(ary))
In [13]:
# name can help debug
y = T.matrix(name='hello')
z = T.matrix()
print(y) # will print out variable name
print(z) # will print out variable type
print(y.type()) # will print out type
In [14]:
# explicit type specification
wf = T.fmatrix(name='wfmatrix')
wd = T.dmatrix(name='wdmatrix')
print(wf.type())
print(wd.type())
A shared variable has storage that is shared among all the functions it appears in.
It can be more efficient than an input variable because its value is retained between function calls.
More information about memory management in Theano can be found in the Theano documentation.
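A shared variable carries its own value, which can be read and written outside of any compiled function. A minimal sketch using Theano's get_value/set_value API (the variable name state is just illustrative):
state = theano.shared(np.asarray(0.0, dtype=theano.config.floatX), name='state')
print(state.get_value())   # read the current value
state.set_value(np.asarray(2.0, dtype=theano.config.floatX))  # overwrite the value in place
print(state.get_value())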
In [15]:
# initialize
x = T.matrix(name='x')
b = theano.shared(np.asarray([[1]], dtype=theano.config.floatX), name='b')
w = theano.shared(np.asarray([[0.0, 0.0, 0.0]],
dtype=theano.config.floatX))
# w = w + 1.0 # don't do this: w would be rebound to an expression (no longer a shared variable) and the updates below would fail
z = x.dot(w.T) + b
update = [[w, w + 1.0]] # update w after each function call
# compile
f = theano.function(inputs=[x],
updates=update,
outputs=z)
# won't compile as shared variable cannot be used as input
# g = theano.function(inputs=[x, b], outputs = z)
# execute
x_data = np.array([[1, 2, 3]], dtype=theano.config.floatX)
for i in range(5):
    print('z%d:' % i, f(x_data))
input: transferred from CPU to GPU on every function call
shared: value retained across function calls; can be updated after each call
givens: transferred from CPU to GPU once, when the function is compiled
If we use inputs, the dataset is transferred from the CPU to the GPU multiple times, for example when we iterate over it for multiple epochs during gradient descent.
We can use givens to insert values into the graph before compiling it. Using this approach we can reduce the number of transfers from RAM (via the CPU) to the GPU and speed up learning with shared variables.
Via givens, we can keep the dataset on the GPU if it fits (e.g., a mini-batch).
In [16]:
# define
num_samples = 10
samples = np.asarray([i for i in range(num_samples)],
dtype=theano.config.floatX)
# samples = theano.shared(samples)
x = T.lscalar(name='index')
#y = theano.shared(np.asscalar(np.array([1], dtype=theano.config.floatX)))
y = T.vector(name='samples')
w = theano.shared(np.asscalar(np.array([0], dtype=theano.config.floatX)))
z = y[x]*w
# compile
f = theano.function(inputs = [x],
updates = [[w, w+1]],
givens = {y: samples},
outputs = z)
# run
for i in range(np.prod(samples.shape)):
    print(f(i))
In [17]:
# initialize
x_data = np.array([[1, 2, 3]], dtype=theano.config.floatX)
x = T.matrix(name='hi')
w = theano.shared(np.asarray([[0, 0, 0], [0, 0, 0], [0, 0, 0]], dtype=theano.config.floatX))
# an input variable can be given
b_data = np.array([[-1, 0, 1]], dtype=theano.config.floatX)
b = T.matrix(name='bias')
# a shared variable can be given
c_data = np.array([[4, 5, 6]], dtype=theano.config.floatX)
c = theano.shared(np.asarray([[0]], dtype=theano.config.floatX))
z = x.dot(w.T) + b + c
updates = [[w, w + 1.0]]
givens = {b: b_data, c: c_data}
# compile
net_input = theano.function(inputs=[x],
updates=updates,
givens=givens,
outputs=z)
# execute
for i in range(5):
    print('z:', net_input(x_data))
Model: $ y = \sum_{i=0}^n w_i x_i = \mathbf{w}^T \mathbf{x} $ with $x_0 = 1$.
Given a collection of sample data $\{\mathbf{x^{(i)}}, y^{(i)} \}$, find the line $\mathbf{w}$ that minimizes the regression error: $$ \begin{align} L(X, Y, \mathbf{w}) = \sum_i \left( y^{(i)} - \hat{y}^{(i)} \right)^2 = \sum_i \left( y^{(i)} - \mathbf{w}^T \mathbf{x}^{(i)} \right)^2 \end{align} $$
For the simple 1D example below: $ y = w_0 + w_1 x $
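For reference, the gradient of this cost with respect to the weights, which T.grad derives symbolically in the code below, and the gradient descent update with learning rate $\eta$ (eta in the code) are:
$$ \begin{align} \nabla_{\mathbf{w}} L = -2 \sum_i \left( y^{(i)} - \mathbf{w}^T \mathbf{x}^{(i)} \right) \mathbf{x}^{(i)}, \qquad \mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla_{\mathbf{w}} L \end{align} $$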
In [18]:
import numpy as np
X_train = np.asarray([[0.0], [1.0], [2.0], [3.0], [4.0],
[5.0], [6.0], [7.0], [8.0], [9.0]],
dtype=theano.config.floatX)
y_train = np.asarray([1.0, 1.3, 3.1, 2.0, 5.0,
6.3, 6.6, 7.4, 8.0, 9.0],
dtype=theano.config.floatX)
In [19]:
import theano
from theano import tensor as T
import numpy as np
def train_linreg(X_train, y_train, eta, epochs):
    costs = []
    # Initialize arrays
    eta0 = T.scalar('eta0')  # learning rate
    y = T.vector(name='y')
    X = T.matrix(name='X')
    w = theano.shared(np.zeros(shape=(X_train.shape[1] + 1),
                               dtype=theano.config.floatX),
                      name='w')

    # calculate cost
    y_pred = T.dot(X, w[1:]) + w[0]
    errors = y - y_pred
    cost = T.sum(T.pow(errors, 2))

    # perform gradient update
    gradient = T.grad(cost, wrt=w)  # symbolic differentiation
    update = [(w, w - eta0 * gradient)]

    # compile model
    train = theano.function(inputs=[eta0],
                            outputs=cost,
                            updates=update,
                            givens={X: X_train,
                                    y: y_train})

    for _ in range(epochs):
        # since the learning rate eta0 is a function input,
        # we could gradually change it between epochs
        costs.append(train(eta))

    return costs, w
Plotting the sum of squared errors cost vs epochs.
In [20]:
import matplotlib.pyplot as plt
costs, w = train_linreg(X_train, y_train, eta=0.001, epochs=10)
plt.plot(range(1, len(costs)+1), costs)
plt.xlabel('Epoch')
plt.ylabel('Cost')
plt.tight_layout()
# plt.savefig('./figures/cost_convergence.png', dpi=300)
plt.show()
In [21]:
def predict_linreg(X, w):
    Xt = T.matrix(name='X')
    y_pred = T.dot(Xt, w[1:]) + w[0]
    predict = theano.function(inputs=[Xt], givens={w: w}, outputs=y_pred)
    return predict(X)
plt.scatter(X_train, y_train, marker='s', s=50)
plt.plot(range(X_train.shape[0]),
predict_linreg(X_train, w),
color='gray',
marker='o',
markersize=4,
linewidth=3)
plt.xlabel('x')
plt.ylabel('y')
plt.tight_layout()
# plt.savefig('./figures/linreg.png', dpi=300)
plt.show()
There are various activation functions for multi-layer neural networks.
The sigmoid is one we have already seen.
The logistic function, often just called the "sigmoid function", is in fact a special case of a sigmoid function.
Linear input $z$: $$ \begin{align} z &= w_0x_{0} + \dots + w_mx_{m} \\ &= \sum_{j=0}^{m} x_{j}w_{j} \\ &= \mathbf{w}^T\mathbf{x} \end{align} $$ $w_0$ is the bias term, matching $x_0 = 1$
Logistic activation function:
$$\phi_{logistic}(z) = \frac{1}{1 + e^{-z}}$$
Output range: (0, 1)
In [22]:
# note that the first element (X[0, 0] = 1) denotes the bias unit
X = np.array([[1, 1.4, 1.5]])
w = np.array([0.0, 0.2, 0.4])

def net_input(X, w):
    z = X.dot(w)
    return z

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_activation(X, w):
    z = net_input(X, w)
    return np.asscalar(logistic(z))
print('P(y=1|x) = %.3f' % logistic_activation(X, w))
In [23]:
# W : array, shape = [n_output_units, n_hidden_units+1]
#     Weight matrix for hidden layer -> output layer.
#     note that the first column (W[:, 0]) holds the bias weights
W = np.array([[1.1, 1.2, 1.3, 0.5],
              [0.1, 0.2, 0.4, 0.1],
              [0.2, 0.5, 2.1, 1.9]])

# A : array, shape = [n_hidden_units+1, n_samples]
#     Activation of the hidden layer.
#     note that the first element (A[0, 0] = 1) is the bias unit
A = np.array([[1.0],
              [0.1],
              [0.3],
              [0.7]])
# Z : array, shape = [n_output_units, n_samples]
# Net input of output layer.
Z = W.dot(A)
y_probas = logistic(Z)
print('Probabilities:\n', y_probas)
The outputs do not sum to 1 and thus are not proper probabilities.
We would need to normalize them to obtain probabilities.
For predicting the class label, however, taking the argmax is sufficient:
In [24]:
y_class = np.argmax(Z, axis=0)
print('predicted class label: %d' % y_class[0])
The softmax function
$z$: linear input as usual
$K$: number of classes
$$ \begin{align} P(y=j|z) = \phi_{softmax}(z) = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \end{align} $$
$P(y=j | z)$: probability of class $j$ for input $z$, in the range $(0, 1)$
In [25]:
def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))

def softmax_activation(X, w):
    z = net_input(X, w)
    return softmax(z)
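For large net inputs, np.exp can overflow. A common, numerically more stable variant subtracts the maximum of z before exponentiating, which leaves the result unchanged; a minimal sketch (the name softmax_stable is just illustrative and is not used in the cells below):
def softmax_stable(z):
    z_shifted = z - np.max(z)  # shifting by a constant does not change the softmax output
    e = np.exp(z_shifted)
    return e / np.sum(e)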
In [26]:
y_probas = softmax(Z) # same Z computed above
print('Probabilities:\n', y_probas)
In [27]:
y_probas.sum()
Out[27]:
The class probabilities now sum to 1.
The predicted class label is the same as with the logistic activation above.
In [28]:
y_class = np.argmax(Z, axis=0)
y_class
Out[28]:
Another special case of a sigmoid function; it can be interpreted as a rescaled version of the logistic function.
$$ \begin{align} \phi_{tanh}(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}} \end{align} $$
$\phi_{tanh}$ is a rescaled version of $\phi_{logistic}$:
$$ \begin{align} \phi_{tanh}(z) = 2 \, \phi_{logistic}(2z) - 1 \end{align} $$
Output range: (-1, 1)
In [29]:
def tanh(z):
    e_p = np.exp(z)
    e_m = np.exp(-z)
    return (e_p - e_m) / (e_p + e_m)
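A quick numerical check of the rescaling relation above, using the logistic function defined earlier:
z_test = np.linspace(-3, 3, 7)
print(np.allclose(tanh(z_test), 2 * logistic(2 * z_test) - 1))  # expected: True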
In [30]:
import matplotlib.pyplot as plt
z = np.arange(-5, 5, 0.005)
log_act = logistic(z)
tanh_act = tanh(z)
# alternatives:
# from scipy.special import expit
# log_act = expit(z)
# tanh_act = np.tanh(z)
plt.ylim([-1.5, 1.5])
plt.xlabel('net input $z$')
plt.ylabel('activation $\phi(z)$')
plt.axhline(1, color='black', linestyle='--')
plt.axhline(0.5, color='black', linestyle='--')
plt.axhline(0, color='black', linestyle='--')
plt.axhline(-1, color='black', linestyle='--')
plt.plot(z, tanh_act,
linewidth=2,
color='black',
label='tanh')
plt.plot(z, log_act,
linewidth=2,
color='lightgreen',
label='logistic')
plt.legend(loc='lower right')
plt.tight_layout()
# plt.savefig('./figures/activation.png', dpi=300)
plt.show()
Once you have Theano installed, Keras can be installed via
pip install Keras
1) Download the 4 MNIST datasets from http://yann.lecun.com/exdb/mnist/
2) Unzip those files
3) Copy the unzipped files to a directory ./mnist
Li-Yi: I enhanced the functions below so that we can do everything (including downloading and decompression) inside ipynb.
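If the compressed files are not present yet, they can be fetched programmatically. A minimal sketch using urllib (it assumes the four files are still hosted at the URL above and uses the same target folder as the loading cell below; adjust mnist_url and mnist_dir as needed):
import os
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

mnist_url = 'http://yann.lecun.com/exdb/mnist/'
mnist_dir = os.path.join('..', 'datasets', 'mnist')
files = ['train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz',
         't10k-images-idx3-ubyte.gz', 't10k-labels-idx1-ubyte.gz']

if not os.path.isdir(mnist_dir):
    os.makedirs(mnist_dir)
for name in files:
    target = os.path.join(mnist_dir, name)
    if not os.path.isfile(target):
        urlretrieve(mnist_url + name, target)  # download the gzipped file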
In [31]:
import os
import os.path
import struct
import gzip
import numpy as np
def open_mnist(full_path):
    if full_path.find('.gz') >= 0:
        return gzip.open(full_path, 'rb')
    else:
        return open(full_path, 'rb')

def pick_mnist(path, name, exts):
    for ext in exts:
        full_path = os.path.join(path, name + ext)
        if os.path.isfile(full_path):
            return full_path
    # none of the exts options works
    return None

def load_mnist(path, kind='train', exts=['', '.gz']):
    """Load MNIST data from `path`"""
    labels_path = pick_mnist(path, kind + '-labels-idx1-ubyte', exts)
    images_path = pick_mnist(path, kind + '-images-idx3-ubyte', exts)

    with open_mnist(labels_path) as lbpath:
        magic, n = struct.unpack('>II', lbpath.read(8))
        if magic != 2049:
            raise IOError(str(magic) + ' != ' + str(2049))
        # np.fromfile does not work with gzip open
        # http://stackoverflow.com/questions/15966335/efficient-numpy-fromfile-on-zipped-files
        # labels = np.fromfile(lbpath, dtype=np.uint8)
        content = lbpath.read()
        labels = np.frombuffer(content, dtype=np.uint8)
        if len(labels) != n:
            raise IOError(str(len(labels)) + ' != ' + str(n))

    with open_mnist(images_path) as imgpath:
        magic, num, rows, cols = struct.unpack(">IIII", imgpath.read(16))
        if magic != 2051:
            raise IOError(str(magic) + ' != ' + str(2051))
        # images = np.fromfile(imgpath, dtype=np.uint8).reshape(num, rows*cols)
        content = imgpath.read()
        images = np.frombuffer(content, dtype=np.uint8).reshape(num, rows*cols)
        if num != len(labels):
            raise IOError(str(num) + ' != ' + str(len(labels)))

    return images, labels
In [32]:
mnist_data_folder = os.path.join('..', 'datasets', 'mnist')
exts = ['', '.gz'] # for already gunzipped files and not yet gzipped files
X_train, y_train = load_mnist(mnist_data_folder, kind='train', exts=exts)
print('Rows: %d, columns: %d' % (X_train.shape[0], X_train.shape[1]))
X_test, y_test = load_mnist(mnist_data_folder, kind='t10k', exts=exts)
print('Rows: %d, columns: %d' % (X_test.shape[0], X_test.shape[1]))
In order to run the following code via GPU, you can execute the Python script that was placed in this directory via
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python mnist_keras_mlp.py
In [33]:
import theano
theano.config.floatX = 'float32'
X_train = X_train.astype(theano.config.floatX)
X_test = X_test.astype(theano.config.floatX)
In [34]:
from keras.utils import np_utils
print('First 3 labels: ', y_train[:3])
y_train_ohe = np_utils.to_categorical(y_train)
print('\nFirst 3 labels (one-hot):\n', y_train_ohe[:3])
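What to_categorical does can be sketched with plain NumPy (an illustrative equivalent using np.eye, not the Keras implementation):
n_classes = len(np.unique(y_train))
y_train_ohe_manual = np.eye(n_classes)[y_train]  # one row of the identity matrix per label
print(np.array_equal(y_train_ohe_manual, y_train_ohe))  # expected: True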
In [35]:
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
np.random.seed(1)
model = Sequential()
model.add(Dense(input_dim=X_train.shape[1],
output_dim=50,
init='uniform',
activation='tanh'))
model.add(Dense(output_dim=50,
init='uniform',
activation='tanh'))
model.add(Dense(output_dim=y_train_ohe.shape[1],
init='uniform',
activation='softmax'))
sgd = SGD(lr=0.001, decay=1e-7, momentum=.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=["accuracy"])
model.fit(X_train, y_train_ohe,
nb_epoch=50,
batch_size=300,
verbose=1,
validation_split=0.1  # hold out 10% of the training data for validation, evaluated after each epoch
)
Out[35]:
In [36]:
y_train_pred = model.predict_classes(X_train, verbose=0)
print('First 3 predictions: ', y_train_pred[:3])
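For a model with a softmax output layer, predict_classes corresponds to taking the argmax of the predicted class probabilities; a quick sketch to confirm on the first three samples:
proba = model.predict(X_train[:3], verbose=0)
print(np.argmax(proba, axis=1))  # should match the first 3 predictions above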
In [37]:
train_acc = np.sum(y_train == y_train_pred, axis=0) / X_train.shape[0]
print('Training accuracy: %.2f%%' % (train_acc * 100))
In [38]:
y_test_pred = model.predict_classes(X_test, verbose=0)
test_acc = np.sum(y_test == y_test_pred, axis=0) / X_test.shape[0]
print('Test accuracy: %.2f%%' % (test_acc * 100))
There are plenty of deep learning libraries available, so we do not have to write everything from scratch.
Alternative libraries that work with Theano: