Intro to deep learning in Chainer

Welcome - this interactive tutorial will introduce you to deep learning in Chainer, to prepare for the DIY practical tutorial.

0. iPython

First off, you need to know how to run code & see the results. Anything marked Exercise is a task for you to do. Try to read & do each exercise in order, to understand what is going on!

Exercise - run the next cell by selecting it & pressing Ctrl+Enter



In [ ]:

    
a = 100
print("a is", a)

a + 200

This is an iPython Notebook. You can write whatever Python code you like here - anything you print is shown below the cell, followed by the value of the cell's final expression (here, the result of a + 200).

Note - your Python code is running on a server I've set up (which has everything you need), not on your local machine.

Exercise - save the notebook (do this regularly), by pressing Ctrl+s (or the save icon)

Hint - if you are struggling with what to write at any point, try pressing Tab - iPython will try to offer some sensible completions. If you want to know what a function does, try Shift+Tab to bring up its documentation.

1. Numpy

Next we'll import the libraries we need...



In [ ]:

    
%matplotlib inline
import dlt
import numpy as np
import chainer as C

Now we'll learn how to use these libraries to create deep learning functions (later, in the full tutorial, we'll use this to train a handwriting recognizer).

Here are two ways to create a numpy array:



In [ ]:

    
a = np.array([1, 2, 3, 4, 5], dtype=np.int32)
print("a =", a)
print("a.shape =", a.shape)
print()

b = np.zeros((2, 3), dtype=np.float32)
print("b =", b)
print("b.shape =", b.shape)

An np.array is a multidimensional array, which makes it very flexible. It can be:

0-dimensional (a number, like 5)
1-dimensional (a vector, like a above)
2-dimensional (a matrix, like b above)
N-dimensional (...)

It can also contain either whole numbers (e.g. np.int32) or floating-point numbers, which approximate real numbers (e.g. np.float32).
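To make the two ideas above concrete (number of dimensions and element type), here is a small sketch. The names vector, matrix, cube and as_float are just made up for this illustration; ndim, dtype and astype are standard numpy.

```python
import numpy as np

# Three arrays with different numbers of dimensions
vector = np.array([1, 2, 3], dtype=np.int32)     # 1-dimensional
matrix = np.zeros((2, 3), dtype=np.float32)      # 2-dimensional
cube = np.zeros((2, 3, 4), dtype=np.float32)     # 3-dimensional

for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of dimensions, dtype is the element type
    print(name, "ndim =", arr.ndim, "shape =", arr.shape, "dtype =", arr.dtype)

# astype makes a copy with a different element type
as_float = vector.astype(np.float32)
print("as_float =", as_float, "dtype =", as_float.dtype)
```

Note that ndim is always the same as len(arr.shape).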

OK, that's enough from me - now it's your turn...

Exercise - create the following numpy arrays, and print out the shape:



In [ ]:

    
# EXERCISE
# 1. an array scalar containing the integer 5

# 2. a (10, 20) array of zeros

# 3. a (3, 3) array of different numbers (hint: use a list-of-lists)

Now we just need a few ways of working with these arrays - here are some examples of things that you can do:



In [ ]:

    
x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
print("x =\n%s" % x)
print()

# Indexing
print("x[0, 1] =", x[0, 1]) # row 0, column 1 (indices start at 0)
print("x[1, 1] =", x[1, 1]) # row 1, column 1
print()

# Slicing
print("x[0, :] =", x[0, :]) # row 0, all columns
print("x[:, 2] =", x[:, 2]) # column 2, all rows
print("x[1, :] =", x[1, :]) # row 1, all columns
print("x[1, 0:2] =", x[1, 0:2]) # row 1, columns 0 and 1 (the stop index 2 is excluded)
print()

# Other numpy functions (there are very many more...)
print("np.argmax(x[0, :]) =", np.argmax(x[0, :])) # Find the index of the maximum element in the 0th row

I won't explain all of this in detail, but have a play around with arrays and see what you can do with the above operations.
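One thing that often trips people up with slicing is that the stop index is not included. Here is a small sketch on a made-up array y (not part of the exercises) that shows how to select an inclusive range of rows and columns:

```python
import numpy as np

# A small (3, 4) array: [[0 1 2 3], [4 5 6 7], [8 9 10 11]]
y = np.arange(12, dtype=np.int32).reshape(3, 4)
print("y =\n%s" % y)

# start:stop includes start but excludes stop
rows = y[0:2, :]       # rows 0 and 1 only
print("rows.shape =", rows.shape)

# To select rows 1-2 inclusive and columns 1-2 inclusive, stop one past the end
block = y[1:3, 1:3]
print("block =\n%s" % block)
```

So a range "a to b inclusive" is written a:b+1.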

Exercise - try to use your numpy operations to find the following with M:



In [ ]:

    
M = np.arange(900, dtype=np.float32).reshape(45, 20)
print(M.shape)

# EXERCISE
# 1. print out row number 0 (hint, it should be shape (20,))

# 2. print out row number 34

# 3. select column 15, print out the shape

# 4. select rows 30-40 inclusive, columns 5-8 inclusive, print out the shape (hint: should be (11, 4))

2. Chainer

We'll use numpy to get data in & out of Chainer, which is our deep learning library, but Chainer will do most of the data processing.

Here is how you get some data into Chainer, use a linear operation to change its shape, and get the result back out again:



In [ ]:

    
a = C.Variable(np.zeros((10, 20), dtype=np.float32))
print("a.data.shape =", a.data.shape)

transformation = C.links.Linear(20, 30)
b = transformation(a)
print("b.data.shape =", b.data.shape)

c = C.functions.tanh(b)
print("c.data.shape =", c.data.shape)

This may not seem particularly special, but this is the heart of a deep learning function. Take an input array, make various transformations that mess around with the shape, and produce an output array.

Some concepts:

A Variable holds an array - this is some data going through the function
A Link contains some parameters (these start random), which it uses to process an input Variable and produce an output Variable.
A Function is a Link without any parameters (like sin, cos, tan, tanh, max... so many more...)
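If you are curious what the Linear link and tanh function are doing to the numbers, here is a rough numpy-only sketch. It assumes the usual definition of a linear layer, y = x.dot(W.T) + b, with W of shape (output size, input size). The helper linear and the way W is filled in are made up for this illustration; they are not Chainer's own code.

```python
import numpy as np

rng = np.random.RandomState(0)

# Input data: 10 examples, each with 20 features
x = np.zeros((10, 20), dtype=np.float32)

# Parameters of a 20 -> 30 linear layer (illustrative random values)
W = rng.normal(size=(30, 20)).astype(np.float32)   # shape (output size, input size)
b = np.zeros(30, dtype=np.float32)                 # one bias per output

def linear(x, W, b):
    """Hypothetical stand-in for a linear link: a matrix multiply plus a bias."""
    return x.dot(W.T) + b

h = linear(x, W, b)    # parameters change the shape: (10, 20) -> (10, 30)
y = np.tanh(h)         # no parameters, applied element by element, shape unchanged
print("h.shape =", h.shape)
print("y.shape =", y.shape)
```

The linear step has parameters (W and b), so it plays the role of a Link, while tanh has none, so it plays the role of a Function.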

Exercise - use Chainer to calculate the following:



In [ ]:

    
# EXERCISE
# 1. Create an array, shape (2, 3) of various float numbers, put it in a variable
a = None # your array here

# 2. Print out tanh(a) (for the whole array)

# 3. Create a linear link of shape (3, 5) - this means it takes (N, 3) and produces (N, 5)
mylink = None # your link here

# 4. Use your link to transform `a`, then take the tanh, check the shape of the result

# 5. Uncomment the following; what happens when you re-run the code?
# print("W =", mylink.W.data)

If you can do all of this, you're ready to create a deep learning function.

In the last step, you may have noticed something interesting - the parameters inside the link change every time it is re-created. This is because deep learning functions start off random! Random functions don't sound too useful, so later we're going to learn how to "teach" them to be useful functions.
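The same effect can be seen with plain numpy. In this sketch, make_weights is a made-up helper standing in for "creating a link", and the normal distribution with scale 1/sqrt(in_size) is only an assumption for illustration, not necessarily what Chainer uses.

```python
import numpy as np

def make_weights(in_size, out_size, seed=None):
    """Hypothetical stand-in for the random weights made when a link is created."""
    rng = np.random.RandomState(seed)
    scale = 1.0 / np.sqrt(in_size)   # assumed scale, for illustration only
    return rng.normal(scale=scale, size=(out_size, in_size)).astype(np.float32)

# No seed: each call draws fresh random numbers, like re-creating a link
w1 = make_weights(3, 5)
w2 = make_weights(3, 5)
print("same without a seed?", np.array_equal(w1, w2))

# Fixed seed: the "random" numbers are reproducible
w3 = make_weights(3, 5, seed=0)
w4 = make_weights(3, 5, seed=0)
print("same with seed 0?", np.array_equal(w3, w4))
```

Fixing a seed like this is a common way to make experiments repeatable while still starting from random parameters.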

3. Plotting curves

We've provided a very simple log plotting library, dlt.Log, demonstrated below:



In [ ]:

    
log = dlt.Log()
for i in range(100):
    # The first argument "loss" says which plot to put the value on
    # The second argument "train" gives it a name on that plot
    # The third argument is the y-value
    log.add("loss", "train", i)
    log.add("loss", "valid", 2 * i)
log.show()

Exercise - try to add another curve to the plot, e.g. np.sqrt(i) - you'll need to give it a different name.

Summary

OK - this was quite a lot to learn! To review, you've learnt how to:

Create and inspect the shape of numpy arrays.
Slice & select parts of numpy arrays.
Put numpy arrays into Chainer variables.
Process Chainer variables using links and functions.
Show progress in a plot.

Next, we'll put all of this together and add training, as we teach a deep learning function to recognize handwritten digits. See the DIY guide & Tutorial.ipynb.