and why they matter
Install the bleeding-edge version from here: http://lasagne.readthedocs.org/en/latest/user/installation.html
In [ ]:
import numpy as np
def sum_squares(N):
    return <student.Implement_me()>
In [ ]:
%%time
sum_squares(10**8)
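The sum of squares has a closed form that is handy for sanity-checking your implementation: sum_{i=0}^{N-1} i^2 = (N-1)·N·(2N-1)/6. The sketch below compares a vectorized NumPy version with that formula. The helper names are hypothetical.

One caveat for N = 10**8: the true result is about 3.3·10^23, which exceeds the int64 range (about 9.2·10^18). Any fixed-width integer computation, in NumPy or in Theano, will therefore overflow. The timing comparison above is still meaningful, but the printed value will not be.

```python
import numpy as np

def sum_squares_vectorized(N):
    # hypothetical helper: build 0..N-1 as int64, square elementwise, then sum
    return int((np.arange(N, dtype=np.int64) ** 2).sum())

def sum_squares_closed_form(N):
    # exact for any N, because Python ints have arbitrary precision
    return (N - 1) * N * (2 * N - 1) // 6

for n in [1, 10, 1000]:
    assert sum_squares_vectorized(n) == sum_squares_closed_form(n)

print(sum_squares_closed_form(10))
# True means the exact answer does not fit in int64, so a fixed-width sum overflows
print(sum_squares_closed_form(10**8) > int(np.iinfo(np.int64).max))
```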
In [ ]:
import theano
import theano.tensor as T
In [ ]:
# I am going to be a function parameter
N = T.scalar("a dimension",dtype='int32')
# I am a recipe for computing the sum of squares of arange(N), given N
result = (T.arange(N)**2).sum()
# compiling the recipe for computing "result" given N
sum_function = theano.function(inputs = [N],outputs=result)
In [ ]:
%%time
sum_function(10**8)
If you're currently in a classroom, chances are I am explaining this wall of text right now.
Still confused? We're going to fix that.
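If the idea of a "recipe" is still fuzzy, here is a toy, pure-Python illustration of deferred (symbolic) computation. It is not how Theano is implemented, and the names ToyNode, variable, constant and total are made up for this sketch.

Building an expression only records which operations to apply. A value appears only when you evaluate the graph with concrete inputs. That is roughly what theano.function and .eval do for you, with Theano additionally optimizing and compiling the graph.

```python
import numpy as np

class ToyNode(object):
    """A node records HOW to compute a value, not the value itself."""
    def __init__(self, op=None, parents=(), name=None):
        self.op = op            # None means "this node is an input"
        self.parents = parents
        self.name = name

    def eval(self, inputs):
        if self.op is None:
            # input nodes look up their concrete value in the dictionary
            return np.asarray(inputs[self])
        # other nodes evaluate their parents first, then apply their op
        return self.op(*[p.eval(inputs) for p in self.parents])

    def __add__(self, other):
        return ToyNode(np.add, (self, constant(other)))

    def __mul__(self, other):
        return ToyNode(np.multiply, (self, constant(other)))

    def __pow__(self, k):
        return ToyNode(lambda a: a ** k, (self,))

def constant(x):
    # wrap plain numbers so they can take part in the graph
    if isinstance(x, ToyNode):
        return x
    return ToyNode(lambda: np.asarray(x))

def variable(name):
    return ToyNode(name=name)

def total(node):
    return ToyNode(np.sum, (node,))

x = variable("x")
result = total(x ** 2)                  # nothing is computed yet
print(result)                           # a ToyNode object, not a number
print(result.eval({x: np.arange(5)}))   # 0 + 1 + 4 + 9 + 16
```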
In [ ]:
#Inputs
example_input_integer = T.scalar("scalar input",dtype='int32')
example_input_tensor = T.tensor4("four dimensional tensor input") #dtype = theano.config.floatX by default
# don't worry, we won't need the tensor
input_vector = T.vector("", dtype='int32') # vector of integers
In [ ]:
#Transformations
# transformation: elementwise multiplication
double_the_vector = input_vector*2
#elementwise cosine
elementwise_cosine = T.cos(input_vector)
#difference between squared vector and vector itself
vector_squares = input_vector**2 - input_vector
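For intuition, here is what those three recipes would produce on a concrete integer vector. This is plain eager NumPy, not Theano: each line runs immediately, whereas the Theano expressions above only describe the computation. The np_ prefix is used so these names do not shadow the symbolic variables.

```python
import numpy as np

v = np.array([1, 2, 3], dtype='int32')   # a concrete stand-in for input_vector

np_double_the_vector = v * 2              # elementwise multiplication
np_elementwise_cosine = np.cos(v)         # cos of ints is computed in floating point
np_vector_squares = v ** 2 - v            # squared vector minus the vector itself

print(np_double_the_vector)
print(np_elementwise_cosine)
print(np_vector_squares)
```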
In [ ]:
#Practice time:
# create two vectors of type float32
my_vector = student.init_float32_vector()
my_vector2 = student.init_one_more_such_vector()
In [ ]:
#Write a transformation(recipe):
#(vec1)*(vec2) / (sin(vec1) +1)
my_transformation = student.implementwhatwaswrittenabove()
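To check your compiled function later, it helps to have reference values. A NumPy version of the same formula, (vec1)*(vec2)/(sin(vec1)+1), might look like the sketch below; reference_transformation is a hypothetical name. Keep in mind that the denominator vanishes wherever sin(vec1) = -1 (for example vec1 = -pi/2), so expect inf or nan there.

```python
import numpy as np

def reference_transformation(vec1, vec2):
    # elementwise vec1 * vec2 / (sin(vec1) + 1), computed in float32 like the exercise
    vec1 = np.asarray(vec1, dtype='float32')
    vec2 = np.asarray(vec2, dtype='float32')
    return vec1 * vec2 / (np.sin(vec1) + 1)

print(reference_transformation([1, 2, 3], [4, 5, 6]))
```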
In [ ]:
print my_transformation
# it's okay that it isn't a number: it is a symbolic expression
In [ ]:
inputs = [<two vectors that my_transformation depends on>]
outputs = [<what we compute (can be a list of several transformations)>]
# The next lines compile a function that takes two vectors and computes your transformation
my_function = theano.function(
inputs,outputs,
allow_input_downcast=True #automatic type casting for input parameters (e.g. float64 -> float32)
)
In [ ]:
# calling the function with Python lists:
print "using python lists:"
print my_function([1,2,3],[4,5,6])
print
#Or using numpy arrays:
# btw, the 'float' (float64) array is downcast to the dtype of the second input, float32 (allowed by allow_input_downcast=True)
print "using numpy arrays:"
print my_function(np.arange(10),
np.linspace(5,6,10,dtype='float'))
In [ ]:
#a dictionary of inputs
my_function_inputs = {
my_vector:[1,2,3],
my_vector2:[4,5,6]
}
# evaluate my_transformation
# should match the output of the compiled function
print my_transformation.eval(my_function_inputs)
# can compute transformations on the fly
print "add 2 vectors", (my_vector + my_vector2).eval(my_function_inputs)
#!WARNING! if your transformation only depends on some inputs,
#do not provide the rest of them
print "vector's shape:", my_vector.shape.eval({
my_vector:[1,2,3]
})
In [ ]:
# Quest #1 - implement a function that computes a mean squared error of two input vectors
# Your function has to take 2 vectors and return a single number
<student.define_inputs_and_transformations()>
compute_mse =<student.compile_function()>
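The mean squared error of two length-n vectors is MSE(a, b) = (1/n) · sum_i (a_i - b_i)^2. For reference, here is a NumPy version (reference_mse is a hypothetical name) that mirrors what sklearn's mean_squared_error computes for 1-D inputs. Your Theano graph should express the same computation symbolically, with two vector inputs.

```python
import numpy as np

def reference_mse(a, b):
    a = np.asarray(a, dtype='float64')
    b = np.asarray(b, dtype='float64')
    # average of the squared elementwise differences
    return np.mean((a - b) ** 2)

print(reference_mse([1, 2, 3], [4, 5, 6]))   # every difference is 3, so the mean of 9s
```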
In [ ]:
# Tests
from sklearn.metrics import mean_squared_error
for n in [1,5,10,10**3]:
    elems = [np.arange(n),np.arange(n,0,-1), np.zeros(n),
             np.ones(n),np.random.random(n),np.random.randint(100,size=n)]
    for el in elems:
        for el_2 in elems:
            true_mse = np.array(mean_squared_error(el,el_2))
            my_mse = compute_mse(el,el_2)
            if not np.allclose(true_mse,my_mse):
                print 'Wrong result:'
                print 'mse(%s,%s)'%(el,el_2)
                print "should be: %f, but your function returned %f"%(true_mse,my_mse)
                raise ValueError("Something is wrong")
print "All tests passed"
Inputs and transformations only exist while a function is being called.
Shared variables always stay in memory, like global variables.
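Here is a rough NumPy analogy, not Theano's implementation. Inputs behave like function arguments that disappear after the call, while a shared variable behaves like an object that lives outside the function and is read on every call. The ToyShared class is hypothetical and only mimics the get_value/set_value names used in this notebook. toy_shared_times_n is a plain-Python stand-in for the shared_times_n exercise below.

```python
import numpy as np

class ToyShared(object):
    """Holds a value between calls, like a global variable."""
    def __init__(self, value):
        self._value = np.asarray(value)

    def get_value(self):
        return self._value.copy()

    def set_value(self, value):
        self._value = np.asarray(value)

toy_shared_vector = ToyShared(np.ones(3))

def toy_shared_times_n(n):
    # n is an input: it exists only during this call;
    # the shared value is read from persistent state every time
    return toy_shared_vector.get_value() * n

print(toy_shared_times_n(5))           # five times [1, 1, 1]
toy_shared_vector.set_value([-1, 0, 1])
print(toy_shared_times_n(5))           # same function, new state: five times [-1, 0, 1]
```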
In [ ]:
#creating shared variable
shared_vector_1 = theano.shared(np.ones(10,dtype='float64'))
In [ ]:
# evaluating a shared variable (outside the symbolic graph)
print "initial value",shared_vector_1.get_value()
# within a symbolic graph you use it just like any other input or transformation; no get_value() needed
In [ ]:
#setting new value
shared_vector_1.set_value( np.arange(5) )
#getting that new value
print "new value", shared_vector_1.get_value()
#Note that the vector changed shape
#This is entirely allowed... unless your graph is hard-wired to work with some fixed shape
In [ ]:
# Write a recipe (transformation) that computes an elementwise transformation of shared_vector and input_scalar
#Compile as a function of input_scalar
input_scalar = T.scalar('coefficient',dtype='float32')
scalar_times_shared = <student.write_recipe()>
shared_times_n = <student.compile_function()>
In [ ]:
print "shared:", shared_vector_1.get_value()
print "shared_times_n(5)",shared_times_n(5)
print "shared_times_n(-0.5)",shared_times_n(-0.5)
In [ ]:
#Changing value of vector 1 (output should change)
shared_vector_1.set_value([-1,0,1])
print "shared:", shared_vector_1.get_value()
print "shared_times_n(5)",shared_times_n(5)
print "shared_times_n(-0.5)",shared_times_n(-0.5)
Limitations:
In [ ]:
my_scalar = T.scalar(name='input',dtype='float64')
scalar_squared = T.sum(my_scalar**2)
# derivative of scalar_squared with respect to my_scalar
derivative = T.grad(scalar_squared,my_scalar)
fun = theano.function([my_scalar],scalar_squared)
grad = theano.function([my_scalar],derivative)
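T.grad derives the derivative symbolically, so for scalar_squared = x**2 you should get 2x. A cheap way to sanity-check any gradient is the central finite difference (f(x + h) - f(x - h)) / (2h). The sketch below uses plain NumPy, with f(x) = x**2 standing in for the compiled fun; numeric_derivative is a hypothetical helper.

```python
import numpy as np

def numeric_derivative(f, x, h=1e-5):
    # central difference: the truncation error shrinks like h**2 for smooth f
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 2
x = np.linspace(-3, 3, 7)
approx = numeric_derivative(f, x)
exact = 2 * x                      # what T.grad should give for x**2
print(np.max(np.abs(approx - exact)))
```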
In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline
x = np.linspace(-3,3)
x_squared = map(fun,x)
x_squared_der = map(grad,x)
plt.plot(x, x_squared,label="x^2")
plt.plot(x, x_squared_der, label="derivative")
plt.legend()
In [ ]:
my_vector = T.vector('my_vector', dtype='float64')
#Compute the gradient of the next weird function over my_scalar and my_vector
#warning! Trying to understand the meaning of that function may result in permanent brain damage
weird_psychotic_function = ((my_vector+my_scalar)**(1+T.var(my_vector)) +1./T.arcsinh(my_scalar)).mean()/(my_scalar**2 +1) + 0.01*T.sin(2*my_scalar**1.5)*(T.sum(my_vector)* my_scalar**2)*T.exp((my_scalar-4)**2)/(1+T.exp((my_scalar-4)**2))*(1.-(T.exp(-(my_scalar-4)**2))/(1+T.exp(-(my_scalar-4)**2)))**2
der_by_scalar,der_by_vector = <student.compute_grad_over_scalar_and_vector()>
compute_weird_function = theano.function([my_scalar,my_vector],weird_psychotic_function)
compute_der_by_scalar = theano.function([my_scalar,my_vector],der_by_scalar)
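You can check der_by_vector in a similar way, one coordinate at a time: nudge element i by ±h and see how the function changes. For example, you could compare the result of numeric_gradient(lambda v: compute_weird_function(2.0, v), [1, 2, 3]) with your own compiled gradient. The NumPy sketch below uses numeric_gradient as a hypothetical helper. The stand-in function f(v) = sum(v**2) + 3·v[0] has the exact gradient 2v + [3, 0, 0].

```python
import numpy as np

def numeric_gradient(f, v, h=1e-5):
    v = np.asarray(v, dtype='float64')
    grad = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = h
        # central difference along coordinate i
        grad[i] = (f(v + step) - f(v - step)) / (2 * h)
    return grad

f = lambda v: np.sum(v ** 2) + 3 * v[0]
v0 = np.array([1.0, 2.0, 3.0])
print(numeric_gradient(f, v0))     # should be close to 2*v0 + [3, 0, 0]
```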
In [ ]:
#Plotting your derivative
vector_0 = [1,2,3]
scalar_space = np.linspace(0,7)
y = [compute_weird_function(x,vector_0) for x in scalar_space]
plt.plot(scalar_space,y,label='function')
y_der_by_scalar = [compute_der_by_scalar(x,vector_0) for x in scalar_space]
plt.plot(scalar_space,y_der_by_scalar,label='derivative')
plt.grid();plt.legend()
In [ ]:
# Multiply shared vector by a number and save the product back into shared vector
inputs = [input_scalar]
outputs = [scalar_times_shared] #return vector times scalar
my_updates = {
shared_vector_1:scalar_times_shared #and write this same result back into shared_vector_1
}
compute_and_save = theano.function(inputs, outputs, updates=my_updates)
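A function with updates evaluates its outputs and update expressions using the current shared values, and only then writes the new values back. This means the returned output reflects the state before the update.

The toy sketch below mimics that order in plain NumPy. ToyShared, make_function_with_updates and the toy_ names are hypothetical, not Theano API, and they are prefixed so they do not shadow compute_and_save.

```python
import numpy as np

class ToyShared(object):
    def __init__(self, value):
        self.value = np.asarray(value)

def make_function_with_updates(output_fn, updates):
    """output_fn(x) -> output; updates: list of (ToyShared, fn(x) -> new value)."""
    def call(x):
        output = output_fn(x)
        # compute every new value from the OLD state first...
        new_values = [(shared, fn(x)) for shared, fn in updates]
        # ...then write them all back
        for shared, value in new_values:
            shared.value = value
        return output
    return call

toy_shared_vector = ToyShared(np.arange(5))
toy_times_shared = lambda x: toy_shared_vector.value * x
toy_compute_and_save = make_function_with_updates(
    toy_times_shared, [(toy_shared_vector, toy_times_shared)])

print(toy_compute_and_save(2))     # 2 * [0, 1, 2, 3, 4], computed from the old value
print(toy_shared_vector.value)     # the same product, written back after the call
print(toy_compute_and_save(2))     # doubled again on the next call
```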
In [ ]:
shared_vector_1.set_value(np.arange(5))
#initial shared_vector_1
print "initial shared value:" ,shared_vector_1.get_value()
# evaluating the function (shared_vector_1 will be changed)
print "compute_and_save(2) returns",compute_and_save(2)
#evaluate new shared_vector_1
print "new shared value:" ,shared_vector_1.get_value()
[4 points max]
Implement the regular logistic regression training algorithm
Tips:
We shall train on a two-class subset (digits 0 and 1) of scikit-learn's 8x8 digits dataset, a small MNIST-like dataset
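For reference, these are the model, loss and gradient you will express in Theano:

- Prediction: p = sigmoid(X·w)
- Loss: loss = -mean(y·log p + (1 - y)·log(1 - p))
- Gradient: X^T (p - y) / n

Writing the loss as mean(log(1 + exp(z)) - y·z) with z = X·w is algebraically the same and avoids log(0).

The NumPy sketch below runs a few plain gradient-descent steps on synthetic data so you can watch the loss go down. It omits a bias term. The data, learning rate and helper name are assumptions for illustration only.

```python
import numpy as np

def logistic_loss_and_grad(w, X, y):
    z = X.dot(w)
    p = 1.0 / (1.0 + np.exp(-z))                  # predicted P(y = 1)
    loss = np.mean(np.logaddexp(0, z) - y * z)    # mean log loss, numerically stable form
    grad_w = X.T.dot(p - y) / len(y)              # d loss / d w
    return loss, grad_w

rng = np.random.RandomState(0)
X_toy = rng.randn(100, 3)
y_toy = (X_toy.dot(np.array([1.0, -2.0, 0.5])) > 0).astype('float64')   # synthetic labels

w = np.zeros(3)
learning_rate = 0.5
losses = []
for i in range(20):
    loss_i, grad_w = logistic_loss_and_grad(w, X_toy, y_toy)
    losses.append(loss_i)
    w = w - learning_rate * grad_w                # plain gradient-descent step
print("loss: %.4f -> %.4f" % (losses[0], losses[-1]))
```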
In [ ]:
from sklearn.datasets import load_digits
mnist = load_digits(n_class=2)
X,y = mnist.data, mnist.target
print "y [shape - %s]:"%(str(y.shape)),y[:10]
print "X [shape - %s]:"%(str(X.shape))
print X[:3]
In [ ]:
# inputs and shareds
shared_weights = <student.code_me()>
input_X = <student.code_me()>
input_y = <student.code_me()>
In [ ]:
predicted_y = <predicted probabilities for input_X>
loss = <logistic loss (scalar, mean over sample)>
grad = <gradient of loss over model weights>
updates = {
shared_weights: <new weights after gradient step>
}
In [ ]:
train_function = <compile function that takes X and y, returns log loss and updates weights>
predict_function = <compile function that takes X and computes probabilities of y>
In [ ]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y)
In [ ]:
from sklearn.metrics import roc_auc_score
for i in range(5):
    loss_i = train_function(X_train,y_train)
    print "loss at iter %i:%.4f"%(i,loss_i)
    print "train auc:",roc_auc_score(y_train,predict_function(X_train))
    print "test auc:",roc_auc_score(y_test,predict_function(X_test))
print "resulting weights:"
plt.imshow(shared_weights.get_value().reshape(8,-1))
plt.colorbar()
[basic part 4 points max] Your ultimate task for this week is to build your first neural network [almost] from scratch in pure Theano.
This time you will solve the same digit recognition problem, but at a larger scale.
Note that you are not required to build 152-layer monsters here. A 2-layer NN (one hidden layer, one output layer) should already give you an edge over logistic regression.
[bonus score] If you've already beaten logistic regression with a two-layer net but your enthusiasm still hasn't faded, you can try improving the test accuracy even further! The milestones would be 95%/97.5%/98.5% accuracy on the test set.
SPOILER! At the end of the notebook you will find a few tips and frequently made mistakes. If you feel mighty enough to shoot yourself in the foot without external assistance, we encourage you to do so, but if you encounter any insurmountable issues, please do look there before mailing us.
In [ ]:
from mnist import load_dataset
#[down]loading the original MNIST dataset.
#Please note that you should only train your NN on _train sample,
# _val can be used to evaluate out-of-sample error, compare models or perform early-stopping
# _test should be hidden under a rock until final evaluation... But we both know it is nearly impossible to catch you evaluating on it.
X_train,y_train,X_val,y_val,X_test,y_test = load_dataset()
print X_train.shape,y_train.shape
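One common way to use _val for early stopping is to track the best validation score seen so far and stop once it has not improved for a fixed number of epochs (the "patience"). Below is a minimal sketch of just that bookkeeping. The function name is hypothetical and the list of validation accuracies is made up. In your notebook the scores would come from evaluating the network on X_val, y_val after each epoch.

```python
def early_stopping_epoch(val_scores, patience=3):
    """Return (best_epoch, stop_epoch) for higher-is-better scores."""
    best_score, best_epoch = float('-inf'), -1
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch     # no improvement for `patience` epochs
    return best_epoch, len(val_scores) - 1

made_up_scores = [0.90, 0.93, 0.95, 0.94, 0.95, 0.94, 0.96]
print(early_stopping_epoch(made_up_scores, patience=3))
```

With patience=3, training stops at epoch 5 and keeps the weights from epoch 2, even though a later epoch would have scored higher. A larger patience trades extra epochs for a lower chance of stopping too early.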
In [ ]:
plt.imshow(X_train[0,0])
In [ ]:
<here you could just as well create computation graph>
In [ ]:
<this may or may not be a good place to evaluate loss and updates>
In [ ]:
<here one could compile all the required functions>
In [ ]:
<this may be a perfect cell to write a training&evaluation loop in>
In [ ]:
<predict & evaluate on test here, right? No cheating pls.>
Recommended pipeline
Add a hidden layer. Now your logistic regression uses hidden neurons instead of inputs.
Now's the time to try improving the network. Consider layers (size, neuron count), nonlinearities, optimization methods, initialization - whatever you want, but please avoid convolutions for now.
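To make "a hidden layer" concrete, here is a NumPy sketch of the forward and backward pass for a network with:

- one sigmoid hidden layer,
- a softmax output,
- cross-entropy loss.

In Theano you would only write the forward part and let T.grad derive the rest; the manual gradients here just show what that computes. The layer sizes, initialization scale and names are assumptions for illustration, and the data is random. A finite-difference check on one weight confirms the gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward_backward(params, X, y):
    W1, b1, W2, b2 = params
    h = sigmoid(X.dot(W1) + b1)                   # hidden activations
    probs = softmax(h.dot(W2) + b2)               # class probabilities
    n = X.shape[0]
    loss = -np.mean(np.log(probs[np.arange(n), y]))
    # backward pass: the gradients T.grad would derive for you
    d_logits = probs.copy()
    d_logits[np.arange(n), y] -= 1
    d_logits /= n
    dW2 = h.T.dot(d_logits)
    db2 = d_logits.sum(axis=0)
    d_h = d_logits.dot(W2.T) * h * (1 - h)        # sigmoid'(z) = h * (1 - h)
    dW1 = X.T.dot(d_h)
    db1 = d_h.sum(axis=0)
    return loss, (dW1, db1, dW2, db2)

rng = np.random.RandomState(42)
n_in, n_hidden, n_out = 20, 16, 10                # toy sizes, not MNIST's 784 inputs
X_toy = rng.randn(32, n_in)
y_toy = rng.randint(n_out, size=32)
params = [rng.randn(n_in, n_hidden) * 0.1, np.zeros(n_hidden),
          rng.randn(n_hidden, n_out) * 0.1, np.zeros(n_out)]

loss, grads = forward_backward(params, X_toy, y_toy)

# finite-difference check on a single weight of W1
eps = 1e-6
plus = [p.copy() for p in params]
plus[0][3, 5] += eps
minus = [p.copy() for p in params]
minus[0][3, 5] -= eps
numeric = (forward_backward(plus, X_toy, y_toy)[0]
           - forward_backward(minus, X_toy, y_toy)[0]) / (2 * eps)
print("analytic %.8f, numeric %.8f" % (grads[0][3, 5], numeric))
```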
In [ ]: