In this seminar, we're going to play with TensorFlow and see how it helps us build deep learning models.
If you're running this notebook outside the course environment, you'll need to install TensorFlow:
pip install tensorflow
This should install CPU-only TensorFlow on Linux & Mac OS.
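If you want to double-check what got installed (this notebook assumes the TensorFlow 1.x API with sessions and placeholders), a quick sanity check could be:
import tensorflow as tf
print(tf.__version__)   # should print a 1.x version for the code below to run as-is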
In [1]:
import tensorflow as tf
gpu_options = tf.GPUOptions(allow_growth=True, per_process_gpu_memory_fraction=0.1)
s = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=gpu_options))
In [2]:
import numpy as np
def sum_squares(N):
    v = np.arange(N)
    return np.sum(v**2)
In [3]:
%%time
sum_squares(10**8)
Out[3]:
In [4]:
# I'm going to be your function parameter
N = tf.placeholder('int64', name="input_to_your_function")
# I am a recipe for computing the sum of squares of arange(N), given N
result = tf.reduce_sum((tf.range(N)**2))
In [5]:
%%time
#example of computing the same as sum_squares
print(result.eval({N:10**8}))
Inputs and transformations can be int32/64, floats, or booleans (uint8) of various sizes.
You can define new transformations as an arbitrary operation on placeholders and other transformations:
a+b, a/b, a**b, ... behave just like in numpy.
Still confused? We're going to fix that.
In [6]:
#Default placeholder that can be arbitrary float32 scalar, vector, matrix, etc.
arbitrary_input = tf.placeholder('float32')
#Input vector of arbitrary length
input_vector = tf.placeholder('float32',shape=(None,))
#Input vector that _must_ have 10 elements and integer type
fixed_vector = tf.placeholder('int32',shape=(10,))
#Matrix with arbitrary n_rows and exactly 15 columns (e.g. a minibatch of your data table)
input_matrix = tf.placeholder('float32',shape=(None,15))
#You can generally use None whenever you don't need a specific shape
input1 = tf.placeholder('float64',shape=(None,100,None))
input2 = tf.placeholder('int32',shape=(None,None,3,224,224))
In [7]:
#elementwise multiplication
double_the_vector = input_vector*2
#elementwise cosine
elementwise_cosine = tf.cos(input_vector)
#difference between squared vector and vector itself
vector_squares = input_vector**2 - input_vector
In [8]:
#Practice time: create two vectors of type float32
my_vector = tf.placeholder('float32', name="my_vector")
my_vector2 = tf.placeholder('float32', name="my_vector2")
In [9]:
#Write a transformation(recipe):
#(vec1)*(vec2) / (sin(vec1) +1)
my_transformation = my_vector * my_vector2 / (tf.math.sin(my_vector) + 1)
In [10]:
print(my_transformation)
#it's okay, it's a symbolic graph
In [11]:
# evaluate the transformation on example data (feed a value for every placeholder it depends on)
dummy = np.arange(5).astype('float32')
my_transformation.eval({my_vector:dummy,my_vector2:dummy[::-1]})
Out[11]:
It's often useful to visualize the computation graph when debugging or optimizing. Interactive visualization is where tensorflow really shines as compared to other frameworks.
There's a special instrument for that, called Tensorboard. You can launch it from console:
tensorboard --logdir=/tmp/tboard --port=7007
If you're pathologically afraid of consoles, try this:
os.system("tensorboard --logdir=/tmp/tboard --port=7007 &")
(but don't tell anyone we taught you that)
In [12]:
# launch tensorflow the ugly way, uncomment if you need that
import os
#!killall tensorboard
#os.system("tensorboard --logdir=/tmp/tboard --port=7007 &")
# show graph to tensorboard
writer = tf.summary.FileWriter("/tmp/tboard", graph=tf.get_default_graph())
writer.close()
One basic functionality of tensorboard is drawing graphs. Once you've run the cell above, go to localhost:7007
in your browser and switch to the Graphs tab in the top bar.
Here's what you should see:
Tensorboard also allows you to draw graphs (e.g. learning curves), record images & audio and play flash games. This is useful when monitoring learning progress and catching some training issues.
One researcher said:
If you spent last four hours of your worktime watching as your algorithm prints numbers and draws figures, you're probably doing deep learning wrong.
You can read more on tensorboard usage here
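For instance, to log a scalar (say, a training loss) as a learning curve, a minimal sketch could look like the following; the placeholder x_log, the tensor fake_loss and the tag 'train_loss' are made up here purely for illustration:
# a made-up scalar to log; in practice you'd log your real training loss
x_log = tf.placeholder('float32', name='x_for_logging')
fake_loss = tf.reduce_mean(x_log ** 2)
loss_summary = tf.summary.scalar('train_loss', fake_loss)

log_writer = tf.summary.FileWriter("/tmp/tboard")
for step in range(100):
    # pretend the loss shrinks over time
    summary_str = s.run(loss_summary, {x_log: np.random.randn(10) / (step + 1)})
    log_writer.add_summary(summary_str, global_step=step)
log_writer.close()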
In [13]:
# Quest #1 - implement a function that computes a mean squared error of two input vectors
# Your function has to take 2 vectors and return a single number
v1 = tf.placeholder('float32', name='v1')
v2 = tf.placeholder('float32', name='v2')
mse = tf.math.reduce_sum(tf.math.square(v1 - v2)) / tf.dtypes.cast(tf.size(v1), tf.float32)
compute_mse = lambda vector1, vector2: mse.eval({v1:vector1, v2:vector2})
In [14]:
# Tests
from sklearn.metrics import mean_squared_error
for n in [1,5,10,10**3]:
    elems = [np.arange(n),np.arange(n,0,-1), np.zeros(n),
             np.ones(n),np.random.random(n),np.random.randint(100,size=n)]
    for el in elems:
        for el_2 in elems:
            true_mse = np.array(mean_squared_error(el,el_2))
            my_mse = compute_mse(el,el_2)
            if not np.allclose(true_mse,my_mse):
                print('Wrong result:')
                print('mse(%s,%s)' % (el,el_2))
                print("should be: %f, but your function returned %f" % (true_mse,my_mse))
                raise ValueError("Something is wrong")
print("All tests passed")
The inputs and transformations have no value outside a function call. This isn't very convenient if you want your model to have parameters (e.g. network weights) that are always present, but can change their value over time.
TensorFlow solves this with tf.Variable objects: you can assign a variable a new value at any time, and unlike placeholders, you don't need to feed variables explicitly when s.run(...)-ing.
In [15]:
#creating shared variable
shared_vector_1 = tf.Variable(initial_value=np.ones(5))
In [16]:
#initialize variable(s) with initial values
s.run(tf.global_variables_initializer())
#evaluating shared variable (outside symbolic graph)
print("initial value", s.run(shared_vector_1))
# within the symbolic graph you use them just like any other input or transformation, no "get value" needed
In [17]:
#setting new value
s.run(shared_vector_1.assign(np.arange(5)))
#getting that new value
print("new value", s.run(shared_vector_1))
TensorFlow can get you the derivative of any graph as long as it knows how to differentiate elementary operations.
In [18]:
my_scalar = tf.placeholder('float32')
scalar_squared = my_scalar**2
#a derivative of scalar_squared by my_scalar
derivative = tf.gradients(scalar_squared, my_scalar)[0]
In [19]:
import matplotlib.pyplot as plt
%matplotlib inline
x = np.linspace(-3,3)
x_squared, x_squared_der = s.run([scalar_squared,derivative],
{my_scalar:x})
plt.plot(x, x_squared,label="x^2")
plt.plot(x, x_squared_der, label="derivative")
plt.legend();
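As a side note, tf.gradients can be chained to obtain higher-order derivatives; a tiny sketch (the name second_derivative is ours):
second_derivative = tf.gradients(derivative, my_scalar)[0]   # d2(x^2)/dx^2 == 2
print(s.run(second_derivative, {my_scalar: 3.0}))            # prints 2.0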
In [20]:
my_vector = tf.placeholder('float32',[None])
#Compute the gradient of the next weird function over my_scalar and my_vector
#warning! Trying to understand the meaning of that function may result in permanent brain damage
weird_psychotic_function = (
    tf.reduce_mean(
        (my_vector + my_scalar)**(1 + tf.nn.moments(my_vector, [0])[1])
        + 1. / tf.atan(my_scalar)
    ) / (my_scalar**2 + 1)
    + 0.01 * tf.sin(2 * my_scalar**1.5)
    * (tf.reduce_sum(my_vector) * my_scalar**2)
    * tf.exp((my_scalar - 4)**2) / (1 + tf.exp((my_scalar - 4)**2))
    * (1. - tf.exp(-(my_scalar - 4)**2) / (1 + tf.exp(-(my_scalar - 4)**2)))**2
)
der_by_scalar = tf.gradients(weird_psychotic_function, my_scalar)[0]
der_by_vector = tf.gradients(weird_psychotic_function, my_vector)[0]
In [21]:
#Plotting your derivative
scalar_space = np.linspace(1, 7, 100)
y = [s.run(weird_psychotic_function, {my_scalar:x, my_vector:[1, 2, 3]})
for x in scalar_space]
plt.plot(scalar_space, y, label='function')
y_der_by_scalar = [s.run(der_by_scalar, {my_scalar:x, my_vector:[1, 2, 3]})
for x in scalar_space]
plt.plot(scalar_space, y_der_by_scalar, label='derivative')
plt.grid()
plt.legend();
In [22]:
y_guess = tf.Variable(np.zeros(2,dtype='float32'))
y_true = tf.range(1,3,dtype='float32')
loss = tf.reduce_mean((y_guess - y_true + tf.random_normal([2]))**2)
optimizer = tf.train.MomentumOptimizer(0.01,0.9).minimize(loss,var_list=y_guess)
#same, but more detailed:
#updates = [[tf.gradients(loss,y_guess)[0], y_guess]]
#optimizer = tf.train.MomentumOptimizer(0.01,0.9).apply_gradients(updates)
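Spelled out, the "more detailed" variant from the comments above could look roughly like this (a sketch; opt, grads_and_vars and manual_optimizer are our own names):
opt = tf.train.MomentumOptimizer(0.01, 0.9)
grads_and_vars = opt.compute_gradients(loss, var_list=[y_guess])   # list of (gradient, variable) pairs
manual_optimizer = opt.apply_gradients(grads_and_vars)             # op that applies the momentum update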
In [23]:
from IPython.display import clear_output
s.run(tf.global_variables_initializer())
guesses = [s.run(y_guess)]
for _ in range(100):
    s.run(optimizer)
    guesses.append(s.run(y_guess))
    clear_output(True)
    plt.plot(*zip(*guesses),marker='.')
    plt.scatter(*s.run(y_true),c='red')
    plt.show()
Implement the regular logistic regression training algorithm.
Tips:
train_function(X, y) - returns the error and computes new values for the weights (through updates)
predict_function(X) - just computes probabilities ("y") given data
We shall train on a two-class MNIST dataset.
Note that the labels y are {0,1} and not {-1,1} as in some formulae.
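As a quick reminder (standard logistic regression, nothing specific to this notebook): the model predicts $p(y{=}1 \mid x) = \sigma(x \cdot w) = \frac{1}{1 + e^{-x \cdot w}}$ (a bias can be folded into $w$), and is trained by minimizing the mean log-loss $L = -\frac{1}{N}\sum_i \big[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \big]$.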
In [26]:
from sklearn.datasets import load_digits
mnist = load_digits(n_class=2)
X,y = mnist.data, mnist.target
print("y [shape - %s]:" % (str(y.shape)), y[:10])
print("X [shape - %s]:" % (str(X.shape)))
In [27]:
print('X:\n',X[:3,:10])
print('y:\n',y[:10])
plt.imshow(X[0].reshape([8,8]))
Out[27]:
In [29]:
# inputs and shareds
weights = tf.Variable(np.zeros(X.shape[1], dtype='float32'))
input_X = tf.placeholder(tf.float32, shape=(None, X.shape[1]))
input_y = tf.placeholder(tf.float32, shape=(None,))
In [ ]:
predicted_y = <predicted probabilities for input_X>
loss = <logistic loss (scalar, mean over sample)>
optimizer = <optimizer that minimizes loss>
In [ ]:
train_function = <compile function that takes X and y, returns log loss and updates weights>
predict_function = <compile function that takes X and computes probabilities of y>
In [ ]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
In [ ]:
from sklearn.metrics import roc_auc_score
for i in range(5):
    <run optimizer operation>
    loss_i = <compute loss at iteration i>
    print("loss at iter %i:%.4f" % (i, loss_i))
    print("train auc:",roc_auc_score(y_train, predict_function(X_train)))
    print("test auc:",roc_auc_score(y_test, predict_function(X_test)))
print ("resulting weights:")
plt.imshow(s.run(weights).reshape(8, -1))
plt.colorbar();
Your ultimate task for this week is to build your first neural network [almost] from scratch, in pure TensorFlow.
This time you will tackle the same digit recognition problem, but at a larger scale.
Note that you are not required to build 152-layer monsters here. A 2-layer (one hidden, one output) NN should already give you an edge over logistic regression.
[bonus score] If you've already beaten logistic regression with a two-layer net, but enthusiasm still ain't gone, you can try improving the test accuracy even further! The milestones would be 95%/97.5%/98.5% accuracy on the test set.
SPOILER! At the end of the notebook you will find a few tips and frequently made mistakes. If you feel you have enough might to shoot yourself in the foot without external assistance, we encourage you to do so, but if you encounter any unsurpassable issues, please do look there before mailing us.
In [ ]:
from mnist import load_dataset
#[down]loading the original MNIST dataset.
#Please note that you should only train your NN on _train sample,
# _val can be used to evaluate out-of-sample error, compare models or perform early-stopping
# _test should be hidden under a rock until final evaluation... But we both know it is near impossible to catch you evaluating on it.
X_train,y_train,X_val,y_val,X_test,y_test = load_dataset()
print (X_train.shape,y_train.shape)
In [ ]:
plt.imshow(X_train[0,0])
In [ ]:
<here you could just as well create computation graph>
In [ ]:
<this may or may not be a good place to define the loss and the optimizer>
In [ ]:
<this may be a perfect cell to write a training&evaluation loop in>
In [ ]:
<predict & evaluate on test here, right? No cheating pls.>
Recommended pipeline
Add a hidden layer. Now your logistic regression uses hidden neurons instead of inputs.
Now's the time to try improving the network. Consider layers (their number and size), nonlinearities, optimization methods, initialization - whatever you want, but please avoid convolutions for now.
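To make the first step concrete, here is a rough sketch of a 2-layer network in raw TF1. All names (img, labels, W1, b1, W2, b2, hidden, logits, nn_loss, nn_step) are our own, the layout assumes inputs flattened to 28*28 and 10 output classes, and this is not the reference solution:
img = tf.placeholder('float32', shape=(None, 28 * 28))
labels = tf.placeholder('int64', shape=(None,))

W1 = tf.Variable(np.random.randn(28 * 28, 100).astype('float32') * 0.01)
b1 = tf.Variable(np.zeros(100, dtype='float32'))
W2 = tf.Variable(np.random.randn(100, 10).astype('float32') * 0.01)
b2 = tf.Variable(np.zeros(10, dtype='float32'))

hidden = tf.nn.relu(tf.matmul(img, W1) + b1)   # the new hidden layer
logits = tf.matmul(hidden, W2) + b2
nn_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
nn_step = tf.train.AdamOptimizer(1e-3).minimize(nn_loss)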