Introduction to Neural Networks with Low Level TensorFlow 2


In [0]:
!pip install -q tf-nightly-gpu-2.0-preview

In [2]:
import tensorflow as tf
print(tf.__version__)


2.0.0-dev20190513

In [3]:
# a small sanity check, does tf seem to work ok?
hello = tf.constant('Hello TF!')
print("This works: {}".format(hello))


This works: b'Hello TF!'

In [4]:
# this should return True even on Colab
tf.test.is_gpu_available()


Out[4]:
True

In [5]:
tf.test.is_built_with_cuda()


Out[5]:
True

In [6]:
!nvidia-smi


Mon May 13 14:00:55 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   66C    P0    31W /  70W |    129MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

In [7]:
tf.executing_eagerly()


Out[7]:
True

Transforming an input to a known output


In [0]:
input = [[-1], [0], [1], [2], [3], [4]]
output = [[2], [1], [0], [-1], [-2], [-3]]

In [9]:
import matplotlib.pyplot as plt

plt.xlabel('input')
plt.ylabel('output')

plt.plot(input, output, 'ro')


Out[9]:
[<matplotlib.lines.Line2D at 0x7fac90185a20>]

The relation between input and output is linear


In [10]:
plt.plot(input, output)
plt.plot(input, output, 'ro')


Out[10]:
[<matplotlib.lines.Line2D at 0x7fac9d77f320>]

Defining the model to train

An untrained single unit (neuron) also outputs a line for the same input, just a different one

The Artificial Neuron: Foundation of Deep Neural Networks (simplified, more later)

  • a neuron takes a number of numerical inputs
  • multiplies each with a weight, sums up all weighted inputs and
  • adds a bias (a constant) to that sum
  • from this it creates a single numerical output
  • for one input (one dimension) this describes a line (see the one-dimensional sketch below)
  • for more dimensions this describes a hyperplane that can serve as a decision boundary
  • this is typically expressed as a matrix multiplication plus an addition

This can be expressed using a matrix multiplication


In [11]:
w = tf.constant([[1.5], [-2], [1]], dtype='float32')
x = tf.constant([[10, 6, 8]], dtype='float32')
b = tf.constant([6], dtype='float32')

y = tf.matmul(x, w) + b
print(y)


tf.Tensor([[17.]], shape=(1, 1), dtype=float32)
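
For a single input and a single unit, the same computation collapses to the line equation y = wx + b. A minimal sketch (not a cell from the original notebook), using slope -1 and intercept 1, the line our data below follows:

In [ ]:
# one input, one unit: the neuron is just a line y = w*x + b
w = tf.constant([[-1.0]])                # slope
b = tf.constant([1.0])                   # y-intercept
x_demo = tf.constant([[0.0], [1.0], [2.0]])
print(tf.matmul(x_demo, w) + b)          # [[1.], [0.], [-1.]] -- points on y = -x + 1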

Defining a layer with a configurable number of neurons and inputs


In [0]:
from tensorflow.keras.layers import Layer

class LinearLayer(Layer):
  """y = w.x + b"""

  def __init__(self, units=1, input_dim=1):
      super(LinearLayer, self).__init__()
      w_init = tf.random_normal_initializer(stddev=2)
      self.w = tf.Variable(
          initial_value = w_init(shape=(input_dim, units), dtype='float32'),
          trainable=True)
      b_init = tf.zeros_initializer()
      self.b = tf.Variable(
          initial_value = b_init(shape=(units,), dtype='float32'),
          trainable=True)

  def call(self, inputs):
      return tf.matmul(inputs, self.w) + self.b
    
linear_layer = LinearLayer()

Output of a single untrained neuron


In [13]:
x = tf.constant(input, dtype=tf.float32)
y_true = tf.constant(output, dtype=tf.float32)
y_true


Out[13]:
<tf.Tensor: id=31, shape=(6, 1), dtype=float32, numpy=
array([[ 2.],
       [ 1.],
       [ 0.],
       [-1.],
       [-2.],
       [-3.]], dtype=float32)>

In [14]:
y_pred = linear_layer(x)
y_pred


Out[14]:
<tf.Tensor: id=37, shape=(6, 1), dtype=float32, numpy=
array([[ 2.3813493],
       [ 0.       ],
       [-2.3813493],
       [-4.7626987],
       [-7.1440477],
       [-9.525397 ]], dtype=float32)>

In [15]:
plt.plot(x, y_pred)
plt.plot(input, output, 'ro')


Out[15]:
[<matplotlib.lines.Line2D at 0x7fac9d778518>]

Loss - Mean Squared Error

A loss function is the prerequisite for training: we need an objective to optimize. We calculate the difference between the output we get and the output we would like to get.

Mean Squared Error

$MSE = {\frac {1}{n}}\sum _{i=1}^{n}(Y_{i}-{\hat {Y_{i}}})^{2}$

https://en.wikipedia.org/wiki/Mean_squared_error
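
The same formula written out by hand should agree with tf.losses.mean_squared_error; a minimal sketch (hypothetical helper, not a cell from the original notebook):

In [ ]:
# hand-rolled MSE as a sanity check against tf.losses.mean_squared_error
def mse(y_true, y_pred):
  return tf.reduce_mean(tf.square(y_true - y_pred))

print(mse(tf.constant([1., 2., 3.]), tf.constant([1., 2., 4.])))  # 1/3, same as the built-in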


In [0]:
loss_fn = tf.losses.mean_squared_error
# loss_fn = tf.losses.mean_absolute_error

In [17]:
loss = loss_fn(y_true=tf.squeeze(y_true), y_pred=tf.squeeze(y_pred))
print(loss)


tf.Tensor(15.002698, shape=(), dtype=float32)

In [18]:
tf.keras.losses.mean_squared_error == tf.losses.mean_squared_error


Out[18]:
True

Minimize Loss by changing parameters of neuron

Move in parameter space in the direction of descent

https://twitter.com/colindcarroll/status/1090266016259534848

Job of the optimizer

For this we need partial derivatives

TensorFlow offers automatic differentiation: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/GradientTape

  • the tape records operations for automatic differentiation
  • either by being told explicitly what to record (watch) or
  • by declaring a variable to be trainable (which we did in the layer above)

In [19]:
# a simple example

# f(x) = x^2
# f'(x) = 2x
# x = 4
# f(4) = 16
# f'(4) = 8 (that's what we expect)
def tape_sample():
  x = tf.constant(4.0)
  # open a GradientTape
  with tf.GradientTape() as tape:
    tape.watch(x)
    y = x * x
  dy_dx = tape.gradient(y, x)
  print(dy_dx)
  
# just a function in order not to interfere with x in the global scope
tape_sample()


tf.Tensor(8.0, shape=(), dtype=float32)
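
Trainable variables are watched automatically, so the explicit watch call is not needed for them; a minimal sketch (not a cell from the original notebook) computing the same gradient via a tf.Variable:

In [ ]:
# same derivative, but tracked automatically via a trainable variable
def tape_variable_sample():
  x = tf.Variable(4.0)        # trainable by default, no tape.watch needed
  with tf.GradientTape() as tape:
    y = x * x
  print(tape.gradient(y, x))  # tf.Tensor(8.0, shape=(), dtype=float32)

tape_variable_sample()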

Training


In [20]:
linear_layer = LinearLayer()
linear_layer.w, linear_layer.b


Out[20]:
(<tf.Variable 'Variable:0' shape=(1, 1) dtype=float32, numpy=array([[-3.5483046]], dtype=float32)>,
 <tf.Variable 'Variable:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>)

In [21]:
linear_layer.trainable_weights


Out[21]:
[<tf.Variable 'Variable:0' shape=(1, 1) dtype=float32, numpy=array([[-3.5483046]], dtype=float32)>,
 <tf.Variable 'Variable:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>]

In [0]:
EPOCHS = 200
learning_rate = 1e-2

losses = []
weights = []
biases = []
weights_gradient = []
biases_gradient = []

for step in range(EPOCHS):
  with tf.GradientTape() as tape:

    # forward pass
    y_pred = linear_layer(x)

    # loss value for this batch
    loss = loss_fn(y_true=tf.squeeze(y_true), y_pred=tf.squeeze(y_pred))
    
  # just for logging
  losses.append(loss.numpy())
  weights.append(linear_layer.w.numpy()[0][0])
  biases.append(linear_layer.b.numpy()[0])

  # get gradients of weights wrt the loss
  gradients = tape.gradient(loss, linear_layer.trainable_weights)
  weights_gradient.append(gradients[0].numpy()[0][0])
  biases_gradient.append(gradients[1].numpy()[0])
  
  # backward pass, changing trainable weights
  linear_layer.w.assign_sub(learning_rate * gradients[0])
  linear_layer.b.assign_sub(learning_rate * gradients[1])

In [23]:
print(loss)


tf.Tensor(0.00023834866, shape=(), dtype=float32)

In [24]:
plt.xlabel('epochs')
plt.ylabel('loss')

# plt.yscale('log')

plt.plot(losses)


Out[24]:
[<matplotlib.lines.Line2D at 0x7fac6ea024e0>]

In [25]:
plt.figure(figsize=(20, 10))

plt.plot(weights)
plt.plot(biases)
plt.plot(weights_gradient)
plt.plot(biases_gradient)

plt.legend(['slope', 'offset', 'gradient slope', 'gradient offset'])


Out[25]:
<matplotlib.legend.Legend at 0x7fac6e96c940>

Line drawn by neuron after training


In [26]:
y_pred = linear_layer(x)
y_pred


Out[26]:
<tf.Tensor: id=12139, shape=(6, 1), dtype=float32, numpy=
array([[ 1.9732319 ],
       [ 0.97976017],
       [-0.01371157],
       [-1.0071833 ],
       [-2.0006552 ],
       [-2.9941268 ]], dtype=float32)>

In [27]:
plt.plot(x, y_pred)
plt.plot(input, output, 'ro')


Out[27]:
[<matplotlib.lines.Line2D at 0x7fac6ea02828>]

In [28]:
# single neuron and single input: one weight and one bias
# slope m ~ -1
# y-axis offset y0 ~ 1
# https://en.wikipedia.org/wiki/Linear_equation#Slope%E2%80%93intercept_form

linear_layer.trainable_weights


Out[28]:
[<tf.Variable 'Variable:0' shape=(1, 1) dtype=float32, numpy=array([[-0.99347174]], dtype=float32)>,
 <tf.Variable 'Variable:0' shape=(1,) dtype=float32, numpy=array([0.97976017], dtype=float32)>]

Prebuilt optimizers do this job (but a bit more efficiently and with more sophistication)
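
Plain SGD applies exactly the update we coded by hand above, for each trainable parameter $w$ with learning rate $\eta$:

$w \leftarrow w - \eta \frac{\partial L}{\partial w}$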


In [0]:
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2)

In [0]:
EPOCHS = 500

losses = []

linear_layer = LinearLayer()

for step in range(EPOCHS):
  with tf.GradientTape() as tape:

    # Forward pass.
    y_pred = linear_layer(x)

    # Loss value for this batch.
    loss = loss_fn(y_true=tf.squeeze(y_true), y_pred=tf.squeeze(y_pred))
    
  losses.append(loss)
     
  # Get gradients of weights wrt the loss.
  gradients = tape.gradient(loss, linear_layer.trainable_weights)
  
  # Update the weights of our linear layer.
  optimizer.apply_gradients(zip(gradients, linear_layer.trainable_weights))

In [31]:
# plt.yscale('log')
plt.ylabel("loss")
plt.xlabel("epochs")

plt.plot(losses)


Out[31]:
[<matplotlib.lines.Line2D at 0x7fac6e8802e8>]

In [32]:
y_pred = linear_layer(x)
plt.plot(x, y_pred)
plt.plot(input, output, 'ro')
linear_layer.trainable_weights


Out[32]:
[<tf.Variable 'Variable:0' shape=(1, 1) dtype=float32, numpy=array([[-0.9979986]], dtype=float32)>,
 <tf.Variable 'Variable:0' shape=(1,) dtype=float32, numpy=array([0.99379504], dtype=float32)>]

More data points, more noise


In [33]:
import numpy as np

a = -1
b = 1
n = 50

x = tf.constant(np.random.uniform(0, 1, n), dtype='float32')
y = tf.constant(a*x+b + 0.1 * np.random.normal(0, 1, n), dtype='float32')

plt.scatter(x, y)


Out[33]:
<matplotlib.collections.PathCollection at 0x7fac6e7b99b0>

In [0]:
x = tf.reshape(x, (n, 1))
y_true = tf.reshape(y, (n, 1))

In [35]:
linear_layer = LinearLayer()

a = linear_layer.w.numpy()[0][0]
b = linear_layer.b.numpy()[0]

def plot_line(a, b, x, y_true):
  fig, ax = plt.subplots()
  y_pred = a * x + b
  
  line = ax.plot(x, y_pred)
  ax.plot(x, y_true, 'ro')
  return fig, line

plot_line(a, b, x, y_true)


Out[35]:
(<Figure size 432x288 with 1 Axes>,
 [<matplotlib.lines.Line2D at 0x7fac6e798cc0>])

In [0]:
# the problem is a little bit harder, train for a little longer
EPOCHS = 2000

losses = []
lines = []

linear_layer = LinearLayer()

for step in range(EPOCHS):
  # Open a GradientTape.
  with tf.GradientTape() as tape:

    # Forward pass.
    y_pred = linear_layer(x)

    # Loss value for this batch.
    loss = loss_fn(y_true=tf.squeeze(y_true), y_pred=tf.squeeze(y_pred))
    
  losses.append(loss)
  
  a = linear_layer.w.numpy()[0][0]
  b = linear_layer.b.numpy()[0]
  lines.append((a, b))
     
  # Get gradients of weights wrt the loss.
  gradients = tape.gradient(loss, linear_layer.trainable_weights)
  
  # Update the weights of our linear layer.
  optimizer.apply_gradients(zip(gradients, linear_layer.trainable_weights))

In [37]:
print(loss)


tf.Tensor(0.012210889, shape=(), dtype=float32)

In [38]:
# plt.yscale('log')
plt.ylabel("loss")
plt.xlabel("epochs")

plt.plot(losses)


Out[38]:
[<matplotlib.lines.Line2D at 0x7fac6e7cd048>]

Lines the model draws over time

Initial Step


In [39]:
a, b = lines[0]

plot_line(a, b, x, y_true)


Out[39]:
(<Figure size 432x288 with 1 Axes>,
 [<matplotlib.lines.Line2D at 0x7fac6e61d9e8>])

After 500 Steps


In [40]:
a, b = lines[500]

plot_line(a, b, x, y_true)


Out[40]:
(<Figure size 432x288 with 1 Axes>,
 [<matplotlib.lines.Line2D at 0x7fac6e56fd68>])

Final Step


In [41]:
a, b = lines[1999]

plot_line(a, b, x, y_true)


Out[41]:
(<Figure size 432x288 with 1 Axes>,
 [<matplotlib.lines.Line2D at 0x7fac6e547630>])

Understanding the effect of activation functions

Typically, the output of a neuron is transformed using an activation function, which compresses the output to a value between 0 and 1 (sigmoid) or between -1 and 1 (tanh), or sets all negative values to zero (relu).
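
A quick numeric check of these ranges (a sketch, not a cell from the original notebook):

In [ ]:
# the three activations applied to the same raw outputs
v = tf.constant([-2.0, 0.0, 2.0])
print(tf.sigmoid(v).numpy())   # ~[0.12 0.5  0.88] -> squashed into (0, 1)
print(tf.tanh(v).numpy())      # ~[-0.96 0.  0.96] -> squashed into (-1, 1)
print(tf.nn.relu(v).numpy())   # [0. 0. 2.]        -> negatives set to zero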

Typical Activation Functions


In [42]:
import numpy as np

x = tf.reshape(tf.constant(np.arange(-1, 4, 0.1), dtype='float32'), (50, 1))
y_pred = linear_layer(x)

plt.figure(figsize=(20, 10))

plt.plot(x, y_pred)

y_pred_relu = tf.nn.relu(y_pred)
plt.plot(x, y_pred_relu)

y_pred_sigmoid = tf.nn.sigmoid(y_pred)
plt.plot(x, y_pred_sigmoid)

y_pred_tanh = tf.nn.tanh(y_pred)
plt.plot(x, y_pred_tanh)

plt.plot(input, output, 'ro')

plt.legend(['no activation', 'relu', 'sigmoid', 'tanh'])


Out[42]:
<matplotlib.legend.Legend at 0x7fac6e4a9eb8>

Logistic Regression

So far we have been inferring one continuous value from another; now we want to classify. Imagine we have a line that separates two categories in two dimensions.


In [43]:
from matplotlib.colors import ListedColormap

a = -1
b = 1
n = 100

# all points
X = np.random.uniform(0, 1, (n, 2))

# our line
line_x = np.random.uniform(0, 1, n)
line_y = a*line_x+b
plt.plot(line_x, line_y, 'r')

# below and above line
y = X[:, 1] > a*X[:, 0]+b
y = y.astype(int)

plt.xlabel("x1")
plt.ylabel("x2")

plt.scatter(X[:,0], X[:,1], c=y, cmap=ListedColormap(['#AA6666', '#6666AA']), marker='o', edgecolors='k')
y


Out[43]:
array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1,
       1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0,
       1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0,
       1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0,
       1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1])

We compress the output to between 0 and 1 using sigmoid, to match y

  • everything below 0.5 counts as 0, everything above as 1 (see the sketch below)
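
A quick illustration of that thresholding (a sketch, not a cell from the original notebook):

In [ ]:
# sigmoid squashes raw outputs into (0, 1); thresholding at 0.5 yields class labels
logits = tf.constant([-2.0, -0.1, 0.1, 3.0])
probs = tf.sigmoid(logits)                     # ~[0.12 0.48 0.52 0.95]
print(tf.cast(probs > 0.5, tf.int32).numpy())  # [0 0 1 1]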

In [0]:
class SigmoidLayer(LinearLayer):
  """y = sigmoid(w.x + b)"""

  def __init__(self, **kwargs):
      super(SigmoidLayer, self).__init__(**kwargs)

  def call(self, inputs):
      return tf.sigmoid(super().call(inputs))

We have 2d input now


In [45]:
x = tf.constant(X, dtype='float32')
y_true = tf.constant(y, dtype='float32')
x.shape


Out[45]:
TensorShape([100, 2])

In [0]:
model = SigmoidLayer(input_dim=2)

Reconsidering the loss function

cross entropy is an alternative to squared error

  • cross entropy can be used as an error measure when a network's outputs can be thought of as representing independent hypotheses
  • activations can be understood as representing the probability that each hypothesis might be true
  • the loss indicates the distance between what the network believes this distribution should be, and what the teacher says it should be

http://www.cse.unsw.edu.au/~billw/cs9444/crossentropy.html
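
For intuition, binary cross entropy can also be written out by hand; a minimal sketch (hypothetical helper, not from the original notebook) that matches tf.losses.binary_crossentropy up to numerical clipping:

In [ ]:
# binary cross entropy written out by hand (sketch only)
def bce(y_true, y_pred, eps=1e-7):
  p = tf.clip_by_value(y_pred, eps, 1 - eps)  # avoid log(0)
  return -tf.reduce_mean(y_true * tf.math.log(p) + (1 - y_true) * tf.math.log(1 - p))

print(bce(tf.constant([1., 0.]), tf.constant([0.9, 0.1])))  # small loss: predictions match labels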


In [0]:
loss_fn = tf.losses.binary_crossentropy

In [0]:
# a standard optimizer with adaptive per-parameter learning rates
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)

In [0]:
# https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/metrics/Accuracy
m = tf.keras.metrics.Accuracy()
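
Accuracy is a stateful metric: update_state accumulates matches across calls and result reads out the running ratio; a quick sketch (not a cell from the original notebook). Note that the training loop below never calls reset_states, so the accuracy it logs is cumulative over all epochs, which is why it ends just below 1.0 even though the final predictions are perfect:

In [ ]:
# Accuracy is stateful: update_state accumulates, result reads out, reset_states clears
m_demo = tf.keras.metrics.Accuracy()
m_demo.update_state([1, 0, 1, 1], [1, 0, 0, 1])
print(m_demo.result().numpy())  # 0.75 (3 of 4 correct)
m_demo.reset_states()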

In [0]:
EPOCHS = 1000

losses = []
accuracies = []

for step in range(EPOCHS):
  # Open a GradientTape.
  with tf.GradientTape() as tape:

    # Forward pass.
    y_pred = model(x)

    # Loss value for this batch.
    loss = loss_fn(y_true=tf.squeeze(y_true), y_pred=tf.squeeze(y_pred))

  y_pred_binary = (tf.squeeze(y_pred) > 0.5).numpy().astype(float)
  m.update_state(tf.squeeze(y_true), y_pred_binary)
  accuracy = m.result().numpy()

  losses.append(loss)
  accuracies.append(accuracy)
     
  # Get gradients of weights wrt the loss.
  gradients = tape.gradient(loss, model.trainable_weights)
  
  # Update the weights of our linear layer.
  optimizer.apply_gradients(zip(gradients, model.trainable_weights))

In [51]:
print(loss)


tf.Tensor(0.049787622, shape=(), dtype=float32)

In [52]:
print(accuracy)


0.98066

In [53]:
plt.yscale('log')
plt.ylabel("loss")
plt.xlabel("epochs")

plt.plot(losses)


Out[53]:
[<matplotlib.lines.Line2D at 0x7fac6e6e6d68>]

In [54]:
plt.ylabel("accuracy")
plt.xlabel("epochs")

plt.plot(accuracies)


Out[54]:
[<matplotlib.lines.Line2D at 0x7fac6e353358>]

In [55]:
y_pred = model(x)
y_pred_binary = (tf.squeeze(y_pred) > 0.5).numpy().astype(float)
y_pred_binary


Out[55]:
array([1., 1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0.,
       0., 1., 1., 1., 1., 1., 0., 0., 1., 0., 1., 1., 1., 0., 1., 1., 0.,
       0., 1., 1., 0., 1., 0., 1., 0., 1., 0., 1., 1., 0., 1., 1., 1., 0.,
       1., 1., 1., 1., 0., 1., 0., 1., 0., 1., 0., 0., 1., 1., 0., 1., 1.,
       0., 0., 0., 1., 0., 1., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1.,
       1., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 1.])

In [56]:
y_true - y_pred_binary


Out[56]:
<tf.Tensor: id=281144, shape=(100,), dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
      dtype=float32)>

In [57]:
# below and above line

plt.xlabel("x1")
plt.ylabel("x2")

plt.scatter(X[:,0], X[:,1], c=y_pred_binary, cmap=ListedColormap(['#AA6666', '#6666AA']), marker='o', edgecolors='k')


Out[57]:
<matplotlib.collections.PathCollection at 0x7fac6e2b0da0>

The same solution using the high-level Keras API


In [58]:
from tensorflow.keras.layers import Dense
 
model = tf.keras.Sequential()

model.add(Dense(units=1, activation='sigmoid', input_dim=2))

model.summary()


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 1)                 3         
=================================================================
Total params: 3
Trainable params: 3
Non-trainable params: 0
_________________________________________________________________

In [59]:
%%time 

model.compile(loss=loss_fn,           # binary cross entropy, unchanged from low level example
              optimizer=optimizer,    # adam, unchanged from low level example
              metrics=['accuracy'])

# internally does something similar to our loop above
history = model.fit(x, y_true, epochs=EPOCHS, verbose=0)


CPU times: user 10.1 s, sys: 795 ms, total: 10.9 s
Wall time: 8.69 s

In [60]:
loss, accuracy = model.evaluate(x, y_true)
loss, accuracy


100/100 [==============================] - 0s 422us/sample - loss: 0.0379 - accuracy: 1.0000
Out[60]:
(0.03793137133121491, 1.0)

In [61]:
plt.yscale('log')
plt.ylabel("accuracy")
plt.xlabel("epochs")

plt.plot(history.history['accuracy'])


Out[61]:
[<matplotlib.lines.Line2D at 0x7fac63e57780>]

In [62]:
plt.yscale('log')
plt.ylabel("loss")
plt.xlabel("epochs")

plt.plot(history.history['loss'])


Out[62]:
[<matplotlib.lines.Line2D at 0x7fac63dbbfd0>]

In [0]:
y_pred = model.predict(x)
y_pred_binary = (tf.squeeze(y_pred) > 0.5).numpy().astype(float)

In [64]:
# below and above line

plt.xlabel("x1")
plt.ylabel("x2")

plt.scatter(X[:,0], X[:,1], c=y_pred_binary, cmap=ListedColormap(['#AA6666', '#6666AA']), marker='o', edgecolors='k')


Out[64]:
<matplotlib.collections.PathCollection at 0x7fac63c83400>