Keras introduction

Keras is a higher-level neural network library than TensorFlow; it can use TensorFlow as a backend and is supported by the TensorFlow developers. Keras provides implementations of neural network layers (including dropout, batch normalization and pooling), objectives, activation functions and optimizers, and supports convolutional and recurrent neural networks. Keras is perhaps one of the better options for rapid prototyping of deep learning algorithms.

Models in Keras come in two forms: the Sequential model and the Functional API. The Sequential approach stacks layers in order from input to output, while the Functional API enables more complicated architectures, such as models with multiple inputs, multiple outputs or shared layers.
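As a minimal sketch (assuming the standalone keras package or tf.keras; the layer sizes here are arbitrary), the same small network can be built in either style:

import keras

# Sequential: layers are stacked in a single chain from input to output.
sequential_model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Functional API: tensors are passed between layers explicitly, which
# allows branches, multiple inputs/outputs and other complex topologies.
inputs = keras.Input(shape=(10,))
hidden = keras.layers.Dense(32, activation="relu")(inputs)
outputs = keras.layers.Dense(1, activation="sigmoid")(hidden)
functional_model = keras.Model(inputs=inputs, outputs=outputs)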

Keras features callback utilities which can be used to track variables during training. These can be used to create checkpoints at which models are saved during training, in case of crashes or interruptions. A callback class, which is a class that inherits from keras.callbacks.Callback, is passed to the model fitting function and can be used to log quantities such as the accuracy as training progresses. More specifically, keras.callbacks.Callback has methods that can be overridden in a callback class definition, such as the following:

  • on_train_begin
  • on_epoch_end
  • on_batch_begin
  • on_batch_end

These correspond to moments in training at which custom actions can be performed. Useful when overriding these methods is the logs dictionary, which by default holds the loss and accuracy during training.
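As a sketch (assuming a compiled model with an accuracy metric; model, x_train and y_train are placeholders), a callback that records the loss and accuracy at the end of each epoch could look like this:

import keras

class LossHistory(keras.callbacks.Callback):
    # Record loss and accuracy after every epoch using the logs dictionary.
    def on_train_begin(self, logs=None):
        self.losses = []
        self.accuracies = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get("loss"))
        self.accuracies.append(logs.get("accuracy"))

history = LossHistory()
model.fit(x_train, y_train, epochs=10, callbacks=[history])

Depending on the Keras version and compile settings, the accuracy key may be "acc" rather than "accuracy".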

units

Units refers to the number of vertices, nodes or neurons. It is a property of a layer and is related to the output shape.

shapes

Shapes are tuples that represent the number of elements an array or tensor has in each dimension. So, a tensor with 3 dimensions, containing 30 elements in the first dimension, 4 in the second, and 10 in the third, totaling 1200 elements, has shape (30, 4, 10).
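For instance, with NumPy:

import numpy as np

tensor = np.zeros((30, 4, 10))   # 3 dimensions
print(tensor.shape)              # (30, 4, 10)
print(tensor.size)               # 1200 = 30 * 4 * 10 elements in total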

input shape

The data that flow between layers are tensors, which can be thought of as multidimensional arrays. In Keras, the input layer is not really a layer but a tensor: the input tensor that is sent to the first hidden layer. The shape specified for this tensor must correspond to the shape of the training data.

For example, a dataset of 30 images, each 50 by 50 pixels in RGB color (3 channels), has the shape (30, 50, 50, 3). So, the input layer tensor must have this shape. The input shape is the only shape that must be specified explicitly; Keras calculates the other shapes from the layers automatically.

Generally, Keras ignores the first dimension, which is the batch size. The model should be able to deal with any batch size, so only the other dimensions need be defined:

input_shape = (50, 50, 3)

For some types of model, the shape including the batch size can be specified via batch_input_shape = (30, 50, 50, 3) or batch_shape = (30, 50, 50, 3). This fixes training to a single batch size, so it should be used only when required. Either way, Keras keeps track of the batch dimension, and reports shapes, for example in the model summary, like the following:

(None, 50, 50, 3)

The first dimension here is the batch size and it is expressed as None because it can vary depending on how many examples are given to the model for training. If the batch size is specified explicitly to Keras, it will be reported in place of None.
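This can be seen directly on an input tensor (a sketch; the exact printed form can vary by Keras version):

import keras

inputs = keras.Input(shape=(50, 50, 3))  # batch dimension is omitted here
print(inputs.shape)                      # (None, 50, 50, 3)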

If the input shape has only one dimension, it does not need to be specified as a tuple; it can instead be specified as a scalar number. So, for an input tensor with 3 elements, the following specifications are equivalent:

  • input_shape = (3,) (where the comma is necessary for only one dimension);
  • input_dim = 3.
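For example, the following two single-layer models are equivalent (a sketch; input_dim is the older-style argument and may emit a deprecation warning in recent Keras versions):

from keras.models import Sequential
from keras.layers import Dense

model_a = Sequential([Dense(5, input_shape=(3,))])
model_b = Sequential([Dense(5, input_dim=3)])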

output shape (relation between shapes and units)

Given an input shape, all other shapes are the results of layer calculations. Each type of layer works in a particular way: dense layers have an output shape based on the number of units they have, convolutional layers have an output shape based on their filters, but the output shape is always determined by some layer property. The output of a dense layer has the shape (batch_size, units).
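As a sketch (arbitrary sizes), a dense layer with 8 units fed an input of 4 features reports an output shape of (None, 8), i.e. (batch_size, units):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(8, input_shape=(4,))])
print(model.output_shape)   # (None, 8), i.e. (batch_size, units)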

weights

Weight shapes are determined automatically from the input and output shapes. Each type of layer works in a particular way, but the weights are ultimately some matrix capable of transforming the input shape into the output shape by some mathematical operation. In a dense layer, the weights multiply all the inputs: a weight matrix with one column per input and one row per unit.
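As a sketch (arbitrary sizes; note that Keras stores the kernel with shape (inputs, units), the transpose of the convention described above):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(5, input_shape=(3,))])
kernel, bias = model.layers[0].get_weights()
print(kernel.shape)   # (3, 5): 3 inputs by 5 units
print(bias.shape)     # (5,): one bias per unit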