theanets is a deep learning and neural network toolkit. It is written in Python to interoperate with numpy and scikit-learn; theano is used to accelerate computations when possible using a GPU. Installing the package with pip will also install all the dependencies of theanets, including numpy and theano.
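A minimal install cell (the leading ! for running a shell command from a notebook is an assumption about how you run it):
In [ ]:
!pip install theanets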
You can also download the package from https://github.com/lmjohns3/theanets and run the code from your local copy.
Three basic steps in theanets:
1. Create a model: a classifier (theanets.Classifier), a regressor (theanets.Regressor), an autoencoder (theanets.Autoencoder), or a recurrent model (theanets.recurrent module).
2. Train the model.
3. Use the trained model.
In [ ]:
import theanets
# 1. create a model -- here, a regression model.
net = theanets.Regressor([10, 100, 2])
# optional: set up additional losses.
net.add_loss('mae', weight=0.1)
# 2. train the model.
net.train(
    training_data,
    validation_data,
    algo='rmsprop',
    hidden_l1=0.01,  # apply a regularizer.
)
# 3. use the trained model.
net.predict(test_data)
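The quickstart above uses training_data, validation_data, and test_data without defining them. A minimal sketch of what they could look like for the [10, 100, 2] regressor; the array shapes and the 80/20 split here are assumptions for illustration:
In [ ]:
import numpy as np

# hypothetical data matching the 10-input, 2-output regressor above
X = np.random.randn(1000, 10).astype('f')
Y = np.random.randn(1000, 2).astype('f')

training_data = [X[:800], Y[:800]]    # a regressor expects [inputs, targets]
validation_data = [X[800:], Y[800:]]
test_data = X[800:]                   # predict() only needs inputs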
In [ ]:
net = theanets.Regressor(layers=[10, 20, 3])
In general, the layers argument must be a sequence of values, each of which specifies the configuration of a single layer in the model:
In [ ]:
net = theanets.Regressor([A, B, ..., Z])
- size: The number of "neurons" in the layer.
- form: A string specifying the type of layer to use. This defaults to 'feedforward'.
- name: A string name for the layer. The default names for the first and last layers are 'in' and 'out'; the layers in between are assigned the name 'hidN', where N is the number of existing layers.
- activation: A string describing the activation function to use for the layer. This defaults to 'relu', i.e. $\max(0,z)$.

Some possible values:
- relu: $g(z) = \max(0,z)$
- linear: $g(z) = z$
- logistic, sigmoid: $g(z) = (1 + \exp(-z))^{-1}$
- tanh: $g(z) = \tanh(z)$
- softmax: $g(z) = \exp(z)/\sum_v \exp(v)$
- norm:z: $g(z) = (z - \bar{z})/\mathbb{E}[(z - \bar{z})^2]$

Activation functions can also be composed by concatenating multiple function names together using a +.
In [ ]:
net = theanets.Regressor([4, 5, 6, 2])
A layer can also be configured with a tuple containing an integer size plus one or more strings. If a string in the tuple names a registered layer type (e.g., 'tied', 'rnn', etc.), then this type of layer will be created. If a string in the tuple does not name a registered layer type, the string is assumed to name an activation function, for example 'logistic', 'relu+norm:z', and so on.
In [ ]:
net = theanets.Regressor([4, (5, 'sigmoid'), (6, 'softmax')])
In [ ]:
net = theanets.Regressor([10, (10, 'tanh+norm:z'), 10])
If a layer configuration value is a dictionary, its keyword arguments are passed directly to theanets.Layer.build() to construct a new layer instance. The dictionary must contain a size key. It can additionally contain any other keyword arguments that you wish to use when constructing the layer.
In [ ]:
net = theanets.Regressor([4, dict(size=5, activation='tanh'), 2])
In [ ]:
net = theanets.Regressor([4, dict(size=5, sparsity=0.9), 2])
sparsity: A float giving the proportion of parameter values in the layer that should be initialized to zero. Nonzero values in the parameters will be drawn from a Gaussian distribution, and then an appropriate number of these parameter values will randomly be reset to zero to make the parameter "sparse."
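As a quick sanity check, a minimal sketch of inspecting how many weights were zeroed out at initialization; the layer name 'hid1' follows the default naming described above, and 'w' is the layer's weight matrix. Custom layer types can also be defined by subclassing theanets.Layer, as the next cell shows.
In [ ]:
net = theanets.Regressor([4, dict(size=5, sparsity=0.9), 2])
w = net.find('hid1', 'w').get_value()
print('fraction of zero weights:', (w == 0).mean())  # should be roughly 0.9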
In [ ]:
import theanets
import theano.tensor as TT
class NoBias(theanets.Layer):
    # Transform the inputs for this layer into an output for the layer.
    def transform(self, inputs):
        return TT.dot(inputs, self.find('w'))

    # Helper method to create a new weight matrix.
    def setup(self):
        self.add_weights('w', nin=self.input_size, nout=self.size)

layer = theanets.Layer.build('nobias', size=4)
net = theanets.Autoencoder(layers=[4, (3, 'nobias', 'linear'), (4, 'tied', 'linear')])
All of the predefined models in theanets are created by default with one loss function appropriate for that type of model:

- Autoencoder: MSE between the network's output and its input
$$\mathcal{L}(X,\theta) = \frac{1}{mn}\sum_{i=1}^m\|F_\theta(x_i)-x_i\|_2^2, \quad X \in \mathbb{R}^{m\times n}$$
- Regressor: MSE between the true and predicted values of the target
$$\mathcal{L}(X, Y,\theta) = \frac{1}{mn}\sum_{i=1}^m\|F_\theta(x_i)-y_i\|_2^2, \quad X \in \mathbb{R}^{m\times n},\ Y \in \mathbb{R}^{m\times o}$$
- Classifier: cross-entropy between the network output and the true target labels
$$\mathcal{L}(X, Y,\theta) = -\frac{1}{m}\sum_{i=1}^m \sum_{j=1}^{k}\delta_{j, y_i}\log F_\theta(x_i)_j, \quad X \in \mathbb{R}^{m\times n},\ Y \in \{1,...,k\}^m$$

For example, to use mean absolute error instead of the default mean squared error for a regression model:
In [ ]:
net = theanets.Regressor([4, 5, 2], loss='mae')
In [ ]:
net = theanets.Regressor([10, 20, 3])
net.add_loss('mae', weight=0.1)
You can specify the relative weight of the two losses by manipulating the weight attribute of each loss instance. For instance, if you want the MAE loss to be twice as strong as the MSE loss:
In [ ]:
net.losses[1].weight = 2
In [ ]:
net = theanets.recurrent.Autoencoder([3, (10, 'rnn'), 3], weighted=True)
With weighted=True, the training and validation datasets require an additional component: an array of floating-point values with the same shape as the expected output of the model, so that the training and validation datasets each have three pieces: sample, label, and weight. Each value in the weight array is used as the weight for the corresponding error when computing the loss.
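A minimal sketch of assembling such a three-piece dataset for a (non-recurrent) weighted regressor; the array names, shapes, and weight values here are assumptions for illustration. A custom loss can also be defined by subclassing theanets.Loss, as the next cell shows.
In [ ]:
import numpy as np

samples = np.random.randn(100, 5).astype('f')
labels = np.random.randn(100, 2).astype('f')
weights = np.ones((100, 2), dtype='f')  # same shape as the expected output
weights[:10] *= 5.0                     # e.g. emphasize the first ten examples

net = theanets.Regressor([5, 6, 2], weighted=True)
net.train([samples, labels, weights], algo='rmsprop')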
In [ ]:
class Step(theanets.Loss):
    # A custom loss: the (optionally weighted) mean of a step function of the output.
    def __call__(self, outputs):
        step = outputs[self.output_name] > 0
        if self._weights:
            return (self._weights * step).sum() / self._weights.sum()
        else:
            return step.mean()

# custom losses are referred to by their lower-cased class name
net = theanets.Regressor([5, 6, 7], loss='step', weighted=True)
In recurrent models, time is an explicit part of the model. Recurrent versions of the three types of models:

- theanets.recurrent.Autoencoder: takes as input $X \in \mathbb{R}^{m\times t \times n}$ and recreates the same data at the output under squared-error loss
- theanets.recurrent.Regressor: takes input data $X \in \mathbb{R}^{m\times t \times n}$ and output data $Y \in \mathbb{R}^{m\times t \times o}$, and fits the output under squared-error loss
- theanets.recurrent.Classifier: takes input data $X \in \mathbb{R}^{m\times t \times n}$ and a set of integer labels $Y \in \mathbb{Z}^{m \times t}$; the default loss is cross-entropy
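A minimal sketch of feeding three-dimensional (samples × time × features) data to a recurrent classifier; the shapes, layer sizes, and number of classes are assumptions. The predefined models themselves are thin wrappers around theanets.Network, as the Autoencoder definition in the next cell illustrates.
In [ ]:
import numpy as np

# hypothetical sequences: 50 samples, 20 time steps, 8 features, 3 classes
X = np.random.randn(50, 20, 8).astype('f')
Y = np.random.randint(0, 3, size=(50, 20)).astype('i')

rnn = theanets.recurrent.Classifier([8, (16, 'rnn'), 3])
rnn.train([X, Y], algo='rmsprop')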
In [ ]:
class Autoencoder(theanets.Network):
    def __init__(self, layers=(), loss='mse', weighted=False):
        super(Autoencoder, self).__init__(
            layers=layers, loss=loss, weighted=weighted)
In [ ]:
net = theanets.Classifier(layers=[10, 5, 2])
net.train(training_data,
          validation_data,
          algo='nag',
          learning_rate=0.01,
          momentum=0.9)
Here, a classifier model is being trained using Nesterov’s accelerated gradient, with a learning rate of 0.01 and momentum of 0.9.
Multiple calls to train() are possible and can be used to implement things like custom annealing schedules (e.g., the "newbob" training strategy):
In [ ]:
net = theanets.Classifier(layers=[10, 5, 2])
for e in (-2, -3, -4):
    net.train(training_data,
              validation_data,
              algo='nag',
              learning_rate=10 ** e,
              momentum=1 - 10 ** (e + 1))
In theanets, most of the trainers are provided by the downhill package, which implements algorithms for minimizing scalar loss functions defined using theano.
- sgd: Stochastic gradient descent
- nag: Nesterov's accelerated gradient
- rprop: Resilient backpropagation
- rmsprop: RMSProp
- adadelta: ADADELTA
- esgd: Equilibrated SGD
- adam: Adam

theanets also defines a few algorithms which are more specific to neural networks:
- sampler: This trainer sets model parameters directly to samples drawn from the training data. This is a very fast "training" algorithm since all updates take place at once; however, features derived directly from the training data often require further tuning to perform well.
- layerwise: Greedy supervised layerwise pre-training. This trainer applies RMSProp to each layer sequentially.
- pretrain: Greedy unsupervised layerwise pre-training. This trainer applies RMSProp to a tied-weights "shadow" autoencoder using an unlabeled dataset, and then transfers the learned autoencoder weights to the model being trained.

There are two ways of passing data to a model: numpy arrays and callables.
Instead of an array of data, you can provide a callable for a Dataset. This callable must take no arguments and must return a list of numpy arrays of the proper shape for your loss.
For example, this code defines a batch() helper that could be used for a loss that needs one input. The callable chooses a random dataset and a random offset for each batch:
In [ ]:
import numpy as np

SOURCES = 'foo.npy', 'bar.npy', 'baz.npy'
BATCH_SIZE = 64

def batch():
    # pick a random source file and a random offset for each batch
    X = np.load(np.random.choice(SOURCES), mmap_mode='r')
    i = np.random.randint(len(X))
    return X[i:i+BATCH_SIZE]

net = theanets.Regressor(layers=[10, 5, 2])
net.train(train=batch, ...)
Regularizers in theanets are specified during training, in calls to Network.train(), or during use, in calls to Network.predict().
In [ ]:
net.train(..., weight_l2=1e-4)      # weight decay (L2 penalty on the weights)
net.train(..., weight_l1=1e-4)      # sparsity (L1 penalty on the weights)
net.train(..., hidden_l1=0.1)       # sparse hidden representations (L1 penalty on hidden activations)
net.train(..., input_noise=0.1)     # zero-mean Gaussian noise with std=0.1 added to the input
net.train(..., hidden_noise=0.1)    # zero-mean Gaussian noise with std=0.1 added to the hidden layers
net.train(..., input_dropout=0.3)   # binary noise: inputs set to zero with probability 0.3
net.train(..., hidden_dropout=0.3)  # binary noise: hidden activations set to zero with probability 0.3
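Because regularizers can also be supplied at prediction time (as noted above), a minimal sketch might look like this; the choice of regularizer, its value, and the test_data array are assumptions.
In [ ]:
# apply input noise while computing predictions; test_data is a hypothetical input array
net.predict(test_data, input_noise=0.1)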
In [ ]:
class WeightInverse(theanets.Regularizer):
    # penalize weight matrices whose columns have small squared norm
    def loss(self, layers, outputs):
        return sum((1 / (p * p).sum(axis=0)).sum()
                   for l in layers for p in l.params
                   if p.ndim == 2)

net = theanets.Autoencoder([4, (8, 'linear'), (4, 'tied')])
# custom regularizers are referred to by their lower-cased class name
net.train(..., weightinverse=0.001)
In [ ]:
for train, valid in net.itertrain(train_data, valid_data, **kwargs):
    print('training loss:', train['loss'])
    print('most recent validation loss:', valid['loss'])
The theanets.Network base class can snapshot your model automatically during training. When you call theanets.Network.train(), you can provide the following keyword arguments:

- save_progress: a string containing a filename where the model should be saved
- save_every: a numeric value specifying how often the model should be saved during training

Manually:

- theanets.Network.save()
- theanets.Network.load()
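A minimal sketch of both options; the filenames and the save_every value here are assumptions.
In [ ]:
# automatic snapshots during training (hypothetical filename and frequency)
net.train(training_data, validation_data,
          save_progress='model-checkpoint.pkl.gz',
          save_every=10)

# manual snapshots
net.save('model-final.pkl.gz')
net = theanets.Network.load('model-final.pkl.gz')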
In [ ]:
results = net.predict(new_dataset)
Regardless of the model, you pass to predict() a numpy array containing data examples along the rows, and the method returns an array containing one row of output predictions for each row of input data.
You can also compute the activations of all layer outputs in the network using the theanets.Network.feed_forward() method:
In [ ]:
for name, value in net.feed_forward(new_dataset).items():
    print(abs(value).sum(axis=1))
This method returns a dictionary that maps layer output names to their corresponding values for the given input. Like predict(), each output array contains one row for every row of input data.

The parameters in each layer of the model are available using theanets.Network.find(). The first query term finds a layer in the network, and the second finds a parameter within that layer.

The find() method returns a theano shared variable. To get a numpy array of the current values of the variable, call get_value() on the result from find(), like so:
In [ ]:
param = net.find('hid1', 'w')
values = param.get_value()
In [ ]:
# a layer can receive input from several earlier layers via the 'inputs' keyword
theanets.Classifier((
    784,
    dict(size=100, name='a'),
    dict(size=100, name='b'),
    dict(size=100, name='c'),
    dict(size=10, inputs=('a', 'b', 'c')),
))
In [1]:
import theanets
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
In [2]:
mnist = np.loadtxt("/home/natalia/ML/mnist_train.csv", delimiter=",", skiprows=1)
In [3]:
# scale pixel values to [0, 1] and split into training and validation sets
train_X = (mnist[:30000, 1:] / 255.).astype('f')
train_y = mnist[:30000, 0].astype('i')
valid_X = (mnist[30000:, 1:] / 255.).astype('f')
valid_y = mnist[30000:, 0].astype('i')
print(train_X.shape, train_y.shape, valid_X.shape, valid_y.shape)
Many extremely common dimensionality reduction techniques can be expressed as autoencoders. For instance, Principal Component Analysis (PCA) can be expressed as a model with two tied, linear layers:
In [4]:
pca = theanets.Autoencoder([784, (10, 'linear'), (784, 'tied')])
pca.train(train_X, valid_X)
In [5]:
from utils import plot_images # from https://github.com/lmjohns3/theanets/tree/master/examples
v = valid_X[:100,:]
plt.figure(figsize = (10,10))
plot_images(v, 121, 'Sample data')
plt.tight_layout()
plot_images(pca.predict(v), 122, 'Reconstructed data')
plt.tight_layout()
plt.show()
In [20]:
net = theanets.Classifier(layers=[784, 100, 10])
In [6]:
train = [train_X, train_y]
valid = [valid_X, valid_y]
net.train(train, valid, algo='nag', learning_rate=1e-3, momentum=0.9)
In this example, the weights in layer 1 connect the inputs to the first hidden layer; these weights have one column of 784 values for each hidden node in the network, so we can iterate over the transpose and put each column—properly reshaped—into a giant image.
In [7]:
img = np.zeros((28 * 10, 28 * 10), dtype='f')
plt.figure(figsize=(8, 8))
for i, pix in enumerate(net.find('hid1', 'w').get_value().T):
    r, c = divmod(i, 10)
    img[r * 28:(r + 1) * 28, c * 28:(c + 1) * 28] = pix.reshape((28, 28))
plt.imshow(img, cmap=plt.cm.gray)
plt.show()
In [8]:
net.train(train, valid, algo='nag', learning_rate=1e-3, momentum=0.9, weight_l1=1e-4)
In [26]:
for train, valid in net.itertrain([train_X, train_y], [valid_X, valid_y], algo='sgd',
                                  learning_rate=1e-2, momentum=0.9,
                                  min_improvement=0.1, patience=1):
    print('training loss:', train['loss'])
    print('most recent validation loss:', valid['loss'])