tf.contrib.learn
tf.contrib.learn
is a simplified interface that provides readily available models. It was originally developed as Scikit Flow.
tf.contrib.learn
allows you:
all in (almost) one line. This is why it is called a one-liner module.
Here is the full code for the neural network classifier (Source: https://www.tensorflow.org/get_started/tflearn):
In [ ]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
import numpy as np
tf.logging.set_verbosity(tf.logging.ERROR)
# Data sets
# The Iris data set contains 150 rows of data, comprising 50 samples from each
# of three related Iris species: Iris setosa, Iris virginica, and Iris versicolor.
# Each row contains the following data for each flower sample: sepal length, sepal
# width, petal length, petal width, and flower species.
# Flower species are represented as integers, with 0 denoting Iris setosa,
# 1 denoting Iris versicolor, and 2 denoting Iris virginica.
IRIS_TRAINING = "data/iris_training.csv"
IRIS_TEST = "data/iris_test.csv"
# Load datasets.
# Datasets in tf.contrib.learn are named tuples; you can access feature data
# and target values via the data and target fields.
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(filename=IRIS_TRAINING,
target_dtype=np.int,
features_dtype=np.float,
target_column=-1)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(filename=IRIS_TEST,
target_dtype=np.int,
features_dtype=np.float,
target_column=-1)
# tf.contrib.learn offers a variety of predefined models, called Estimators,
# which you can use "out of the box" to run training and evaluation operations on
# your data.
# Here, we'll configure a Deep Neural Network Classifier model to fit
# the Iris data.
# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]
# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
hidden_units=[10, 20, 10],
n_classes=3,
model_dir="/tmp/iris_model",
enable_centered_bias = 'True')
# Fit model.
classifier.fit(x=training_set.data,
y=training_set.target,
steps=2000)
# Evaluate accuracy.
accuracy_score = classifier.evaluate(x=test_set.data,
y=test_set.target)["accuracy"]
print('Accuracy: {0:f}'.format(accuracy_score))
print('\n')
# Classify two new flower samples.
new_samples = np.array(
[[6.4, 3.2, 4.5, 1.5], [5.8, 3.1, 5.0, 1.7]], dtype=float)
y = list(classifier.predict(new_samples, as_iterable=True))
print('Predictions: {} {}'.format(str(y[0]), str(y[1])))
In [ ]:
import tensorflow.contrib.learn.python.learn as learn
# sklearn integration
from sklearn import datasets, metrics
iris = datasets.load_iris()
feature_columns = learn.infer_real_valued_columns_from_input(iris.data)
classifier = learn.LinearClassifier(n_classes=3, feature_columns=feature_columns)
classifier.fit(iris.data, iris.target, steps=200, batch_size=32)
iris_predictions = list(classifier.predict(iris.data, as_iterable=True))
score = metrics.accuracy_score(iris.target, iris_predictions)
print("Accuracy: %f" % score)
In [ ]:
import tensorflow.contrib.learn.python.learn as learn
from sklearn import datasets, metrics, preprocessing
boston = datasets.load_boston()
x = preprocessing.StandardScaler().fit_transform(boston.data)
feature_columns = learn.infer_real_valued_columns_from_input(x)
regressor = learn.LinearRegressor(feature_columns=feature_columns)
regressor.fit(x, boston.target, steps=200, batch_size=32)
boston_predictions = list(regressor.predict(x, as_iterable=True))
score = metrics.mean_squared_error(boston_predictions, boston.target)
print ("MSE: %f" % score)
In [ ]:
from sklearn import datasets
from sklearn import metrics
import tensorflow as tf
import tensorflow.contrib.layers.python.layers as layers
import tensorflow.contrib.learn.python.learn as learn
iris = datasets.load_iris()
def my_model(features, labels):
"""
DNN with three hidden layers.
"""
# Convert the labels to a one-hot tensor of shape (length of features, 3) and
# with a on-value of 1 for each one-hot vector of length 3.
labels = tf.one_hot(labels, 3, 1, 0)
# Create three fully connected layers respectively of size 10, 20, and 10.
features = layers.stack(features, layers.fully_connected, [10, 20, 10])
# Create two tensors respectively for prediction and loss.
prediction, loss = (
tf.contrib.learn.models.logistic_regression(features, labels)
)
# Create a tensor for training op.
train_op = tf.contrib.layers.optimize_loss(
loss, tf.contrib.framework.get_global_step(), optimizer='Adagrad',
learning_rate=0.1)
return {'class': tf.argmax(prediction, 1), 'prob': prediction}, loss, train_op
classifier = learn.Estimator(model_fn=my_model)
classifier.fit(iris.data, iris.target, steps=1000)
y_predicted = [p['class'] for p in classifier.predict(iris.data, as_iterable=True)]
score = metrics.accuracy_score(iris.target, y_predicted)
print('Accuracy: {0:f}'.format(score))
In [ ]:
labels = [0,1,3,1,1,0,2,2]
sess = tf.Session()
print(sess.run(tf.one_hot(labels, 4, 1, 0)))
sess.close()
The core data structure of Keras is a model, a way to organize layers. The main type of model is the Sequential model
, a linear stack of layers.
from keras.models import Sequential
model = Sequential()
Stacking layers is as easy as .add()
:
from keras.layers import Dense, Activation
model.add(Dense(output_dim=64, input_dim=100))
model.add(Activation("relu"))
model.add(Dense(output_dim=10))
model.add(Activation("softmax"))
Once your model looks good, configure its learning process with .compile()
:
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
If you need to, you can further configure your optimizer.
from keras.optimizers import SGD
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True))
You can now iterate on your training data in batches:
model.fit(X_train, Y_train, nb_epoch=5, batch_size=32)
Evaluate your performance in one line:
loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=32)
Or generate predictions on new data:
classes = model.predict_classes(X_test, batch_size=32)
proba = model.predict_proba(X_test, batch_size=32)
In [ ]:
'''
Trains a simple deep NN on the MNIST dataset.
You can get to 98.40% test accuracy after 20 epochs.
'''
from __future__ import print_function
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
np.random.seed(1337) # for reproducibility
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import RMSprop
from keras.utils import np_utils
batch_size = 128
nb_classes = 10
nb_epoch = 10
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
# print model characteristics
model.summary()
model.compile(loss='categorical_crossentropy',
optimizer=RMSprop(),
metrics=['accuracy'])
history = model.fit(X_train,
Y_train,
batch_size=batch_size,
nb_epoch=nb_epoch,
verbose=1,
validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('\n')
print('Test score:', score[0])
print('Test accuracy:', score[1])
keras
sequential modeThe Sequential
model is a linear stack of layers.
You can create a Sequential
model by passing a list of layer instances to the constructor:
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential([
Dense(32, input_dim=784),
Activation('relu'),
Dense(10),
Activation('softmax'),
])
You can also simply add layers via the .add()
method:
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
...
The model needs to know what input shape it should expect. For this reason, the first layer in a Sequentia
l model (and only the first, because following layers can do automatic shape inference) needs to receive information about its input shape. There are several possible ways to do this:
input_shape
argument to the first layer. This is a shape tuple (a tuple of integers or None
entries, where None
indicates that any positive integer may be expected). In input_shape
, the batch dimension is not included.batch_input_shape
argument, where the batch dimension is included. This is useful for specifying a fixed batch size (e.g. with stateful RNNs).Dense
, support the specification of their input shape via the argument input_dim
, and some 3D temporal layers support the arguments input_dim
and ``input_length```.As such, the following three snippets are strictly equivalent:
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
model = Sequential()
model.add(Dense(32, batch_input_shape=(None, 784)))
# note that batch dimension is "None" here,
# so the model will be able to process batches of any size.
model = Sequential()
model.add(Dense(32, input_dim=784))
merge
layerMultiple Sequential instances can be merged into a single output via a Merge
layer. The output is a layer that can be added as first layer in a new Sequential
model. For instance, here's a model with two separate input branches getting merged:
from keras.layers import Merge
left_branch = Sequential()
left_branch.add(Dense(32, input_dim=784))
right_branch = Sequential()
right_branch.add(Dense(32, input_dim=784))
merged = Merge([left_branch, right_branch], mode='concat')
final_model = Sequential()
final_model.add(merged)
final_model.add(Dense(10, activation='softmax'))
Such a two-branch model can then be trained via e.g.:
final_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
final_model.fit([input_data_1, input_data_2], targets) # we pass one data array per model input
The Merge
layer supports a number of pre-defined modes:
sum
(default): element-wise sumconcat
: tensor concatenation. You can specify the concatenation axis via the argument concat_axis.mul
: element-wise multiplicationave
: tensor averagedot
: dot product. You can specify which axes to reduce along via the argument dot_axes.cos
: cosine proximity between vectors in 2D tensors.You can also pass a function as the mode argument, allowing for arbitrary transformations:
merged = Merge([left_branch, right_branch], mode=lambda x: x[0] - x[1])
Before training a model, you need to configure the learning process, which is done via the compile method. It receives three arguments:
rmsprop
or adagrad
), or an instance of the Optimizer
class. categorical_crossentropy
or mse
), or it can be an objective function. metrics=['accuracy']
. A metric could be the string identifier of an existing metric or a custom metric function. Custom metric function should return either a single tensor value or a dict metric_name
-> metric_value
. # for a multi-class classification problem
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
# for a binary classification problem
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
# for a mean squared error regression problem
model.compile(optimizer='rmsprop',
loss='mse')
In [ ]:
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
model.add(Dense(1, input_dim=784, activation='sigmoid'))
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
# generate dummy data
import numpy as np
data = np.random.random((1000, 784))
labels = np.random.randint(2, size=(1000, 1))
# train the model, iterating on the data in batches
# of 32 samples
model.fit(data, labels, nb_epoch=10, batch_size=32)
model.summary()
For a multi-input model with 10 classes:
In [ ]:
from keras.layers import Merge
left_branch = Sequential()
left_branch.add(Dense(32, input_dim=784))
right_branch = Sequential()
right_branch.add(Dense(32, input_dim=784))
merged = Merge([left_branch, right_branch], mode='concat')
model = Sequential()
model.add(merged)
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
# generate dummy data
import numpy as np
from keras.utils.np_utils import to_categorical
data_1 = np.random.random((1000, 784))
data_2 = np.random.random((1000, 784))
# these are integers between 0 and 9
labels = np.random.randint(10, size=(1000, 1))
# we convert the labels to a binary matrix of size (1000, 10)
# for use with categorical_crossentropy
labels = to_categorical(labels, 10)
# train the model
# note that we are passing a list of Numpy arrays as training data
# since the model has 2 inputs
model.fit([data_1, data_2], labels, nb_epoch=10, batch_size=32)
model.summary()
The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.
The Sequential
model is probably a better choice to implement such a network, but it helps to start with something really simple.
Using Model
class:
Input
tensor(s) and output tensor(s) can then be used to define a Model
from keras.layers import Input, Dense
from keras.models import Model
# this returns a tensor
inputs = Input(shape=(784,))
# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
# this creates a model that includes
# the Input layer and three Dense layers
model = Model(input=inputs, output=predictions)
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(data, labels) # starts training
All models are callable, just like layers.
With the functional API, it is easy to re-use trained models: you can treat any model as if it were a layer, by calling it on a tensor. Note that by calling a model you aren't just re-using the architecture of the model, you are also re-using its weights.
x = Input(shape=(784,))
# this works, and returns the 10-way softmax we defined above.
y = model(x)
(Source: https://github.com/fchollet/keras/blob/master/examples/mnist_siamese_graph.py)
Siamese networks are commonly used in image comparison applications such as face or signature verification. They can also be used in language processing, times series analysis, etc.
In a typical Siamese network a large part of the network is duplicated at the base to allow multiple inputs to go through identical layers.
This example shows how to teach a neural network to map an image from the MNIST dataset to a 2D point, while trying to minimize the distance between points of the same class and maximize the distance between points of different classes.
Siamese network architecture is a way of learning how to embed samples into lower-dimensions based on similarity computed with features learned by a feature network.
The feature network is the architecture we intend to fine-tune in this setting.
Let's suppose we want to embed images. Given two images $X_1$ and $X_2$, we feed into the feature network $G_W$ and compute corresponding feature vectors $G_W(X_1)$ and $G_W(X_2)$. The final layer computes pair-wise distance between computed features $E_W = || G_W(X_1) - G_W(X_2) ||_{1}$ and final loss layer $L$ considers whether these two images are from the same class (label $1$) or not (label $0$).
In the original paper it was proposed the Contrastive Loss Function:
$$ L(W,(Y,X_1,X_2)^i) = (1 - Y) \times L_S(E_W(X_1,X_2)^i) + Y \times L_D(E_W(X_1,X_2)^i) $$where $L_S$ is the partial loss function for a "same-class" pair and $L_D$ is the partial loss function for a "different-class" pair.
$L_S$ and $L_D$ should be designed in such a way that the minimization of $L$ will decrease the distance in the embedding space of "same-class" pairs and increase it in the case of "different-class" pairs:
$$ L_S = \frac{1}{2} E_W^2 $$$$ L_D = \frac{1}{2} \{ \mbox{max }(0,1-E_W) \}^2 $$
In [ ]:
'''Train a Siamese MLP on pairs of digits from the MNIST dataset.
It follows Hadsell-et-al.'06 [1] by computing the Euclidean distance on the
output of the shared network and by optimizing the contrastive loss (see paper
for mode details).
[1] "Dimensionality Reduction by Learning an Invariant Mapping"
http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
Gets to 99.5% test accuracy after 20 epochs.
3 seconds per epoch on a Titan X GPU
'''
from __future__ import absolute_import
from __future__ import print_function
import numpy as np
import tensorflow as tf
np.random.seed(1337) # for reproducibility
tf.reset_default_graph()
import random
from keras.datasets import mnist
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Input, Lambda
from keras.optimizers import RMSprop
from keras import backend as K
def euclidean_distance(vects):
x, y = vects
return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))
def eucl_dist_output_shape(shapes):
shape1, shape2 = shapes
return (shape1[0], 1)
def contrastive_loss(y_true, y_pred):
'''Contrastive loss from Hadsell-et-al.'06
http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
'''
margin = 1
return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
def create_pairs(x, digit_indices):
'''Positive and negative pair creation.
Alternates between positive and negative pairs.
'''
pairs = []
labels = []
n = min([len(digit_indices[d]) for d in range(10)]) - 1
for d in range(10):
for i in range(n):
z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
pairs += [[x[z1], x[z2]]]
inc = random.randrange(1, 10)
dn = (d + inc) % 10
z1, z2 = digit_indices[d][i], digit_indices[dn][i]
pairs += [[x[z1], x[z2]]]
labels += [1, 0]
return np.array(pairs), np.array(labels)
def create_base_network(input_dim):
'''
Base network to be shared (eq. to feature extraction).
'''
seq = Sequential()
seq.add(Dense(128, input_shape=(input_dim,), activation='relu'))
seq.add(Dropout(0.1))
seq.add(Dense(128, activation='relu'))
seq.add(Dropout(0.1))
seq.add(Dense(2, activation=None,name='emb'))
return seq
def compute_accuracy(predictions, labels):
'''Compute classification accuracy with a fixed threshold on distances.
'''
return labels[predictions.ravel() < 0.5].mean()
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
input_dim = 784
nb_epoch = 10
# create training+test positive and negative pairs
digit_indices = [np.where(y_train == i)[0] for i in range(10)]
tr_pairs, tr_y = create_pairs(X_train, digit_indices)
digit_indices = [np.where(y_test == i)[0] for i in range(10)]
te_pairs, te_y = create_pairs(X_test, digit_indices)
# network definition
base_network = create_base_network(input_dim)
input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))
# because we re-use the same instance `base_network`,
# the weights of the network
# will be shared across the two branches
processed_a = base_network(input_a)
processed_b = base_network(input_b)
distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([processed_a, processed_b])
model = Model(input=[input_a, input_b], output=distance)
# train
rms = RMSprop()
model.compile(loss=contrastive_loss, optimizer=rms)
model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y),
batch_size=128,
nb_epoch=nb_epoch)
# compute final accuracy on training and test sets
pred = model.predict([tr_pairs[:, 0], tr_pairs[:, 1]])
tr_acc = compute_accuracy(pred, tr_y)
pred = model.predict([te_pairs[:, 0], te_pairs[:, 1]])
te_acc = compute_accuracy(pred, te_y)
print('* Accuracy on training set: %0.2f%%' % (100 * tr_acc))
print('* Accuracy on test set: %0.2f%%' % (100 * te_acc))
Here's a good use case for the functional API: models with multiple inputs and outputs. The functional API makes it easy to manipulate a large number of intertwined datastreams.
Let's consider the following model. We seek to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc.
The model will also be supervised via two loss functions. Using the main loss function earlier in a model is a good regularization mechanism for deep models.
Here's what our model looks like:
Let's implement it with the functional API.
The main input will receive the headline, as a sequence of integers (each integer encodes a word). The integers will be between 1 and 10,000 (a vocabulary of 10,000 words) and the sequences will be 100 words long.
In [ ]:
from keras.layers import Input, Embedding, LSTM, Dense, merge
from keras.models import Model
# headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
# this embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
# a LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
Here we insert the auxiliary loss, allowing the LSTM and Embedding layer to be trained smoothly even though the main loss will be much higher in the model.
In [ ]:
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
At this point, we feed into the model our auxiliary input data by concatenating it with the LSTM output:
In [ ]:
auxiliary_input = Input(shape=(5,), name='aux_input')
x = merge([lstm_out, auxiliary_input], mode='concat')
# we stack a deep fully-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
# and finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
This defines a model with two inputs and two outputs:
In [ ]:
model = Model(input=[main_input, auxiliary_input], output=[main_output, auxiliary_output])
We compile the model and assign a weight of 0.2 to the auxiliary loss. To specify different loss_weights or loss for each different output, you can use a list or a dictionary. Here we pass a single loss as the loss argument, so the same loss will be used on all outputs.
In [ ]:
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
loss_weights=[1., 0.2])
We can train the model by passing it lists of input arrays and target arrays:
model.fit([headline_data, additional_data], [labels, labels],
nb_epoch=50, batch_size=32)
Since our inputs and outputs are named (we passed them a "name" argument), We could also have compiled the model via:
model.compile(optimizer='rmsprop',
loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
loss_weights={'main_output': 1., 'aux_output': 0.2})
# and trained it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
{'main_output': labels, 'aux_output': labels},
nb_epoch=50, batch_size=32)
Another good use for the functional API are models that use shared layers. Let's take a look at shared layers.
Let's consider a dataset of tweets. We want to build a model that can tell whether two tweets are from the same person or not (this can allow us to compare users by the similarity of their tweets, for instance).
One way to achieve this is to build a model that encodes two tweets into two vectors, concatenates the vectors and adds a logistic regression of top, outputting a probability that the two tweets share the same author. The model would then be trained on positive tweet pairs and negative tweet pairs.
Because the problem is symmetric, the mechanism that encodes the first tweet should be reused (weights and all) to encode the second tweet. Here we use a shared LSTM layer to encode the tweets.
Let's build this with the functional API. We will take as input for a tweet a binary matrix of shape (140, 256), i.e. a sequence of 140 vectors of size 256, where each dimension in the 256-dimensional vector encodes the presence/absence of a character (out of an alphabet of 256 frequent characters).
In [ ]:
from keras.layers import Input, LSTM, Dense, merge
from keras.models import Model
tweet_a = Input(shape=(140, 256))
tweet_b = Input(shape=(140, 256))
To share a layer across different inputs, simply instantiate the layer once, then call it on as many inputs as you want:
In [ ]:
# this layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# when we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# we can then concatenate the two vectors:
merged_vector = merge([encoded_a, encoded_b], mode='concat', concat_axis=-1)
# and add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)
# we define a trainable model linking the
# tweet inputs to the predictions
model = Model(input=[tweet_a, tweet_b], output=predictions)
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit([data_a, data_b], labels, nb_epoch=10)