We now predict the phase transition using Scikit-learn's implementation of a multilayer perceptron neural network. This is one of the simpler versions of a neural network.
A neural network consists of layers of neurons or perceptrons. A neuron can take a continuous value, whereas a perceptron is either "on" or "off". Scikit-learn's MLPClassifier
is called a multilayer perceptron model, but it uses neurons. A neuron accepts a vector of inputs $x$ and produces a scalar output $a_i$. A layer of neurons takes in a matrix $X$ and produces an activation vector $a$ consisting of the outputs $a_i$ of each neuron in that layer. For layer $l$ we first compute the weighted sum
\begin{align} z^{(l)} = w^{(l)} a^{(l - 1)} + b^{(l)}, \end{align}where $w^{(l)}$ are that layer's weights and $b^{(l)}$ are that layer's biases. The activation $a^{(l)}$ is then computed by
\begin{align} a^{(l)} = f(z^{(l)}), \end{align}where $f(z)$ is an activation function. The first activation is given by $a^{(0)} = X$, where $X$ is the input data.
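As a minimal sketch of how such a layer computation might look in NumPy (the layer sizes, the random weights, and the samples-as-rows convention below are illustrative assumptions, not part of the model used later), one forward pass through a single dense layer is:

import numpy as np

def dense_layer(a_prev, w, b, f=np.tanh):
    """One dense layer: z = a_prev @ w + b, a = f(z).

    Samples are stored as rows, so the weight matrix has shape
    (inputs, neurons); this is the transpose of the convention in the
    equations above, but the computation is the same.
    """
    z = a_prev @ w + b
    return f(z)

# Illustrative sizes: 5 samples, 4 input features, 3 neurons in the layer.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))     # a^(0) = X
w1 = rng.normal(size=(4, 3))    # weights of the first layer
b1 = np.zeros(3)                # biases of the first layer
a1 = dense_layer(X, w1, b1)     # a^(1) = f(z^(1)), shape (5, 3)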
It is common to use the cross-entropy as the cost function for a categorical neural network. This is the same as the one used in logistic regression.
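For binary targets $y_i \in \{0, 1\}$ with predicted probabilities $\hat{y}_i$, this cross-entropy cost takes the familiar form
\begin{align} C = -\sum_{i} \left[ y_i \ln \hat{y}_i + (1 - y_i) \ln\left(1 - \hat{y}_i\right) \right]. \end{align}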
The training of a neural network is a tricky affair as each neuron in each layer must be updated to minimize a cost function. The minimization process resembles that of other classifiers and regressors in that we use an optimization algorithm, e.g., stochastic gradient descent, to compute the change in the weights and biases in order to find a minimum of the cost function. The tricky part comes in computing the gradients of the weights and biases in the neural network. Due to an ingenious technique called backpropagation this can be done within reasonable time. The algorithm is listed below.
The errors $\delta_j^{(l)}$ in layer $l$ are propagated backwards from layer $l + 1$ through
\begin{align} \delta_j^{(l)} = \left( \sum_{k}\delta_k^{(l + 1)}\omega_{kj}^{(l + 1)} \right)\frac{\mathrm{d}f(z_j^{(l)})}{\mathrm{d}z}, \end{align}and the gradients of the cost function with respect to the weights and biases are then expressed in terms of these errors. We will do a large search over a two-dimensional parameter grid of varying hidden layer sizes and learning rates.
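A rough sketch of what such a grid search could look like with Scikit-learn's GridSearchCV and MLPClassifier is shown below; the specific grid values, the number of cross-validation folds, and max_iter are illustrative assumptions rather than the settings used for the results discussed later.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Illustrative two-dimensional parameter grid over hidden layer sizes
# and initial learning rates; the actual grid searched may differ.
param_grid = {
    "hidden_layer_sizes": [(10,), (50,), (100,), (100, 50)],
    "learning_rate_init": np.logspace(-4, -1, 4),
}

search = GridSearchCV(
    MLPClassifier(max_iter=200),
    param_grid,
    cv=3,
    n_jobs=-1,
)
# search.fit(X_train, y_train)   # X_train, y_train as prepared earlier in the notebook
# print(search.best_params_)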
An easier approach is to use libraries such as TensorFlow or Keras (which by default uses TensorFlow as a backend) to specify how the neural network should be structured. The library then builds a neural network based on your specification and lets you fit and predict with your very own construct. As a bonus, TensorFlow builds a computational graph from the executed code, which lets it figure out an efficient way of running the code and which parts can be executed in parallel.
A perk of building a neural network this way is that we have a lot of freedom in how we construct it. For instance, we can set individual activation functions for each layer, choose the optimization technique, the cost function to minimize, the metric to evaluate the model on, and so on.
We now build a neural network with Keras consisting of three hidden layers. The input layer and the hidden layers use the activation function
\begin{align} f(z_j) = \tanh(z_j), \end{align}whereas the output layer uses the softmax activation function.
\begin{align} g(z_j) = \frac{e^{z_j}}{\sum_{i = 1}^{n} e^{z_i}}. \end{align}We avoid using the hyperbolic tangent for the output layer as $f(z_j) \in [-1, 1]$, whereas the softmax function yields $g(z_j) \in [0, 1]$ with the outputs summing to one. The latter is better suited to categorical classification, as the outputs can be interpreted as class probabilities. Another perk of the softmax activation function is that it weighs all the classes in a multiclass setting, much like the partition function in statistical physics. We use stochastic gradient descent as our optimizer and the cross-entropy (shown in the notebook on logistic regression) as our cost function.
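As a small illustrative NumPy check (not part of the Keras model below), the softmax of an arbitrary output vector is non-negative and sums to one, while $\tanh$ can return negative values:

import numpy as np

def softmax(z):
    # Subtract the maximum for numerical stability; the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-1.0, 2.0])
print(np.tanh(z))    # values in [-1, 1]; negative entries are possible
print(softmax(z))    # non-negative values that sum to 1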
In [32]:
# Aliases assumed from earlier cells: km = keras.models, kl = keras.layers, ko = keras.optimizers.
clf = km.Sequential()

# Three hidden layers with tanh activations; the first also defines the input dimension.
clf.add(kl.Dense(50, activation="tanh", input_dim=X_train.shape[1]))
clf.add(kl.Dense(100, activation="tanh"))
clf.add(kl.Dense(200, activation="tanh"))

# Output layer with one neuron per class and a softmax activation.
clf.add(kl.Dense(2, activation="softmax"))

# Cross-entropy loss minimized with plain stochastic gradient descent.
clf.compile(
    loss="binary_crossentropy",
    optimizer=ko.SGD(lr=0.01),
    metrics=["accuracy"]
)
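If desired, the resulting architecture and the number of trainable parameters can be inspected with the model's built-in summary:

clf.summary()  # prints the layer output shapes and parameter counts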
We have to convert our integer labels to one-hot encoded categorical values in the format that Keras accepts.
In [33]:
y_train_k = to_categorical(y_train[:, np.newaxis])
y_test_k = to_categorical(y_test[:, np.newaxis])
y_critical_k = to_categorical(labels[critical][:, np.newaxis])
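To illustrate what to_categorical does (a toy example, unrelated to the actual data, with the import written out for completeness):

from keras.utils import to_categorical

# Integer labels 0/1 become the one-hot rows [1, 0] and [0, 1].
print(to_categorical([0, 1, 1, 0]))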
We now fit our model for $10$ epochs and validate on the test data.
In [34]:
history = clf.fit(
    X_train, y_train_k,
    validation_data=(X_test, y_test_k),
    epochs=10,
    batch_size=200,
    verbose=True
)
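The History object returned by fit stores the loss and metric values per epoch. A small sketch for plotting them is given below; the metric key name differs between Keras versions ("acc" vs. "accuracy"), hence the lookup, and matplotlib is assumed to be imported as plt as in the plotting cell further down.

# Plot the training and validation accuracy per epoch from the History object.
acc_key = "accuracy" if "accuracy" in history.history else "acc"

plt.figure(figsize=(10, 7))
plt.plot(history.history[acc_key], label="Train")
plt.plot(history.history["val_" + acc_key], label="Validation")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend(loc="best")
plt.show()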
We now evaluate the model.
In [35]:
train_accuracy = clf.evaluate(X_train, y_train_k, batch_size=200)[1]
test_accuracy = clf.evaluate(X_test, y_test_k, batch_size=200)[1]
critical_accuracy = clf.evaluate(data[critical], y_critical_k, batch_size=200)[1]
print ("Accuracy on train data: {0}".format(train_accuracy))
print ("Accuracy on test data: {0}".format(test_accuracy))
print ("Accuracy on critical data: {0}".format(critical_accuracy))
We then plot the ROC curves for the three datasets.
In [36]:
fig = plt.figure(figsize=(20, 14))

for (_X, _y), label in zip(
    [
        (X_train, y_train_k),
        (X_test, y_test_k),
        (data[critical], y_critical_k)
    ],
    ["Train", "Test", "Critical"]
):
    # Predicted class probabilities; column 1 is the probability of the positive class.
    proba = clf.predict(_X)
    fpr, tpr, _ = skm.roc_curve(_y[:, 1], proba[:, 1])
    roc_auc = skm.auc(fpr, tpr)

    print("Keras AUC ({0}): {1}".format(label, roc_auc))
    plt.plot(fpr, tpr, label="{0} (AUC = {1})".format(label, roc_auc), linewidth=4.0)

plt.plot([0, 1], [0, 1], "--", label="Guessing (AUC = 0.5)", linewidth=4.0)

plt.title(r"The ROC curve for Keras", fontsize=18)
plt.xlabel(r"False positive rate", fontsize=18)
plt.ylabel(r"True positive rate", fontsize=18)
plt.axis([-0.01, 1.01, -0.01, 1.01])
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.legend(loc="best", fontsize=18)
plt.show()
We see that we get much the same performance as for the MLPClassifier from Scikit-learn.
We have successfully built a neural network with Keras and predicted the phase transition of the two-dimensional Ising model. Compared to logistic regression, we obtain much better performance on all datasets. More importantly, we obtain reasonable predictions on the critical dataset, which requires extrapolating beyond the training and testing data.