Predicting phase transition of the two-dimensional Ising model

In this notebook we will perform binary classification on the two-dimensional Ising model, trying to predict whether a given spin configuration is ordered or disordered.

The Ising model

The Hamiltonian of the two-dimensional Ising model without an external field for a constant coupling constant $J$ is given by

\begin{align} H = -J \sum_{\langle ij\rangle} S_i S_j, \end{align}

where $S_i \in \{-1, 1\}$ and $\langle ij \rangle$ signifies that we only sum over nearest neighbors in the lattice. We will be looking at a system of $L = 40$ spins in each dimension, i.e., $L^2 = 1600$ spins in total. As opposed to the one-dimensional Ising model, the two-dimensional model exhibits a phase transition from an ordered phase to a disordered phase at the critical temperature

\begin{align} \frac{T_c}{J} = \frac{2}{\log\left(1 + \sqrt{2}\right)} \approx 2.26, \end{align}

as shown by Lars Onsager.
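
As a quick sanity check we can evaluate Onsager's expression numerically. The snippet below is purely illustrative and not part of the analysis that follows.

import numpy as np

# Onsager's exact critical temperature (in units of J, with k_B = 1)
T_c = 2.0 / np.log(1.0 + np.sqrt(2.0))
print("T_c / J = {0:.4f}".format(T_c))  # prints approximately 2.2692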

The goal

In this notebook we wish to use logistic regression to predict when a phase transition occurs. The data we will look at is a set of spin configurations, i.e., individual lattices of spins, labeled ordered (1) or disordered (0). Our job is to build a model which takes in a spin configuration and predicts whether it constitutes an ordered or a disordered phase. To achieve this we represent each lattice as a flattened array with $1600$ elements instead of a $40 \times 40$ matrix. As an extra test of the performance of the algorithm we divide the dataset into three pieces. We do a conventional train-test split on a combination of totally ordered and totally disordered phases. The remaining "critical-like" states are kept as a separate test set on which we hope the model will make good extrapolated predictions.
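
To make the flattening concrete, the sketch below (using a randomly generated lattice purely for illustration) shows how a $40 \times 40$ spin matrix is turned into a $1600$-element feature vector.

import numpy as np

L = 40

# A random spin configuration with S_i in {-1, +1} (illustration only)
lattice = np.random.choice([-1, 1], size=(L, L))

# Flatten the 40 x 40 matrix into a single 1600-element feature vector
x = lattice.reshape(-1)
print(lattice.shape, "->", x.shape)  # (40, 40) -> (1600,)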


In [1]:
import pickle
import os
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn.model_selection as skms
import sklearn.linear_model as skl
import sklearn.metrics as skm
import tqdm
import copy
import time
from IPython.display import display

%matplotlib inline

sns.set(color_codes=True)

In [2]:
%%javascript
// Fill entire cell with output
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}


Reading in the data

Using the data from here (specifically the two datasets named Ising2DFM_reSample_L40_T=All.pkl and Ising2DFM_reSample_L40_T=All_labels.pkl), we first unpack the data into NumPy arrays.


In [3]:
filenames = glob.glob(os.path.join("..", "dat", "*"))
label_filename = list(filter(lambda x: "label" in x, filenames))[0]
dat_filename = list(filter(lambda x: "label" not in x, filenames))[0]

# Read in the labels
with open(label_filename, "rb") as f:
    labels = pickle.load(f)

# Read in the corresponding configurations
with open(dat_filename, "rb") as f:
    data = np.unpackbits(pickle.load(f)).reshape(-1, 1600).astype("int")

# Set spin-down to -1
data[data == 0] = -1

This dataset consists of $10000$ samples, i.e., $10000$ spin configurations with $40 \times 40$ spins each, for each of the $16$ temperatures between $0.25$ and $4.0$, giving $160000$ configurations in total. Next we create a train/test-split and keep the data in the critical phase as a separate dataset for extrapolation-testing.


In [4]:
# Set up slices of the dataset
ordered = slice(0, 70000)
critical = slice(70000, 100000)
disordered = slice(100000, 160000)

X_train, X_test, y_train, y_test = skms.train_test_split(
    np.concatenate((data[ordered], data[disordered])),
    np.concatenate((labels[ordered], labels[disordered])),
    test_size=0.95
)

Note: Using a small training set yields better accuracy here. This will be discussed at the end.

Logistic regression

Logistic regression is a linear model for classification. Recalling the cost function for ordinary least squares with both L2 (ridge) and L1 (LASSO) penalties, we will see that the logistic cost function is very similar. In OLS we wish to predict a continuous variable $\hat{y}$ using

\begin{align} \hat{y} = X\omega, \end{align}

where $X \in \mathbb{R}^{n \times p}$ is the input data and $\omega \in \mathbb{R}^{p}$ are the weights of the regression. In a classification setting (binary classification in our situation) we are instead interested in a positive or negative answer. We could define either answer as being above or below some threshold on $X\omega$. However, in order to bound the output and to obtain a probabilistic interpretation of how confident we are in either answer, we pass the OLS prediction through the sigmoid function. That is,

\begin{align} f(X\omega) = \frac{1}{1 + \exp(-X\omega)}. \end{align}

We are thus interested in minimizing the following cost function

\begin{align} C(X, \omega) = \sum_{i = 1}^n \left\{ - y_i\log\left( f(x_i^T\omega) \right) - (1 - y_i)\log\left[1 - f(x_i^T\omega)\right] \right\}, \end{align}

where we take $f(z)$ to be the sigmoid described above. We can also add an L2 (ridge) or L1 (LASSO) penalty to this cost function in the same manner as we did for linear regression.
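
To make the cost function concrete, here is a minimal NumPy sketch of the sigmoid and the (optionally L2-penalized) cross-entropy cost. The function names are our own illustration; in the analysis below we rely on scikit-learn's implementation.

import numpy as np

def sigmoid(z):
    # The logistic function f(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(X, omega, y, lmbda=0.0):
    # Cross-entropy cost C(X, omega) with an optional L2 penalty lmbda * ||omega||^2
    p = sigmoid(X @ omega)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) + lmbda * np.sum(omega ** 2)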

Exploring the logistic regression

In the scikit-learn implementation of logistic regression the penalization factor is specified as its inverse, $C = 1/\lambda$. We will explore several values of $\lambda$ using both L1 and L2 penalization. We do this with a grid search over the different parameters, running a 3-fold cross-validation for each configuration. In other words, we fit a model three times for each combination of hyperparameters.


In [5]:
lambdas = np.logspace(-7, -1, 7)

param_grid = {
    "C": list(1.0/lambdas),
    "penalty": ["l1", "l2"]
}

In [6]:
clf = skms.GridSearchCV(
    skl.LogisticRegression(),
    param_grid=param_grid,
    n_jobs=-1,
    return_train_score=True
)

In [7]:
t0 = time.time()
clf.fit(X_train, y_train)
t1 = time.time()

print (
    "Time spent fitting GridSearchCV(LogisticRegression): {0:.3f} sec".format(
        t1 - t0
    )
)


Time spent fitting GridSearchCV(LogisticRegression): 949.417 sec

We can see that logistic regression is quite slow, and that combining the grid search with cross-validation results in quite a heavy computation. Below we show the results for the different configurations.


In [8]:
logreg_df = pd.DataFrame(clf.cv_results_)

display(logreg_df)


mean_fit_time std_fit_time mean_score_time std_score_time param_C param_penalty params split0_test_score split1_test_score split2_test_score mean_test_score std_test_score rank_test_score split0_train_score split1_train_score split2_train_score mean_train_score std_train_score
0 133.985549 43.129200 0.043782 0.002575 1e+07 l1 {'C': 10000000.0, 'penalty': 'l1'} 0.694047 0.709737 0.700831 0.701538 0.006425 12 1.0 1.000000 1.0 1.000000 0.000000
1 51.797149 7.266426 0.058810 0.008064 1e+07 l2 {'C': 10000000.0, 'penalty': 'l2'} 0.694509 0.711583 0.702216 0.702769 0.006982 8 1.0 1.000000 1.0 1.000000 0.000000
2 125.188562 46.865003 0.028435 0.009611 1e+06 l1 {'C': 1000000.0, 'penalty': 'l1'} 0.693586 0.714352 0.700831 0.702923 0.008606 6 1.0 1.000000 1.0 1.000000 0.000000
3 46.434756 6.818460 0.047470 0.021621 1e+06 l2 {'C': 1000000.0, 'penalty': 'l2'} 0.694509 0.711121 0.702216 0.702615 0.006789 9 1.0 1.000000 1.0 1.000000 0.000000
4 131.480284 49.637990 0.025367 0.001607 100000 l1 {'C': 99999.99999999999, 'penalty': 'l1'} 0.693586 0.711583 0.700831 0.702000 0.007394 11 1.0 1.000000 1.0 1.000000 0.000000
5 48.442208 4.821450 0.025852 0.000660 100000 l2 {'C': 99999.99999999999, 'penalty': 'l2'} 0.694509 0.712506 0.703139 0.703385 0.007350 3 1.0 1.000000 1.0 1.000000 0.000000
6 136.005517 63.967902 0.035429 0.009921 10000 l1 {'C': 10000.0, 'penalty': 'l1'} 0.693586 0.712967 0.700831 0.702462 0.007997 10 1.0 1.000000 1.0 1.000000 0.000000
7 45.492904 4.962405 0.024428 0.003608 10000 l2 {'C': 10000.0, 'penalty': 'l2'} 0.693586 0.712506 0.703601 0.703231 0.007729 5 1.0 1.000000 1.0 1.000000 0.000000
8 109.751415 29.097978 0.036205 0.012117 1000 l1 {'C': 1000.0, 'penalty': 'l1'} 0.693586 0.712044 0.703139 0.702923 0.007538 6 1.0 1.000000 1.0 1.000000 0.000000
9 40.780339 5.498792 0.022993 0.002149 1000 l2 {'C': 1000.0, 'penalty': 'l2'} 0.693124 0.711583 0.705448 0.703385 0.007676 3 1.0 1.000000 1.0 1.000000 0.000000
10 171.943614 10.621141 0.046334 0.018095 100 l1 {'C': 100.0, 'penalty': 'l1'} 0.684818 0.708814 0.699908 0.697846 0.009905 14 1.0 1.000000 1.0 1.000000 0.000000
11 35.711317 5.045432 0.021141 0.000833 100 l2 {'C': 100.0, 'penalty': 'l2'} 0.694509 0.712506 0.705448 0.704154 0.007405 1 1.0 1.000000 1.0 1.000000 0.000000
12 111.015806 21.466425 0.031371 0.011843 10 l1 {'C': 10.0, 'penalty': 'l1'} 0.685741 0.716198 0.698523 0.700154 0.012488 13 1.0 1.000000 1.0 1.000000 0.000000
13 30.197917 4.607725 0.021942 0.004698 10 l2 {'C': 10.0, 'penalty': 'l2'} 0.694047 0.714352 0.702216 0.703538 0.008342 2 1.0 0.999769 1.0 0.999923 0.000109

Accuracy of a classification model

To determine how well a classification model is performing we count the number of correctly classified samples and divide by the total number of samples. The accuracy is thus given by

\begin{align} a(y, \hat{y}) = \frac{1}{n}\sum_{i = 1}^{n} I(y_i = \hat{y}_i), \end{align}

where $I(y_i = \hat{y}_i)$ is the indicator function given by

\begin{align} I(x = y) = \begin{cases} 1 & x = y, \\ 0 & x \neq y. \end{cases} \end{align}

This is the accuracy provided by Scikit-learn when using sklearn.metrics.accuracy_score.
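
As a small illustration (with made-up labels), the accuracy defined above is simply the mean of the indicator function and agrees with sklearn.metrics.accuracy_score:

import numpy as np
import sklearn.metrics as skm

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])

# Accuracy as the mean of the indicator function I(y_i = y_hat_i)
print(np.mean(y_true == y_pred))           # 0.8333...
print(skm.accuracy_score(y_true, y_pred))  # same value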

Below we compute the accuracy of the best-fit model on the training data (which should give a good accuracy), the test data (which has not been shown to the model) and the critical data (data from the critical region, on which the model has to extrapolate).


In [9]:
train_accuracy = skm.accuracy_score(y_train, clf.predict(X_train))
test_accuracy = skm.accuracy_score(y_test, clf.predict(X_test))
critical_accuracy = skm.accuracy_score(labels[critical], clf.predict(data[critical]))

print ("Accuracy on train data: {0}".format(train_accuracy))
print ("Accuracy on test data: {0}".format(test_accuracy))
print ("Accuracy on critical data: {0}".format(critical_accuracy))


Accuracy on train data: 0.876
Accuracy on test data: 0.6896275303643725
Accuracy on critical data: 0.5445

We can see that we get quite good accuracy on the training data, but markedly worse accuracy on the test data, and worse still on the critical data.

Receiver operating characteristic

Below we show a different metric for determining the quality of our model, namely the receiver operating characteristic (ROC). The ROC curve tells us how well the model separates the two classes. We plot the true positive rate (the fraction of actual positives that are correctly classified as positive) against the false positive rate (the fraction of actual negatives that are incorrectly classified as positive). The ROC curve is built by computing the true positive rate and the false positive rate for varying thresholds, i.e., for different choices of the probability above which we assign a sample to the positive class.

By computing the area under the curve (AUC) of the ROC curve we get an estimate of how well our model is performing. Pure guessing will get an AUC of $0.5$. A perfect score will get an AUC of $1.0$.
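
To make the threshold sweep concrete, the sketch below (with made-up probabilities) computes the true and false positive rates by hand for a few thresholds; sklearn.metrics.roc_curve performs the same sweep over all distinct thresholds.

import numpy as np
import sklearn.metrics as skm

y_true = np.array([0, 0, 1, 1, 0, 1])
proba = np.array([0.1, 0.4, 0.35, 0.8, 0.6, 0.7])  # predicted P(class = 1)

# Predict the positive class whenever the probability exceeds the threshold
for threshold in [0.2, 0.5, 0.8]:
    y_pred = (proba >= threshold).astype(int)
    tpr = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)
    fpr = np.sum((y_pred == 1) & (y_true == 0)) / np.sum(y_true == 0)
    print("threshold = {0}: TPR = {1:.2f}, FPR = {2:.2f}".format(threshold, tpr, fpr))

# scikit-learn sweeps all distinct thresholds and returns the full curve
fpr, tpr, _ = skm.roc_curve(y_true, proba)
print("AUC = {0:.3f}".format(skm.auc(fpr, tpr)))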


In [10]:
fig = plt.figure(figsize=(20, 14))

for (_X, _y), label in zip(
    [
        (X_train, y_train),
        (X_test, y_test),
        (data[critical], labels[critical])
    ],
    ["Train", "Test", "Critical"]
):
    proba = clf.predict_proba(_X)
    fpr, tpr, _ = skm.roc_curve(_y, proba[:, 1])
    roc_auc = skm.auc(fpr, tpr)

    print ("LogisticRegression AUC ({0}): {1}".format(label, roc_auc))

    plt.plot(fpr, tpr, label="{0} (AUC = {1})".format(label, roc_auc), linewidth=4.0)

plt.plot([0, 1], [0, 1], "--", label="Guessing (AUC = 0.5)", linewidth=4.0)

plt.title(r"The ROC curve for LogisticRegression", fontsize=18)
plt.xlabel(r"False positive rate", fontsize=18)
plt.ylabel(r"True positive rate", fontsize=18)
plt.axis([-0.01, 1.01, -0.01, 1.01])
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.legend(loc="best", fontsize=18)
plt.show()


LogisticRegression AUC (Train): 0.927466526230532
LogisticRegression AUC (Test): 0.4975344101021899
LogisticRegression AUC (Critical): 0.500251495

We can see that the ROC curves for the test and critical data look very strange. This tells us that logistic regression is quite inept at predicting the Ising model transition, suggesting that the problem is highly non-linear. The ROC curve for the training data looks quite good, but since the performance on the test data is so far off, we are clearly dealing with an overfit model.

A previous run with $50\%$ of the data used for training yielded worse performance than the smaller training set used here. This again strengthens our confidence that logistic regression, being a linear model, is not able to correctly capture the Ising model phase transition.