1 Introduction

This tutorial introduces the basic features for simulating titratable systems via the constant pH method. The constant pH method is a Monte Carlo method which models reactions by randomly adding a charged particle to the system while placing a charge of opposite sign on a neutral particle, or by removing a charged particle and neutralizing a particle of the opposite charge.

We will consider a homogeneous aqueous solution of a titratable acidic species $\mathrm{HA}$ that can dissociate as follows: $\mathrm{HA} \Leftrightarrow \mathrm{A}^- + \mathrm{H}^+$.

If $N_0 = N_{\mathrm{HA}} + N_{\mathrm{A}^-}$ is the number of titratable groups in solution, then the degree of dissociation $\alpha$ can be defined: $\alpha = \dfrac{N_{\mathrm{A}^-}}{N_0}.$

1.1 Reaction Constants

The reaction constant describes the chemical equilibrium and therefore determines the equilibrium concentrations:

\begin{equation} K = \frac{[\mathrm{H}^+][\mathrm{A}^-]}{[\mathrm{HA}]} \frac{\gamma_{\mathrm{H}^+}\gamma_{\mathrm{A}^-}}{\gamma_{\mathrm{HA}}}=\exp(-\beta \Delta G^0) \end{equation}

with $[\mathrm{H}^+]=\frac{c_{\mathrm{H}^+}}{c^0}$ the concentration divided by the standard concentration (typically one mole per liter) and $\gamma_{\mathrm{H}^+}$ the activity coefficient of the species $\mathrm{H}^+$. For an ideal system the activity coefficients are unity and $[\mathrm{H}^+] = 10^{-\mathrm{pH}}$. In an interacting system the activity coefficients approach unity only in the limit of infinite dilution.

The degree of dissociation can also be expressed via a ratio of concentrations:

\begin{equation} \alpha = \frac{N_{\mathrm{A}^-}}{N_0} = \frac{N_{\mathrm{A}^-}}{N_{\mathrm{HA}} + N_{\mathrm{A}^-}} = \frac{[\mathrm{A}^-]}{[\mathrm{HA}]+[\mathrm{A}^-]}. \end{equation}

Multiplying both the numerator and denominator by $\dfrac{[\mathrm{H}^+]}{[\mathrm{HA}]}$ leads to:

\begin{equation} \alpha = \frac{\frac{[\mathrm{H}^+][\mathrm{A}^-]}{[\mathrm{HA}]}}{\frac{[\mathrm{H}^+][\mathrm{HA}]}{[\mathrm{HA}]}+\frac{[\mathrm{H}^+][\mathrm{A}^-]}{[\mathrm{HA}]}} = \frac{K}{[\mathrm{H}^+] + K}. \end{equation}

Since $K = 10^{-\mathrm{p}K}$ we can express $\alpha$ as a simple function of the pH and $\mathrm{p}K$:

\begin{equation} \alpha(\mathrm{pH}, \mathrm{p}K)= \frac{1}{\frac{[\mathrm{H}^+]}{K} + 1} = \frac{1}{\frac{10^{-\mathrm{pH}}}{10^{-\mathrm{p}K}} + 1} = \frac{1}{10^{\mathrm{p}K-\mathrm{pH}} + 1}. \end{equation}
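
Before moving on to the simulation, it can be instructive to evaluate this ideal curve numerically. The following snippet is a small self-contained sketch (independent of the simulation code below; the helper name ideal_alpha is purely illustrative) confirming that exactly half of the acid is dissociated at $\mathrm{pH} = \mathrm{p}K$ and that the curve saturates a few pH units away from the $\mathrm{p}K$.


In [ ]:
import numpy as np

def ideal_alpha(pH, pK):
    # ideal degree of dissociation: alpha = 1 / (10**(pK - pH) + 1)
    return 1.0 / (10.0**(pK - pH) + 1.0)

# at pH = pK exactly half of the titratable groups are dissociated
print(ideal_alpha(pH=4.0, pK=4.0))  # 0.5
# two pH units below (above) the pK the acid is almost fully associated (dissociated)
print(ideal_alpha(pH=2.0, pK=4.0))  # ~0.01
print(ideal_alpha(pH=6.0, pK=4.0))  # ~0.99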

1.2 Constant pH Method

In the constant pH method, the acceptance probability for a reaction is [1]:

\begin{equation} P_{\mathrm{acc}} = \min\left(1, \exp\left(-\beta \Delta E_\mathrm{pot} \pm \ln(10) \, (\mathrm{pH} - \mathrm{p}K) \right) \right) \end{equation}

and the proposal probability of a reaction is $P_\text{prop}=\frac{N_\mathrm{HA}}{N_0}$ for a dissociation and $P_\text{prop}=\frac{N_{\mathrm{A}^-}}{N_0}$ for an association reaction [1]. Here $\Delta E_\text{pot}$ is the potential energy change due to the reaction, while $\mathrm{pH} - \mathrm{p}K$ is an input parameter composed of two terms: the pH and $-\mathrm{p}K$, where $\mathrm{p}K = -\log_{10} K$ is the negative decadic logarithm of the acid dissociation constant $K$. The prefactor $\pm 1$ defines the direction of the reaction ($+1$ dissociation, $-1$ association). When a dissociation move is attempted, a titratable molecule $\mathrm{HA}$ is given a negative charge and a counterion $\mathrm{H}^+$ is randomly placed into the simulation box; when an association move is attempted, an $\mathrm{A}^-$ is neutralized and a random counterion $\mathrm{H}^+$ is removed from the cell.

Be aware that in the constant pH method the system is coupled to an implicit pH bath. In particular, this means that the number of $\mathrm{H}^+$ ions in the simulation box does not correspond to the chosen pH, which leads to incorrect screening effects when electrostatic interactions are included [1]. How to avoid these artifacts of the constant pH method goes beyond the scope of this simple tutorial; we refer the reader to, e.g., [1] for further details.
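
To make the acceptance rule above more tangible, here is a minimal stand-alone sketch of the Metropolis criterion written in plain Python. It is not the ESPResSo implementation; the function name accept_constant_pH_move and the arguments delta_E_pot and beta are illustrative placeholders, and delta_E_pot is simply zero for the ideal, non-interacting system considered in this tutorial.


In [ ]:
import numpy as np

def accept_constant_pH_move(delta_E_pot, pH, pK, beta=1.0, direction=+1):
    # direction = +1 for a dissociation (HA -> A- + H+),
    # direction = -1 for an association (A- + H+ -> HA)
    log_p_acc = -beta * delta_E_pot + direction * np.log(10.0) * (pH - pK)
    p_acc = min(1.0, np.exp(log_p_acc))
    return np.random.random() < p_acc

# for the ideal system (delta_E_pot = 0) a dissociation move at pH = pK
# is always accepted because the argument of the exponential is zero
print(accept_constant_pH_move(delta_E_pot=0.0, pH=4.0, pK=4.0, direction=+1))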

2 Setup

We start by creating a system instance with an arbitrary box length of 35 $\sigma$ and adding N0 titratable units (all in the associated state $\mathrm{HA}$).


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
plt.ion()

import espressomd
from espressomd import reaction_ensemble

# System parameters
#############################################################
box_l = 35 # in units of sigma

# System setup
#############################################################
system = espressomd.System(box_l=[box_l] * 3)
system.set_random_state_PRNG()  # seed the PRNG of the ESPResSo core
np.random.seed(seed=10)         # seed the numpy PRNG used for the particle positions

# particle types of the acid in its two states and of the H+ counterion
type_HA = 0  # associated (neutral) acid
type_A = 1   # dissociated (negatively charged) acid
type_H = 2   # H+ counterion

N0 = 10  # number of titratable units in the box

for i in range(N0):
    system.part.add(pos=np.random.random(3) * system.box_l, type=type_HA)

Next we define some important constants, such as the dissociation constant of the acid. We choose $\mathrm{p}K=4$, which is close to the value for (poly)acrylic acid.


In [ ]:
K = 1e-4  # acid dissociation constant
pK = -np.log10(K)

The next step is to define the reaction system and the seed of the pseudo-random number generator which is used for the Monte Carlo steps. Please note that the order in which species are written in the reactants and products lists is very important: when a reaction is performed, the first species in the reactants list is replaced by the first species in the products list, the second reactant species is replaced by the second product species, and so on. Moreover, if the reactants list has more species than the products list, the excess reactant molecules are deleted from the system, while if the products list has more species than the reactants list, the excess product molecules are created and randomly placed inside the simulation box. For example, reversing the order of products in our reaction (i.e. from product_types=[type_H, type_A] to product_types=[type_A, type_H]), a neutral monomer would become positively charged and a negatively charged monovalent counterion would be randomly placed inside the cell.

If an interacting system is to be simulated, the concept of an exclusion radius is important in order to ensure the stability of the molecular dynamics integrator: with diverging interaction potentials it may happen that a Monte Carlo move is accepted which results in forces too large for the chosen timestep of the integrator. Such moves have to be prevented via the exclusion radius. Since we deal with a non-interacting system, we set the exclusion radius to zero. We also assign charges to each type because, in general, the charge plays a role in simulations with electrostatic interactions. As an easy task for the interested reader, we propose to adapt the tutorial to account for electrostatic interactions. Therefore we keep the charge assignments in place, although they are not needed for an ideal system.


In [ ]:
RE = reaction_ensemble.ConstantpHEnsemble(
        temperature=1, exclusion_radius=0.0, seed=77)

# the dissociation reaction HA <-> A- + H+
RE.add_reaction(gamma=K, reactant_types=[type_HA], reactant_coefficients=[1],
                product_types=[type_A, type_H], product_coefficients=[1, 1],
                default_charges={type_HA: 0, type_A: -1, type_H: +1})
print(RE.get_status())

Next we perform simulations at different pH values. The system must be equilibrated at each pH before taking samples. Calling RE.reaction(X) attempts a total of X reactions (in the backward and forward directions). We also print the acceptance rates of the dissociation and association reactions for the first pH value that we set.


In [ ]:
num_samples = 100

pHs = np.linspace(1, 8, num=15)
degrees_of_dissociation = []
std_dev_degree_of_dissociation = []
histograms = []
histogram_edges = range(0, N0 + 1)

for pH in pHs:
    print("pH is {:.1f}".format(pH))
    RE.constant_pH = pH
    RE.reaction(4 * N0) # equilibration to the new pH
    numAs = []
    for i in range(num_samples):
        RE.reaction(N0 + 1)
        numAs.append(system.number_of_particles(type=type_A))
    degrees_of_dissociation.append(np.mean(numAs) / N0)
    std_dev_degree_of_dissociation.append(np.std(numAs, ddof=1) / N0)
    print("occurred particle numbers of type A ", sorted(set(numAs)))
    histogram = np.bincount(np.array(numAs).astype(int), minlength=N0 + 1)
    histograms.append(histogram / float(len(numAs)))
    if abs(pH - 1.0) < 1e-4:
        print("acceptance rate dissociation", RE.get_acceptance_rate_reaction(0))
        print("acceptance rate association", RE.get_acceptance_rate_reaction(1))

3 Results

Finally we plot our results and compare them to the well-known result for an ideal reacting system, $\alpha(\mathrm{pH}, \mathrm{p}K)$, presented above.

3.1 Statistical Uncertainty

To quantify the simulation uncertainty, we will add error bars to the calculated data points. Since the underlying probability distribution of our data does not follow a well-known function, such as the Gaussian, binomial, exponential or Poisson distribution, there is no literature to provide us with an analytical form of the confidence interval. Sometimes the analytical form of the underlying probability distribution is not even known. However, a statistic calculated from a random sample often has a probability distribution different from that of the sample itself. We will use this to our advantage.

In our case, the values of $\alpha$ are drawn from a random distribution $X(N_{\mathrm{A}}, N_{0})$ with parameters $N_{\mathrm{A}}$, $N_{0}$ and moments $\mu_\alpha = \operatorname{E}(X)$ and $\sigma_\alpha^2 = \operatorname{E}\left([X - \operatorname{E}(X)]^2\right)$. The average $\bar{\alpha} = \frac{1}{n}\sum_{i=1}^n \alpha_i$ and variance $\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n (\alpha_i - \bar{\alpha})^2$ of a sample of $n$ independent values $\alpha_i$ are themselves random variables, but they do not have the same probability distribution as $X$. They are called statistics because they are calculated from a sample of the population, and they can be used as estimators of the moments $\mu_\alpha$ and $\sigma_\alpha^2$.

According to the Central Limit Theorem (CLT), the random variable $\sqrt{n}(\bar{\alpha} - \mu_\alpha)$ has a probability distribution which converges for large sample sizes $n$ to the normal distribution $\mathcal{N}(0, \sigma_\alpha^2)$. This statement can be rewritten as:

\begin{equation} z = \frac{\bar{\alpha} - \mu_\alpha}{\sigma_\alpha} \sqrt{n} \sim \mathcal{N}(0,1) \end{equation}

in the limit of a large $n$, i.e. the standardized variable $z$ follows the standard normal distribution. The rate of convergence depends on the underlying probability distribution $X$. For this tutorial, 15'000'000 data points were collected and the distribution of $z$ was already very close to normal for $n=100$.
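
The convergence stated by the CLT can be illustrated with a small synthetic experiment that is independent of the simulation data (a sketch with arbitrary parameters): we repeatedly draw samples of size $n$ from a clearly non-Gaussian population and check that the standardized sample means are approximately standard normal.


In [ ]:
import numpy as np
import scipy.stats

np.random.seed(42)
n = 100            # sample size, of the same order as num_samples above
n_repeats = 20000  # number of independent samples

# a clearly non-Gaussian population: exponential with mean 1 and variance 1
samples = np.random.exponential(scale=1.0, size=(n_repeats, n))
sample_means = np.mean(samples, axis=1)
sample_stds = np.std(samples, axis=1, ddof=1)

# standardize with the known population mean (1.0) and the sample standard deviation
z_values = (sample_means - 1.0) / sample_stds * np.sqrt(n)

# the empirical 2.5% and 97.5% quantiles should be close to -1.96 and +1.96
print(np.percentile(z_values, [2.5, 97.5]))
print(scipy.stats.norm.interval(0.95))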

The 95% probability of finding the population mean $\mu$ within two boundaries $w^+$ and $w^-$ over repeated experiments is given by:

\begin{equation} \operatorname{Pr}\left( w^- \leq \mu \leq w^+\right) = 0.95 \end{equation}

For the standard normal distribution, by definition $\mu = 0$ and $w^\pm = \pm z_{1-0.05/2}$, where $z_{1-0.05/2} \approx 1.96$ is the two-tailed $z$-score at the 95% confidence level, so for the distribution of $\sqrt{n}(\bar{\alpha} - \mu_\alpha) / \sigma_\alpha$:

\begin{equation} w^\pm = \bar{\alpha} \pm z_{1-0.05/2} \cdot \frac{\sigma_\alpha}{\sqrt{n}} \end{equation}

To plot the simulated curve with error bars, we need to estimate $\mu_\alpha$ and the two boundaries $w^\pm$. The accuracy of our estimation will depend on the sample size num_samples. The larger it is, the closer $\sqrt{n}(\bar{\alpha} - \mu_\alpha)$ gets to $\mathcal{N}(0, \sigma_\alpha^2)$ and the smaller the $w^\pm$ interval becomes.

The sample mean $\bar{\alpha}$ is already an unbiased estimator of $\mu_\alpha$. However, to estimate $w^\pm$ we need to estimate $\sigma_\alpha$, for which the sample standard deviation $\sigma_n$ is a biased estimator. We can account for some of the bias using Bessel's correction:

\begin{equation} s_{n-1}^2 = \frac{1}{n-1}\sum_{i=1}^n (\alpha_i - \bar{\alpha})^2 \end{equation}

This correction is required because the sample variance $\sigma_n^2$ is evaluated against the sample mean $\bar{\alpha}$ instead of the population mean $\mu_\alpha$. $s_{n-1}^2$ is an unbiased estimator of the population variance $\sigma_\alpha^2$; however, taking the square root introduces a new source of bias due to Jensen's inequality. Different approaches exist to correct for this bias, but since it is small, it is commonly neglected. The quantity $w^\pm$ is estimated by a statistic called the 95% confidence interval:

\begin{equation} \mathrm{CI}_{95\%} = \bar{\alpha} \pm z_{1-0.05/2} \cdot \frac{s_{n-1}}{\sqrt{n}} \end{equation}
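
The effect of Bessel's correction itself can be checked with another quick synthetic experiment (again a sketch with arbitrary numbers, unrelated to the simulation): averaging many sample variances computed with and without the correction shows that only the corrected estimator reproduces the population variance on average. This is also why the sampling loop above computes the standard deviation with ddof=1.


In [ ]:
import numpy as np

np.random.seed(7)
n = 10                 # small samples make the bias clearly visible
population_var = 4.0   # variance of the normal population used below

samples = np.random.normal(loc=0.0, scale=np.sqrt(population_var), size=(50000, n))
var_biased = np.var(samples, axis=1, ddof=0)  # sigma_n^2, divides by n
var_bessel = np.var(samples, axis=1, ddof=1)  # s_{n-1}^2, divides by n - 1

print("population variance:           ", population_var)
print("mean of sigma_n^2 (biased):    ", np.mean(var_biased))   # close to 3.6
print("mean of s_{n-1}^2 (unbiased):  ", np.mean(var_bessel))   # close to 4.0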

We will use that estimate to plot the error bars.


In [ ]:
import scipy.stats
def ideal_degree_of_dissociation(pH, pK):
    return 1. / (1 + 10**(pK - pH))

z = scipy.stats.norm.ppf(1 - 0.05 / 2)  # two-tailed z-score at the 95% confidence level (~1.96)
ci95 = z * np.array(std_dev_degree_of_dissociation) / np.sqrt(num_samples)
pK = -np.log10(K)
plt.figure(figsize=(10, 6), dpi=80)
plt.errorbar(pHs - pK, degrees_of_dissociation, ci95, label=r"simulation")
pHs2 = np.linspace(0, 8, num=50)
plt.plot(pHs2 - pK, ideal_degree_of_dissociation(pHs2, pK), label=r"ideal")
plt.xlabel(r'$\mathrm{pH} - \mathrm{p}K$', fontsize=20)
plt.ylabel(r'$\alpha$', fontsize=20)
plt.legend(fontsize=20)
plt.show()

We now compare the relative frequencies $h$ of observed numbers of particles of type A to the probabilities $p$ which are expected in the constant pH ensemble for a non-interacting system [1]:

\begin{equation} p(N_\mathrm{A})=\frac{{N_0 \choose N_\mathrm{A}} 10^{(\mathrm{pH-p}K)N_\mathrm{A}}} {\sum_{N=0}^{N_0} {N_0 \choose N} 10^{(\mathrm{pH}-\mathrm{p}K)N}} \end{equation}

In [ ]:
import scipy.special
def constant_pH_numA_pmf(numA, pH, pK, N0):
    # analytical probability mass function of the number of A- particles
    # in the constant pH ensemble for a non-interacting system
    all_NAs = range(N0 + 1)
    return scipy.special.binom(N0, numA) * 10.0**((pH - pK) * numA) / \
        sum(scipy.special.binom(N0, i) * 10.0**((pH - pK) * i) for i in all_NAs)

NAs = np.arange(N0 + 1)

from IPython import display
import time

for index_pH in range(pHs.shape[0]):
    fig, ax = plt.subplots(1, 1)
    pH = pHs[index_pH]
    ax.plot(NAs, constant_pH_numA_pmf(NAs, pH, pK, N0), 'ro', ms=8, mec='r', label="Theory pH "+str(pH))
    ax.vlines(NAs, 0, constant_pH_numA_pmf(NAs, pH, pK, N0), colors='r', lw=4)
    ax.plot(histogram_edges, histograms[index_pH], 'bo', ms=8, mec='b',
            label="Simulation pH " + str(pH))
    ax.vlines(histogram_edges, 0, histograms[index_pH], colors='b', lw=1)
    ax.legend()
    ax.set_xlabel(r'$N_\mathrm{A}$')
    ax.set_ylabel(r'$p(N_\mathrm{A})$ or $h(N_\mathrm{A})$')
    display.clear_output(wait=True)
    display.display(plt.gcf())
    time.sleep(2.0)

References

[1] Landsgesell, Jonas, Christian Holm, and Jens Smiatek. Simulation of weak polyelectrolytes: a comparison between the constant pH and the reaction ensemble method. The European Physical Journal Special Topics 226.4 (2017): 725-736.