A Study of Macromolecules in Electrolyte Solutions

Max Hebditch and Mikael Lund, August 2016

In this notebook we intend to investigate how binding ions distribute around ridig macromolecules such as proteins. It consists of the following sections:

i. Activity coefficients of simple electrolyte solutions. In this step we find ionic radii that best matched experimental activity coefficients to have realistic force field for ions in implicit solvent simulations. For example, sodium acetate will be a proxy for sodium binding to carboxylic groups; ammonium chloride, for chloride binding to lysine and so forth.

ii. Proton titration of proteins in 1:1 to determine the average protonation states as a function of pH.

iii. Simulation of salt particles around a protein with fixed, average charges as obtained in step (2). Here salt is treated grand canonically. In this part we use Gromacs to calculate spatial distribution functions from saved trajectories.

I. Mean Activity Coefficients in Bulk Solution

In this first part we estimate activity coefficients, $\gamma$, of salts in aqueous solution using Grand Canonical Metropolis Monte Carlo simulations. The particles are modelled as charged Lennard-Jones particles immersed in a dielectric continuum. By varying particle radii, experimentally obtained values are matched for 1:1 and 1:3 salts:

Salts Radius 1 (Å) Radius 2 (Å) Proxy for Exp. reference
NaCl 2.3 2.3 RS (Robinson and Stokes)
NaAc 2.3 2.95 glu, asp, ctr RS
NH4Cl 1.75 2.95 lys RS
GdnCl 0.55 2.95 arg Macaskill et al 1977
Na3Cit 2.3 3.5 Apelblat 2014

Note that the above values are for a combined Lennard-Jones/Coulomb potential and cannot be directly compared with previously radii fitted for the primitive model of electrolytes (hard spheres + Coulomb). See bottom of the page for more details. Further, the division into individual radii is arbitrary.


In [ ]:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np, pandas as pd
from math import sqrt, pi, exp
import os.path, os, sys, json
try:
    workdir
except NameError:
    workdir=%pwd
else:
    %cd -q $workdir
print(workdir)

Download and compile the MC software

This will download the free Monte Carlo software Faunus; check out the version we used; and compile it (C++ compiler and CMake required)


In [ ]:
%%bash -s "$workdir"
%cd -q $1

echo 'fau_example(excess "./" excess.cpp)' > mc/CMakeLists.txt

if [ ! -d "faunus/" ]; then
    git clone https://github.com/mlund/faunus.git
    cd faunus
    git checkout master 7d022a78c67dd46c5ac32aa7f06e00b08459a1db
else
    cd faunus
fi

cmake . -DCMAKE_BUILD_TYPE=Release -DENABLE_APPROXMATH=on -DMYPLAYGROUND=$1/mc &>/dev/null
make excess -j4
%cd $1

Simulation setup

Wall time, number of cores etc. Currently for the slurm system.


In [ ]:
# definition of salts (n=stoichiometry; z=valency; r=radius; L=box length; activities in mol/l)
salts = pd.Series(
    {
        'NaCl' : pd.Series(
            dict(ion1='Na', ion2='Cl', n1=1, n2=1, z1=1, z2=-1, r1=2.2, r2=2.2, L=50,
                 activities=np.arange(0.1,1.6,0.05), exp='exp-nacl-coeff.csv',
                 color='red', label=u'NaCl' ) ),
        'Na3Cit' : pd.Series(
            dict(ion1='Na', ion2='Cit', n1=3, n2=1, z1=1, z2=-3, r1=2.3, r2=3.5, L=100,
                 activities=np.arange(0.005,0.1,0.005), exp='exp-na3cit-coeff.csv',
                 color='blue', label=u'Na$_3$(C$_6$H$_5$O$_7$)' ) ),
        'NaAc' : pd.Series(
            dict(ion1='Na', ion2='Ac', n1=1, n2=1, z1=1, z2=-1, r1=2.3, r2=2.95, L=50,
                 activities=np.arange(0.1,1.6,0.05), exp='exp-naac-coeff.csv',
                 color='green', label=u'Na(CH$_3$COO)' ) ),
        'NH4Cl' : pd.Series(
            dict(ion1='NH4', ion2='Cl', n1=1, n2=1, z1=1, z2=-1, r1=1.75, r2=2.3, L=50,
                 activities=np.arange(0.1,1.6,0.05), exp='exp-nh4cl-coeff.csv',
                 color='magenta', label=u'NH$_4$Cl' ) ),
        'GdnCl' : pd.Series(
            dict(ion1='Gdn', ion2='Cl', n1=1, n2=1, z1=1, z2=-1, r1=0.55, r2=2.3, L=50,
                 activities=np.arange(0.1,1.6,0.05), exp='exp-gdncl-coeff.csv',
                 color='black', label=u'GdnCl' ) )
    }
)

In [ ]:
%cd -q $workdir

def mkinput():
    js = {
            "moleculelist": {
                "salt": { "Ninit": 20, "atomic": True, "atoms": d.n1*(d.ion1+' ') + d.n2*(d.ion2+' ') }
            }, 
            "energy": {
                "nonbonded": { "coulomb": { "epsr": 80 } }
            }, 
            "moves": {
                "atomtranslate": { "salt": { "permol": True, "prob": 0.01 } }, 
                "atomgc": { "molecule": "salt" }
            }, 
            "system": {
                "mcloop"      : { "macro": 10, "micro": micro }, 
                "cuboid"      : { "len": d.L },
                "coulomb"     : { "epsr": 80 },
                "temperature" : 298.15
            },
            "analysis": {
                "pqrfile"   : { "file" : "confout.pqr" },
                "statefile" : { "file" : "state"}            
            },
            "atomlist": {
                d.ion1: { "q": d.z1, "r": d.r1, "eps":0.01, "dp": 50, "activity": d.n1*activity }, 
                d.ion2: { "q": d.z2, "r": d.r2, "eps":0.01, "dp": 50, "activity": d.n2*activity }
            }
    }

    with open('excess.json', 'w+') as f:
        f.write(json.dumps(js, indent=4))

# flow control:
equilibration = True   # if true, delete state file and start over
production    = True   # if true, start from present state file
override      = False  # if true, override existing files

for salt, d in salts.iteritems():
    for activity in d.activities:
        
        pfx=salt+'-a'+str(activity)
        if not os.path.isdir(pfx):
            %mkdir -p $pfx
        else:
            if override==False:
                break
        %cd $pfx

        # equilibration run
        if equilibration:
            !rm -fR state
            micro=5000
            mkinput()
            !../mc/excess > eq 2>&1
            !rm -fR analysis_out.json

        # production run
        if production:
            micro=50000
            mkinput()
            !../mc/excess > out 2>&1

        %cd -q ..

print('done.')

Analysis

The simulation output of the Grand Canonical move statistics contain information about the excess chemical potential and is stored in the file move_out.json. In the following we read this data and store it in a Pandas object.


In [ ]:
%cd -q $workdir
import json

for salt, d in salts.iteritems():
    d['g1'] = []
    d['g2'] = []
    for activity in d.activities:
        pfx=salt+'-a'+str(activity)

        if os.path.isdir(pfx):
            %cd -q $pfx
            r = json.load( open('move_out.json') )['moves']['Grand Canonical Salt']['atoms']
            d['g1'].append( r[d.ion1]['gamma'] )
            d['g2'].append( r[d.ion2]['gamma'] )
            %cd -q ..
salts.NaCl

Plot mean activity coefficients

For compound $M_pX_q$, the mean ionic activity coefficient is $ \gamma_{\pm} = \sqrt[q+p]{ \gamma_M^p \gamma_X^q}$. Note that the experimental data is in molality, i.e. moles salt per kilo solvent and should in principle be converted to the molarity scale as outlined in i.e. Robinson and Stokes classic book. We ignore this here as the intended use for the model is for sub-molar concentrations. All salt concentrations are converted to the ionic strength, $I=\frac{1}{2}\sum c_i z_i^2$ where $c$ are molar concentrations, $z$ valencies.


In [ ]:
%cd -q $workdir
def ionicstrength(c, n1, n2, z1, z2):
    return 0.5*( n1*c*z1**2 + n2*c*z2**2 )

def meanactivity(gA, gB, p, q):
    ''' mean ionic activity coefficient'''
    return ( gA**p * gB**q ) ** (1.0/(p+q))

for salt, d in salts.iteritems():
    # experiment (already tabulated as ionic strength)
    I, g = np.loadtxt(d.exp, delimiter=',', skiprows=1, unpack=True)
    plt.plot(I, g, marker='o', ls='none', ms=8, mew=0, color=d.color)
    
    # simulation
    g = meanactivity( np.array(d.g1), np.array(d.g2), d.n1, d.n2 )
    C = d.activities / g # molarity
    I = ionicstrength(C, d.n1, d.n2, d.z1, d.z2)
    
    plt.plot( I, g, label=d.label, lw=3, color=d.color)

plt.legend(loc=0, frameon=False, ncol=3)
plt.ylabel('Mean activity coefficient, $\gamma_{\pm}$')
plt.xlabel('Ionic strength (mol/l)')
plt.xlim((0.04,1.25))
plt.ylim((0,0.9))
plt.title('Experiment (symbols) vs simulation (lines)')
print('')

Pair potential between ions

The MC simulation uses a pair potential that is a mix of Lennard-Jones plus a Coulomb interaction with a high dielectric background. The particle radii are therefore not just the normal interpretation as in LJ as is illustrated for the distance of closest approach between a cation and anion below.


In [ ]:
from scipy.optimize import newton

lB=7           # Bjerrum length for water (Å)
s=2*2.2        # sigma (Å)
eps=0.01 / 2.5 # epsilon (kT)
r = np.linspace(0.64*s, 2*s, 100)

def el(r): return -lB/r                        # Coulomb pot.
def lj(r): return 4*eps*( (s/r)**12-(s/r)**6 ) # Lennard-Jones pot.
def u(r): return lj(r) + el(r)                 # combined pot.

plt.plot(r, lj(r), 'r.', label='LJ', markersize=4 )
plt.plot(r, el(r), label='Coulomb', lw=2)
plt.plot(r, u(r), label='LJ+Coulomb', lw=2 )
plt.xlabel(u'$r$/Å')
plt.ylabel(u'$u(r)/k_BT$')
plt.legend(loc=0, frameon=False)

r0 = newton(u, x0=0.5*s) # when is u(r) zero?
plt.plot([r0], [0], 'go')
plt.annotate( '$r_0=$'+str('%.1f' % r0)+u' Å', xy=(r0, 0), xytext=(r0+1, -0.8),
             arrowprops=dict(facecolor='green', width=2, headwidth=6, edgecolor='none', shrink=0.1) )

plt.plot([s], [0], 'ro')
plt.annotate( '$r_0=$'+str('%.1f' % s)+u' Å', xy=(s, 0), xytext=(s+1, 0.8),
             arrowprops=dict(facecolor='red', width=2, headwidth=6, edgecolor='none', shrink=0.1) )

II. pH Titration of a Single Macromolecule

We will now use the same model as before, but with a central, rigid molecule centered in the middle of the simulation box. Protons on acidic and basic residues fluctuate according to the specified pH and 1:1 salt is treated grand canonically.


In [ ]:
def mkinput():
    global micro, d, activity, macromolecule, pH
    js = {
            "moleculelist": {
                "_protein": { "structure":macromolecule, "Ninit":1, "insdir":"0 0 0" },
                "counter": { "Ninit": 8, "atomic": True, "atoms":"CNa" },
                "salt":    { "Ninit": 30, "atomic": True, "atoms": d.n1*(d.ion1+' ') + d.n2*(d.ion2+' ') }
            },
            "energy": {
                "nonbonded": { "coulomb": { "epsr": 80 } }
            },
            "moves": {
                "atomtranslate": {
                    "salt": { "peratom": True, "prob": 1.0 },
                    "counter": { "peratom": True, "prob": 1.0 }
                }
            },
            "system": {
                "mcloop"      : { "macro": 10, "micro": micro }, 
                "cuboid"      : { "len": d.L },
                "coulomb"     : { "epsr": 80 },
                "temperature" : 298.15
            },
            "atomlist": {
                d.ion1 :  { "q": d.z1, "r": d.r1, "eps":0.01, "dp": 50, "activity": d.n1*activity }, 
                d.ion2 :  { "q": d.z2, "r": d.r2, "eps":0.01, "dp": 50, "activity": d.n2*activity },
                "CNa"  :  { "q": d.z1, "r": d.r1, "eps":0.01, "dp": 50 },
                "ASP"  :  { "q":-1, "r":3.6, "mw":110, "eps":0.01 },
                "HASP" :  { "q":0,  "r":3.6, "mw":110, "eps":0.01 },
                "LASP" :  { "q":2,  "r":3.6, "mw":110, "eps":0.01 },
                "CTR"  :  { "q":-1, "r":2.0, "mw":16, "eps":0.01 },
                "HCTR" :  { "q":0,  "r":2.0, "mw":16, "eps":0.01 },
                "GLU"  :  { "q":-1, "r":3.8, "mw":122, "eps":0.01 },
                "HGLU" :  { "q":0,  "r":3.8, "mw":122, "eps":0.01 },
                "LGLU" :  { "q":2,  "r":3.8, "mw":122, "eps":0.01 },
                "HIS"  :  { "q":0,  "r":3.9, "mw":130, "eps":0.01 },
                "HHIS" :  { "q":1,  "r":3.9, "mw":130, "eps":0.01 },
                "NTR"  :  { "q":0,  "r":2.0, "mw":14, "eps":0.01 },
                "HNTR" :  { "q":1,  "r":2.0, "mw":14, "eps":0.01 },
                "TYR"  :  { "q":-1, "r":4.1, "mw":154, "eps":0.01 },
                "HTYR" :  { "q":0,  "r":4.1, "mw":154, "eps":0.01 },
                "LYS"  :  { "q":0,  "r":3.7, "mw":116, "eps":0.01 },
                "HLYS" :  { "q":1,  "r":3.7, "mw":116, "eps":0.01 },
                "CYb"  :  { "q":0,  "r":3.6, "mw":103, "eps":0.01 },
                "CYS"  :  { "q":-1, "r":3.6, "mw":103, "eps":0.01 },
                "HCYS" :  { "q":0,  "r":3.6, "mw":103, "eps":0.01 },
                "ARG"  :  { "q":0,  "r":4.0, "mw":144, "eps":0.01 },
                "HARG" :  { "q":1,  "r":4.0, "mw":144, "eps":0.01 },
                "ALA"  :  { "q":0,  "r":3.1, "mw":66, "eps":0.01 },
                "ILE"  :  { "q":0,  "r":3.6, "mw":102, "eps":0.01 },
                "LEU"  :  { "q":0,  "r":3.6, "mw":102, "eps":0.01 },
                "MET"  :  { "q":0,  "r":3.8, "mw":122, "eps":0.01 },
                "PHE"  :  { "q":0,  "r":3.9, "mw":138, "eps":0.01 },
                "PRO"  :  { "q":0,  "r":3.4, "mw":90, "eps":0.01 },
                "TRP"  :  { "q":0,  "r":4.3, "mw":176, "eps":0.01 },
                "VAL"  :  { "q":0,  "r":3.4, "mw":90, "eps":0.01 },
                "SER"  :  { "q":0,  "r":3.3, "mw":82, "eps":0.01 },
                "THR"  :  { "q":0,  "r":3.5, "mw":94, "eps":0.01 },
                "ASN"  :  { "q":0,  "r":3.6, "mw":108, "eps":0.01 },
                "GLN"  :  { "q":0,  "r":3.8, "mw":120, "eps":0.01 },
                "GLY"  :  { "q":0,  "r":2.9, "mw":54, "eps":0.01 }
            },
            "analysis": {
                "xtcfile"   : { "file" : "traj.xtc", "nstep":10 },
                "pqrfile"   : { "file" : "confout.pqr" },
                "aamfile"   : { "file" : "confout.aam" },
                "statefile" : { "file" : "state" }            
            }
    }
    
    if (TITRATION):
        js['moves']['gctit'] = {
            "molecule": "salt", "prob": 1.0,
            "processes" : {
                "H-Asp" : { "bound":"HASP", "free":"ASP", "pKd":4.0, "pX":pH },
                "H-Ctr" : { "bound":"HCTR", "free":"CTR", "pKd":2.6, "pX":pH },
                "H-Glu" : { "bound":"HGLU", "free":"GLU", "pKd":4.4, "pX":pH },
                "H-His" : { "bound":"HHIS", "free":"HIS", "pKd":6.3, "pX":pH },
                "H-Arg" : { "bound":"HARG", "free":"ARG", "pKd":12.0, "pX":pH },
                "H-Ntr" : { "bound":"HNTR", "free":"NTR", "pKd":7.5, "pX":pH },
                "H-Cys" : { "bound":"HCYS", "free":"CYS", "pKd":10.8, "pX":pH },
                "H-Tyr" : { "bound":"HTYR", "free":"TYR", "pKd":9.6, "pX":pH },
                "H-Lys" : { "bound":"HLYS", "free":"LYS", "pKd":10.4,"pX":pH }
            }
        }
    if (GC):
        js['moves']['atomgc'] = { "molecule": "salt" }

    print('Writing JSON file')
    with open('excess.json', 'w') as f:
        f.write(json.dumps(js, indent=4))
        
def WriteAAM( filename, aam ):
    f = open( filename,'w' )
    f.write( str(len(aam)) + '\n' )
    for i,j in aam.iterrows():
        f.write('{0:4} {1:5} {2:8.3f} {3:8.3f} {4:8.3f} {5:6.3f} {6:6.2f} {7:6.2f}\n'\
                .format(j['name'], i+1, j.x, j.y, j.z, j.q, j.mw, j.r))
    f.close()

def AverageChargeAAM(aamfile, gctit_output, group='_protein', scale2int=True):
    from math import floor
    ''' Save aam file w. average charges based on gctit-output (json file) '''
    aam = pd.read_table(aamfile, sep=' ', skiprows=1, usecols=[0,2,3,4,5,6,7], names=['name','x','y','z','q','mw','r'])

    js = json.load( open(gctit_output) ) [group]
    js = pd.DataFrame( js, index=js['index'] )

    f = 1.0
    if scale2int:
        total = js.charge.sum()
        target = round(total, 0)
        f = target / total
    
    aam.q = f*js.charge        # copy avg. charges of titratable sites
    aam.q = aam.q.fillna(0)  # fill missing
    
    print('net charge = ', aam.q.sum())
            
    return aam

Run titration simulations at different pH

The proton titration routine allows only for 1:1 salts and undefined behavior is expected with 2:1 or 3:1 salts. Note that this is a current technical limitation that could be eliminated in the future. After the simulation is done, a final structure with average charges is saved to disk.

Steps:

  1. Loop over pH, single salt activity (NaCl)
  2. Equilibration and production MC runs
  3. Plot charge vs. pH

In [ ]:
%cd $workdir
macromolecule = '../hsa-ph7.aam'
xtc_freq      = 1e200      # frequency to save xtc file (higher=less frames)
pH_range      = np.arange(6, 8, 1).tolist()
activity      = 0.1        # mol/l
d             = salts.NaCl # which salt
d.L           = 120        # box length (Å)
TITRATION     = True       # Do pH titration
GC            = False

tdata         = { 'pH': [], 'Z': [] } # store titration curve here

salt = d.label

def getPrefix( salt, pH, activity ):
    return str(salt) + '-titration-pH' + str(pH) + '-activity' + str(activity)

for pH in pH_range:
    pfx = getPrefix(salt, pH, activity)
    
    if not os.path.isdir(pfx): # run simulation
        %mkdir -p $pfx
        %cd $pfx

        # equilibration run (no translation)
        !rm -fR state
        micro=500
        mkinput()
        !../mc/excess > eq.tit

        # production run
        micro=2000
        mkinput()
        !../mc/excess > out.tit
        
        # create an aam file w. average charges
        aam = AverageChargeAAM( macromolecule, 'gctit-output.json', group='_protein', scale2int=True )
        WriteAAM( 'avgcharge.aam', aam )
        %cd -q ..
        
    if os.path.isdir(pfx): # analysis
        %cd -q $pfx
        aam = AverageChargeAAM( macromolecule, 'gctit-output.json', group='_protein', scale2int=False )
        tdata['pH'].append( pH )
        tdata['Z'].append( aam.q.sum() )
        %cd -q ..

if len( tdata['pH'] ) > 0:
    plt.plot(tdata['pH'], tdata['Z'], 'bo')
    plt.xlabel('pH')
    plt.ylabel('Z')
    plt.title(u'pH titration curve ($a_{NaCl}=$'+str(activity)+')')

print('done.')

TITRATION = False

III. Ion Distributions around Macromolecules

From the previous step, we have protein structure files (.aam) at with fixed charges corresponding to specific pH. These can now be placed in the middle of the box, while salt (1:1, 2:1, 3:1) are treated with GCMC.

Run simulation

In the equilibration run, Grand Canonical salt is enabled, meaning that the right amount of salt particles is being "sucked" into the simulation box according to the input activity / chemical potential.

Next, the final configuration from the equilibration is used as a starting point for a production run where the number of salt particles is fixed. This is because we now save configurations to an XTC trajectory (Gromacs style) that we will use for further analysis. This format does not support fluctuating number of particles.

To analyse the spatial distribution functions, i.e. generate cube files, Gromacs must be installed, in particular the gmx tool. You may need to adjust the numbers specified in the echo command just before gmx: these specify which groups to analyse and should point to Na and Cl, respectively (check gmx output).

Todo:

  • Loop over structures from step II

In [ ]:
import mdtraj as md

activity=0.3   # mol/l
d = salts.NaCl # which salt
d.L=120        # box length (Å)

eq=True
prod=True
analyse=True

%cd $workdir

for pH in pH_range:
    pfx = getPrefix(salt, pH, activity)
    if os.path.isdir(pfx):

        %cd $pfx

        # make gromacs index file for selecting ions
        f = open('index.ndx', 'w')
        traj = md.load('confout.gro')
        ndx_ion1 = traj.top.select('name '+d.ion1) # index w. all cations
        ndx_ion2 = traj.top.select('name '+d.ion2) # index w. all anions
        ndx_sys  = traj.top.select('all')
        f.write( '[' + d.ion1 + ']\n' + ' '.join(str(e+1) for e in ndx_ion1 ) + 2*'\n')
        f.write( '[' + d.ion2 + ']\n' + ' '.join(str(e+1) for e in ndx_ion2 ) + 2*'\n')
        f.write( '[ System ]\n' + ' '.join(str(e+1) for e in ndx_sys ) + 2*'\n')
        f.close()

        if eq:
            print('Equilibration run - GC enabled')
            GC=True
            micro=500
            mkinput()
            !rm -fR state
            !$workdir/mc/excess > eq.gc

        if prod:
            print('Production run - GC disabled')
            GC=False
            micro=10000
            mkinput()
            !$workdir/mc/excess > out.nogc

        if analyse:
            !cp confout.pqr confout.pdb # dirty conversion, but will work for this purpose
            !echo -ne "$d.ion1\nSystem" | gmx --quiet spatial -s confout.gro -f traj.xtc -bin 0.2 -nopbc -n index.ndx   # Sodium
            !mv grid.cube ion1.cube
            !echo -ne "$d.ion2\nSystem" | gmx --quiet spatial -s confout.gro -f traj.xtc -bin 0.2 -nopbc -n index.ndx   # Chloride
            !mv grid.cube ion2.cube
            !rm confout.pdb

        %cd -q ..
print('done.')

Viewing

To view the density maps, open the above generated files with:

vmd confout.pqr ion1.cube ion2.cube confout.pqr

(In my version of VMD, confout.pqr has to be specified twice to respect radii and charges when cube files are loaded)

  • Go to representation
  • Use vdW for the selection: not name Na Cl CNa
  • Use iso-density representation for the two density maps, and adjust threshold. Choose solid surface.

In [ ]: