Molecule Dynamic Simulation of nucleosome under Tension and Force Distribution Analyze

</center>

Chen Chen

Supervised by: Prof. Dr. Jörg Langowski
Co-operator: Dr. Frauke Gräte
Mentor: Dr. Senbo Xiao

In [4]:
#initialize.http://127.0.0.1:8889/notebooks/Force_Analyze.ipynb#
%autosave 120
%pylab inline
%qtconsole


Autosaving every 120 seconds
Populating the interactive namespace from numpy and matplotlib

In [2]:
#Other initilize
from IPython.display import HTML, Image

In [2]:
#change working directory
import os
working_directory = "/var/run/media/chen/Simulation/Simulation/100mM_1000f_vs_100f"
os.chdir(working_directory)

In [2]:
#Other initialize
import pandas as pd
import numpy as np


Caching the list of root modules, please wait!
(This will only be done once - type '%rehashx' to reset cache!)

Outline

  • Introduction
    • Molecular Dynamics (based on Gromacs-5.0dev)
    • Discription of nucleosome
  • Force Calculation
    • During Simulation
    • Force re-calculate from trajectory
  • Data Analysis: a demo example
    • Render in VMD
    • Off-line data plot
  • Summary and Outlook
    • Possible improvement
    • Personal imagination of MD package

Introduction:

  • Molecular Dynamics (based on Gromacs-5.0dev)
  • Discription of nucleosome

Molecule Dynamics Simulation


In [6]:
#Typical Flow Chart of MD Simulation steps in Gromacs.
Image("gromacs_flow_chart.png")


Out[6]:

Description of Nucleosome


In [8]:
#The nucleosome that we familiar with
Image(url = "http://upload.wikimedia.org/wikipedia/commons/d/d9/Nucleosome_1KX5_colour_coded.png")


Out[8]:

The crystal structure of the nucleosome core particle consisting of H2A , H2B , H3 and H4 core histones, and DNA. The view is from the top through the superhelical axis.

Description of nucleosome (in Gromacs) on different scales

  • System:
    • 1 Nucleosome
  • Molecules:
    • 8 histone subunits
    • 2 DNA single strand
  • Residues:
    • 1306 residues
      • 974 for Histone (with tails)
      • 332 for DNA (147 core base pairs + 19 extra)
  • Atoms:
    • 26305 atoms

System:


In [ ]:
#VMD as Conformation

Mols (In VMD: Fragment)


In [8]:
#VMD as Fragment

In [11]:
#Expected~
Image(url="http://www.accessexcellence.org/RC/VL/GG/images/nucleosome.gif")


Out[11]:
  • Initial purpose is to simulate the butterfly effect, but may not came true in next N years.

Residues


In [12]:
# VMD resid

Basic assumption of Molecular Dynamics Simulation

  • Simulations are classical:
    • For a system of N interacting atoms:
      • $m_i {\partial^2 \vec{r}_i \over \partial t^2} = \vec{F}_i, i = 1...N$
    • Forces are negative derivatives of a potential function V:
      • $\vec{F}_i = - {\partial V \over \partial \vec{r}_i}$
  • Electrons are in the ground state
  • Force Fields are approximate
  • Pair-additive: (apart from long-range Coulomb forces)
    • $\vec{F}_{ij} = -\vec{F}_{ji}$
  • Long-range interactions are cut off
  • Boundary conditions are unnatural
    • "Crowded" environment

We can not simulate nucleosome in vaccum!


In [14]:
#Nucleosome Boom!

Force Calculation

  • Force is the main reason that drives the molecular dynamic simulation.
    • It was calculated every simuatlion step.
    • In using of updating coordinate of simulation structure.
    • Using redused uint:
      • $kJ\cdot mol^{-1}\cdot nm^{-1}$
      • $1 kJ\cdot mol^{-1}\cdot nm^{-1}$
        • $= 1000 N \cdot m \div (6.022\cdot 10 ^ {-23} \cdot 10 ^ {-9} m)$
        • $= 1.66 \cdot 10^{-12}N = 1.66pN (picoNewton)$

Force Categories:

  • Bonded Force: (Very Short range, List in topology file)
    • Bonds (2 atoms)
    • Angle (3 atoms)
    • Proper Dihedrals (4 atoms)
    • improper Dihedrales (4 atoms)
    • nb14 (exclusion)
  • Nonbonded Force: (Cut off and Correctness)
    • Lennard-Jones (Short range)
    • Coulomb (Long range)

Bonded Force


In [16]:
#Bonded force
Image("bonded_force.png")


Out[16]:
  • Force palys the fundamental role in all biological processes. All biological motion, from cellular motility, to translation and replication of DNA, is driven by molecular-scale forces. To study the mechanical propreties of single nucleosome, we performed force spectroscopy simulation of our system. By loading certain force on the end of DNA linker, we can study the internal dynamic property. Such as: how does nucleosme keep stable when perturbation happenes? Which domain contributes most to stabilization?
  • Figure 1: Fragment representation:
    • Nucleosome can be viewed as a DNA - Protein binding system: a double helix DNA wrapped 1.75 turns around 8 histone subunts, 10 molecule fragments in total. This image shows intermediate simulation step by loading force on both end of DNA. Different color represent different molecule fragments.
  • Figure 2: Force representation:
    • From simulation trajectories we can calculate interactive force between DNA and histone, and color them according to force value on residue level. This figure shows the force representation of the same state as figure 1.
  • Figure 3: Force value plot(optional):
    • We can do force analyze for certain residue we interested in, for example, plot the force as a time signal.

In [ ]:

  • Force is the main reason that drives the molecular dynamic simulation. It was calculated every simuatlion step, in the use of updating coordinate of simulation structure.

  • For a standard simulation procedure, from a certain pdb structure to simulation result, including the following steps:

-1.1 Descript the structure to make simulation program understandable:

  • Choose a certain Force Field, including the Atom discription(**mass, charge**), bonded force(**bonds, pairs for nb14, angles, and dihedrals X2**) and nonbonded force (**vdw and coulomb**)

  • Add water and salts.

  • for implicit solvent add other artificial effects

-1.2 Do Energy minimization

  • To avoid force between particle and particle that larger than 1000 force unit ($kJ\cdot mol^{-1}\cdot nm^{-1}$)

    • As for:

      • $1kJ = 1000 N\cdot m$

      • $1nm = 10 ^ {-9} m$

      • $1mol = 6.022\cdot 10 ^ {-23}$ uintless

    • So that, for 1 force unit:

      • $1kJ\cdot mol^{-1}\cdot nm^{-1} = 1000 N \cdot m \div (6.022\cdot 10 ^ {-23} \cdot 10 ^ {-9} m) = 1.66 \cdot 10^{-12}N = 1.66pN (picoNewton)$

-1.3 Main MD Loop:

  • while (steps < Set_to_simulation_steps)

    • {

      • dd_partition_system();

      • do_force();

      • write_traj();

      • update();

      • steps++;

    • }

  • inside the loop, the steps do_force() calculate the forces and update the trajectory:

    • do_force()

    • {

      • dd_move_x();

      • do_ns();

      • do_force_lowlevel();

      • dd_move_f();

      • calc_viral();

      • anything else;

    • }

  • do_force_lowlevel(): The core scheme to calculate force, include:

    • do_force_lowlevel()

    • {

      • do_walls(); //calc forces and viral for walls

      • do_nonbonded(); //calc forces and viral from nonbonded interaction

      • calc_bonds(); //calc bonds

    • }

-1.4 Finish MD and Force Distribution Analyzse

  • since the function do_nonbonded() and cals_bonds() were merged forces together, the force analyse for each frame should seperate them.

0. Methods:

  • Basic Consideration:

    • F(total) = F(bonded) + F(nonbonded)

      • For a single molecule, sum(F(bonded)) = 0; sum(F(nonbonded)) = 0

      • then: F(nonbonded) = F(total) - F(bonded)

        1 If only need to know the force that loaded on molecules (globaly), then only bonded force is needed to be calculated out.

        2 Pairwise force??

          ToDo: explain detail or redifine //As: F(total) = F(internal) + F(external)
        
          sum(F(internal)) == 0
        
          F(external) = ??(try: Should over write ns()? //or use an OO search type) ==> modified nblists afterwards
    • Force(total) = F(bonded) + F(internal) + F(external)
  • Include:

    • Bonded (cals_bonds(): run sequencially, not so difficult)

      • As the calculation is atomic based, first calculate the bonded force for each atom, then sequncially binned to resiedue.

        (To check: A-B-C F(A) + F(C) == F(B)? )

    • Nonbonded (do_nonbonded.c and in kernel. Different kernels, merged, seperatable?)

      • elec (short and long, no PME)

      • vdw (should exclude nb internal. How?) //==> use two nblist to store internal and external nblists.

  • Method:

    • If only bonded force included, the world will be happy. Simply used the build in function clac_bonds.

    • To eveluate the nonbonded force, modification of kernel is not possible.

      • The Brute force way is to write a method for all force(vdw and Coul) that include, and loop all of them. (Worst way, nb14 also affect)
      • Improvement will be using the ns() (with different cut off?)

        However, severial problems will occurs:

          1 nb14 
        
              Not included in nb_kernel, dumped into bounded. (mostly zeros)
        
          2 internal v.s external? (for DNA internal should define as Double Helix.)
        
              F = F(internal_bond) + F(internal_nonbonded) + F(external_nonbonded) ~ 0. (Can not be achieved since PME, water and ions are gone, the md external will never be the same with analyze external)

In [2]:

1 Old PF eveluate

  • What can do?

    • Pairwise force calculation, (only) support 1 slection or 2 selections (1 selection can be one molecule or some atoms, or some residues, whatever), and calculate forces acording to particle(atom) and residue map

      • For 1 selection:
        • bonded_force
        • nonbonded_force
        • nb14
    * For 2 selections:
        * Bonded_force (most time meaningless)      //to check for the parameter setting
        * Nonbonded_force 
        * nb14                            //to check for the parameter setting
  • Not convience:

    • Take a system with 4 protein molecules for example (for nucleosome has 1 dsDNA and 8 protein proteins.)
      • For 2 selections scheme, $ C{2\choose4} =6$,
      • which means, should change scripts 6 times, and run 6 times.
      • for nucleosome, should do that for $ C{2\choose9} = 36$ times, for each combination
    • of course, can do 1 selection for the whole structure, and render it into a colorful picture.
      • But it includes(at least) for nonbonded forces, internal and external.
      • Can be Conflicted. F(internal) = - F(external)

2 Reconstruct Old PF

  • Brute force test for 2 selections
  • Looping through index and sum F_coul and F_vdw

    • Use the all to all kernal
    • For small systems works fine, but not for nucleosome.
  • Can use the nbkernal All_vs_all

3 New PF (short for *Partial Force*, not *Pairwise Force*)structure

  • Gromacs Version: 5.0-dev
    • New version is (somehow) C++ style, which is possible to do OO, and easy to reuse code
  • Rely on nblist (cutoff setting),

    • Modified it according to selection, and seperated nblists !!!
  • input:

    • -s *.tpr (included simulation setup, and topology which is crucial)
    • -f .xtc or .trr (simulation result, with coordinates)
    • -n *.ndx (not support yet)
  • output:

    • *bonded.trr: bonded force for each frame/atoms(optional)
    • *internal.trr: internal nonbonded force for each frame/atoms(optional)
    • *external.trr: external nonbonded force for each frame/atoms(optional)
    • *bonded.csv: residue bonded force sequencially, should include index(Rn - Rn+1), n in [0, maxresidu_number_in_mol]
    • *internal.csv: residue based (for each residue number) internal force (for DNA include double-helix, for Protein define as each subunit)
    • *external.csv: residue based (for DNA exclude double-helix) this is going to use for render in VMD movie

3.1 Working Flows:

  • Principle:

    • Keep all the function as it was. Do not modify the Basic API
  • Steps:

    • Read in tpr file
    • Generate atom
    • for each frame in xtc file:
      • calculate force and save

3.2 Corrnetness test:

3.2.1 Sum of force should be zero.

  • For a system with 1708 Atoms, mistakes: (Float point round up problems)
    • Bonded force: ~ 0.1%
    • Internal: ~0.7%
    • External: ~0.03%
    • !! REMEMBER to clean the force after one circle
  • ToDo: output all the summed and max value,then know the ratio of real mistake

3.2.1 Others

3.3 Example:


In [5]:
os.listdir(working_directory)


Out[5]:
['#md.log.1#',
 '#md.log.2#',
 'force_bonded.csv',
 'force_bonded_100mM_1000f_4ns.csv',
 'force_external.csv',
 'force_external_100mM_1000f_4ns.csv',
 'force_internal.csv',
 'force_internal_100mM_1000f_4ns.csv',
 'md.log',
 'nucleosome_forpf.tpr',
 'nucleosome_vs_0mM_30nm_0_topull_100f_20ns.xtc',
 'nucleosome_vs_0_20ns_30nm_1000mM_reset_70f_27ns.xtc',
 'nucleosome_vs_0_30nm_1000mM_1000f_pull4ns.xtc',
 'nucleosome_vs_0_30nm_100mM_1000f_4ns.xtc',
 'nucleosome_vs_0_30nm_100mM_100f_20ns.xtc',
 'nucleosome_vs_100mM_30nm_0_topull_100f_20ns.xtc',
 'nucleosome_vs_30nm_1000mM_50f_50ns.xtc',
 'nucleosome_vs_30nm_1000mM_90f_50ns.xtc']

In [9]:
fm = pd.read_csv('force_external_)
cl=fm.columns

In [1]:
dt = fm[cl[0]]


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-881476a13128> in <module>()
----> 1 dt = fm[cl[0]]

NameError: name 'fm' is not defined

In [29]:
df = fm[cl[491]]
a = plot(dt, df)



In [31]:
max(df)
cl[491]


Out[31]:
'491-LYS'

4 Other to consider

  • nb14?
    • in principle should be included into bonded.
  • Vsite:
    • Take care, the atom index list may not corresipond, somehow should be check carefully.
  • PME?
    • should not include, check old version

5 Controls

5.1 Old PF

  • Old PF
  • Tune the setting of the PME in *.mdp file for old PF for 2 selections

5.2 New PF

  • 1 all force:
    • Reference to simulation result directly. (check if use the same kernal)

6 Technical details

6.1 Modified the nblist

  • The most difficult part!!!!!!!

    • since MAX exclusion No. is 32, can not set exclusion to external mols.
    • Only can do :
      • if (a_i.mol == a_j.mol) continue; // mol define DNA_1 + DNA_2
      • how?
  • The structure of nblist:

    • For atom i = N, N in range(0, max_index)
      • ij in range(ij[N], ij[N+1])
      • jlist[ij], do something for i and jlist[ij];
  • Between ns() and do_nonbonded(), Resize the ij, and jlist:
    • Algorithm:
      • for i in range(0, max_index): //from 1 nblist to 2 nblist
        • for ij in range(ij[i], ij[N+1]):
          • if (jlist[ij].mol == i.mol):
            • put ij in list_internal
          • else:
            • put ij in list_external
    • After that:
      • set nblist_external = list_external;
      • set nblist_internal = list_internal;
  • do_nonbonded(): save f(internal) and f_long(external) seperately
  • calc_bond(): save f(bond)
  • save each frame residue based data as csv format.

  • !!!Take care of the virtual site force

6.1.0 Data Structure of t_nblist

  • Data Structure of t_nblist:

    • int igeometry; // The type of list(atom, water, etc.)
    • int elec; // Coulomb loop type index for kernals
    • int ielecmod; //Coulomb modifier (e.g. switch/shift) what the hell
    • int ivdw; //VdW loop type index for kernels
    • int ivdwmod; //VdW modifier (e.g. switch/shift)
    • int type; //Type of interaction, listed in gmx_nblist_interaction_type
    • int nri, maxnri; //Current/max number of i particles
    • int nrj, maxnrj; //Current/max number of j particles
    • int maxlen; //maxnr of j atoms for a single i atom
    • int* iinr; //The i-elements
    • int* iinr_end; // the end atom only with enlist CG;
    • int* gid; // Index in energy arrays
    • int* shift; // Shift vector index;
    • int* jindex; //index in jjnr
    • int* jjnr; //The j-atom list
    • int* jjnr_end; //The end atom, only with enltypeCG
    • t_excl* excl; //Exclustions, only with enltypeCG

    • void* kernelptr_vf;

    • void* kernelptr_v;
    • void* kernelptr_f;
    • int simd_padding_width;

      • For atom I = nblist->iinr[N] (0 <= N < nblist->nri) there can be
      • several neighborlists (N's), for different energy groups (gid) and
      • different shifts (shift).
      • For corresponding J atoms for each list start at:
      • nblist->jjnr[JI]
      • with nblist->jindex[N] <= JI < nblist->jindex[N+1] *
      • enlist is of the form enlistUNIT1_UNIT2:
      • UNIT ATOM: there is one atom: iinr[N] or jjnr[JI]
      • UNIT SPC: there are 3 atoms: iinr[N],iinr[N]+1,iinr[N]+2, jjnr analog.
      • UNIT TIP4P: there are 4 atoms: iinr[N],...,iinr[N]+3, jjnr analog.
      • UNIT CG: there are N atoms: iinr[N],...,iinr_end[N]-1, jjnr analog. *
      • Clear?

6.1.1 How to use official nb()?

  • According the rule of no breaking basic API, which means, no modification of origin code is allowed, what can we do?
    • Notice inside nbkernal*, there is a check for valid atom.
    • Seperate the jjnr may not really matter. (may waste of 8iinr)
    • Do_nobonded() twice.
  • Also, should deeply understand the data structure of gmx_mtop_t
    • char **name; // Name of the topology
    • gmx_ffparams_t ffparams; // force field parameters.
    • int nmoltype; //How many mols in the topology?
    • gmx_moltype_t *moltype; //each moltype de

7. Render in vmd

7.1 in tcl


In [ ]:

N: Anything else?

N.1 Simulation time step consideration:

N.1.1 Sample Rate:

  • Suppose for a system with the fast frequency $\omega_{max}$
    • To achieve the full percise, the sample frequency $\omega_{sample} = 2\omega_{max}$ (Twice fast)

N.2 Pulling time steps:

  • When do a pulling simulation, the frequency of unit that has been loaded force, is going to increase.

    • A similar example is tune violin, high $\vec{T}$ ~ hige $\omega$
  • Consider a simple Harmonic Oscillator, with out external force or friction:

    • Basic force formular:
      • $\vec{F} = -kx$
    • Apply the second Newton's Law:
      • $\vec{F} = m\vec{a} =-kx $
    • Then:
      • $m\ddot{x} + kx = 0$
    • Analyze by using Laplase transform:
      • $L(m\ddot{x} + kx) = 0$

N.3 Number of backup files limitation

  • If backup file number over 100, the function of back up file will not work

N.4


In [13]:
Image(url="http://ww3.sinaimg.cn/bmiddle/61e8a1fdjw1ebcpe8vqghg203z03zq3h.gif")


Out[13]:

In [ ]: