Force Analyze


In [1]:
#initialize.http://127.0.0.1:8889/notebooks/Force_Analyze.ipynb#
%autosave 120
%pylab inline
%qtconsole


Autosaving every 120 seconds
Populating the interactive namespace from numpy and matplotlib

In [2]:
#change working directory
import os
working_directory = "/var/run/media/chen/Simulation/Simulation/100mM_1000f_vs_100f"
os.chdir(working_directory)

In [3]:
#Other initialize
import pandas as pd
import numpy as np


/usr/lib/python2.7/site-packages/pytz/__init__.py:35: UserWarning: Module argparse was already imported from /usr/lib64/python2.7/argparse.pyc, but /usr/lib/python2.7/site-packages is being added to sys.path
  from pkg_resources import resource_stream

-1. Introduction

  • Force palys the fundamental role in all biological processes. All biological motion, from cellular motility, to translation and replication of DNA, is driven by molecular-scale forces. To study the mechanical propreties of single nucleosome, we performed force spectroscopy simulation of our system. By loading certain force on the end of DNA linker, we can study the internal dynamic property. Such as: how does nucleosme keep stable when perturbation happenes? Which domain contributes most to stabilization?
  • Figure 1: Fragment representation:
    • Nucleosome can be viewed as a DNA - Protein binding system: a double helix DNA wrapped 1.75 turns around 8 histone subunts, 10 molecule fragments in total. This image shows intermediate simulation step by loading force on both end of DNA. Different color represent different molecule fragments.
  • Figure 2: Force representation:
    • From simulation trajectories we can calculate interactive force between DNA and histone, and color them according to force value on residue level. This figure shows the force representation of the same state as figure 1.
  • Figure 3: Force value plot(optional):
    • We can do force analyze for certain residue we interested in, for example, plot the force as a time signal.
  • Force is the main reason that drives the molecular dynamic simulation. It was calculated every simuatlion step, in the use of updating coordinate of simulation structure.

  • For a standard simulation procedure, from a certain pdb structure to simulation result, including the following steps:

-1.1 Descript the structure to make simulation program understandable:

  • Choose a certain Force Field, including the Atom discription(**mass, charge**), bonded force(**bonds, pairs for nb14, angles, and dihedrals X2**) and nonbonded force (**vdw and coulomb**)

  • Add water and salts.

  • for implicit solvent add other artificial effects

-1.2 Do Energy minimization

  1. To avoid force between particle and particle that larger than 1000 force unit ($kJ\cdot mol^{-1}\cdot nm^{-1}$)

    1. As for:

      • $1kJ = 1000 N\cdot m$

      • $1nm = 10 ^ {-9} m$

      • $1mol = 6.022\cdot 10 ^ {-23}$ uintless

    2. So that, for 1 force unit:

      • $1kJ\cdot mol^{-1}\cdot nm^{-1} = 1000 N \cdot m \div (6.022\cdot 10 ^ {-23} \cdot 10 ^ {-9} m) = 1.66 \cdot 10^{-12}N = 1.66pN (picoNewton)$

-1.3 Main MD Loop:

  1. while (steps < Set_to_simulation_steps)

    • {

      • dd_partition_system();

      • do_force();

      • write_traj();

      • update();

      • steps++;

    • }

  2. inside the loop, the steps do_force() calculate the forces and update the trajectory:

    • do_force()

    • {

      • dd_move_x();

      • do_ns();

      • do_force_lowlevel();

      • dd_move_f();

      • calc_viral();

      • anything else;

    • }

  3. do_force_lowlevel(): The core scheme to calculate force, include:

    • do_force_lowlevel()

    • {

      • do_walls(); //calc forces and viral for walls

      • do_nonbonded(); //calc forces and viral from nonbonded interaction

      • calc_bonds(); //calc bonds

    • }

-1.4 Finish MD and Do Analyzse

  • since the function do_nonbonded() and cals_bonds() were merged forces together, the force analyse for each frame should seperate them.

0. Methods:

  • Basic Consideration:

    • F(total) = F(bonded) + F(nonbonded)

      • For a single molecule, sum(F(bonded)) = 0; sum(F(nonbonded)) = 0

      • then: F(nonbonded) = F(total) - F(bonded)

        1 If only need to know the force that loaded on molecules (globaly), then only bonded force is needed to be calculated out.

        2 Pairwise force??

          ToDo: explain detail or redifine //As: F(total) = F(internal) + F(external)
        
          sum(F(internal)) == 0
        
          F(external) = ??(try: Should over write ns()? //or use an OO search type) ==> modified nblists afterwards
    • Force(total) = F(bonded) + F(internal) + F(external)
  • Include:

    • Bonded (cals_bonds(): run sequencially, not so difficult)

      • As the calculation is atomic based, first calculate the bonded force for each atom, then sequncially binned to resiedue.

        (To check: A-B-C F(A) + F(C) == F(B)? )

    • Nonbonded (do_nonbonded.c and in kernel. Different kernels, merged, seperatable?)

      • elec (short and long, no PME)

      • vdw (should exclude nb internal. How?) //==> use two nblist to store internal and external nblists.

  • Method:

    • If only bonded force included, the world will be happy. Simply used the build in function clac_bonds.

    • To eveluate the nonbonded force, modification of kernel is not possible.

      • The Brute force way is to write a method for all force(vdw and Coul) that include, and loop all of them. (Worst way, nb14 also affect)
      • Improvement will be using the ns() (with different cut off?)

        However, severial problems will occurs:

          1 nb14 
        
              Not included in nb_kernel, dumped into bounded. (mostly zeros)
        
          2 internal v.s external? (for DNA internal should define as Double Helix.)
        
              F = F(internal_bond) + F(internal_nonbonded) + F(external_nonbonded) ~ 0. (Can not be achieved since PME, water and ions are gone, the md external will never be the same with analyze external)

In [2]:

1 Old PF eveluate

  • What can do?

    • Pairwise force calculation, (only) support 1 slection or 2 selections (1 selection can be one molecule or some atoms, or some residues, whatever), and calculate forces acording to particle(atom) and residue map

      • For 1 selection:
        • bonded_force
        • nonbonded_force
        • nb14
    * For 2 selections:
        * Bonded_force (most time meaningless)      //to check for the parameter setting
        * Nonbonded_force 
        * nb14                            //to check for the parameter setting
  • Not convience:

    • Take a system with 4 protein molecules for example (for nucleosome has 1 dsDNA and 8 protein proteins.)
      • For 2 selections scheme, $ C{2\choose4} =6$,
      • which means, should change scripts 6 times, and run 6 times.
      • for nucleosome, should do that for $ C{2\choose9} = 36$ times, for each combination
    • of course, can do 1 selection for the whole structure, and render it into a colorful picture.
      • But it includes(at least) for nonbonded forces, internal and external.
      • Can be Conflicted. F(internal) = - F(external)

2 Reconstruct Old PF

  • Brute force test for 2 selections
  • Looping through index and sum F_coul and F_vdw

    • Use the all to all kernal
    • For small systems works fine, but not for nucleosome.
  • Can use the nbkernal All_vs_all

3 New PF (short for *Partial Force*, not *Pairwise Force*)structure

  • Gromacs Version: 5.0-dev
    • New version is (somehow) C++ style, which is possible to do OO, and easy to reuse code
  • Rely on nblist (cutoff setting),

    • Modified it according to selection, and seperated nblists !!!
  • input:

    • -s *.tpr (included simulation setup, and topology which is crucial)
    • -f .xtc or .trr (simulation result, with coordinates)
    • -n *.ndx (not support yet)
  • output:

    • *bonded.trr: bonded force for each frame/atoms(optional)
    • *internal.trr: internal nonbonded force for each frame/atoms(optional)
    • *external.trr: external nonbonded force for each frame/atoms(optional)
    • *bonded.csv: residue bonded force sequencially, should include index(Rn - Rn+1), n in [0, maxresidu_number_in_mol]
    • *internal.csv: residue based (for each residue number) internal force (for DNA include double-helix, for Protein define as each subunit)
    • *external.csv: residue based (for DNA exclude double-helix) this is going to use for render in VMD movie

3.1 Working Flows:

  • Principle:

    • Keep all the function as it was. Do not modify the Basic API
  • Steps:

    • Read in tpr file
    • Generate atom
    • for each frame in xtc file:
      • calculate force and save

3.2 Corrnetness test:

3.2.1 Sum of force should be zero.

  • For a system with 1708 Atoms, mistakes: (Float point round up problems)
    • Bonded force: ~ 0.1%
    • Internal: ~0.7%
    • External: ~0.03%
    • !! REMEMBER to clean the force after one circle
  • ToDo: output all the summed and max value,then know the ratio of real mistake

3.2.1 Others

3.3 Example:


In [5]:
os.listdir(working_directory)


Out[5]:
['#md.log.1#',
 '#md.log.2#',
 'force_bonded.csv',
 'force_bonded_100mM_1000f_4ns.csv',
 'force_external.csv',
 'force_external_100mM_1000f_4ns.csv',
 'force_internal.csv',
 'force_internal_100mM_1000f_4ns.csv',
 'md.log',
 'nucleosome_forpf.tpr',
 'nucleosome_vs_0mM_30nm_0_topull_100f_20ns.xtc',
 'nucleosome_vs_0_20ns_30nm_1000mM_reset_70f_27ns.xtc',
 'nucleosome_vs_0_30nm_1000mM_1000f_pull4ns.xtc',
 'nucleosome_vs_0_30nm_100mM_1000f_4ns.xtc',
 'nucleosome_vs_0_30nm_100mM_100f_20ns.xtc',
 'nucleosome_vs_100mM_30nm_0_topull_100f_20ns.xtc',
 'nucleosome_vs_30nm_1000mM_50f_50ns.xtc',
 'nucleosome_vs_30nm_1000mM_90f_50ns.xtc']

In [9]:
fm = pd.read_csv('force_external_)
cl=fm.columns

In [1]:
dt = fm[cl[0]]


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-881476a13128> in <module>()
----> 1 dt = fm[cl[0]]

NameError: name 'fm' is not defined

In [29]:
df = fm[cl[491]]
a = plot(dt, df)



In [31]:
max(df)
cl[491]


Out[31]:
'491-LYS'

4 Other to consider

  • nb14?
    • in principle should be included into bonded.
  • Vsite:
    • Take care, the atom index list may not corresipond, somehow should be check carefully.
  • PME?
    • should not include, check old version

5 Controls

5.1 Old PF

  • Old PF
  • Tune the setting of the PME in *.mdp file for old PF for 2 selections

5.2 New PF

  • 1 all force:
    • Reference to simulation result directly. (check if use the same kernal)

6 Technical details

6.1 Modified the nblist

  • The most difficult part!!!!!!!

    • since MAX exclusion No. is 32, can not set exclusion to external mols.
    • Only can do :
      • if (a_i.mol == a_j.mol) continue; // mol define DNA_1 + DNA_2
      • how?
  • The structure of nblist:

    • For atom i = N, N in range(0, max_index)
      • ij in range(ij[N], ij[N+1])
      • jlist[ij], do something for i and jlist[ij];
  • Between ns() and do_nonbonded(), Resize the ij, and jlist:
    • Algorithm:
      • for i in range(0, max_index): //from 1 nblist to 2 nblist
        • for ij in range(ij[i], ij[N+1]):
          • if (jlist[ij].mol == i.mol):
            • put ij in list_internal
          • else:
            • put ij in list_external
    • After that:
      • set nblist_external = list_external;
      • set nblist_internal = list_internal;
  • do_nonbonded(): save f(internal) and f_long(external) seperately
  • calc_bond(): save f(bond)
  • save each frame residue based data as csv format.

  • !!!Take care of the virtual site force

6.1.0 Data Structure of t_nblist

  • Data Structure of t_nblist:

    • int igeometry; // The type of list(atom, water, etc.)
    • int elec; // Coulomb loop type index for kernals
    • int ielecmod; //Coulomb modifier (e.g. switch/shift) what the hell
    • int ivdw; //VdW loop type index for kernels
    • int ivdwmod; //VdW modifier (e.g. switch/shift)
    • int type; //Type of interaction, listed in gmx_nblist_interaction_type
    • int nri, maxnri; //Current/max number of i particles
    • int nrj, maxnrj; //Current/max number of j particles
    • int maxlen; //maxnr of j atoms for a single i atom
    • int* iinr; //The i-elements
    • int* iinr_end; // the end atom only with enlist CG;
    • int* gid; // Index in energy arrays
    • int* shift; // Shift vector index;
    • int* jindex; //index in jjnr
    • int* jjnr; //The j-atom list
    • int* jjnr_end; //The end atom, only with enltypeCG
    • t_excl* excl; //Exclustions, only with enltypeCG

    • void* kernelptr_vf;

    • void* kernelptr_v;
    • void* kernelptr_f;
    • int simd_padding_width;

      • For atom I = nblist->iinr[N] (0 <= N < nblist->nri) there can be
      • several neighborlists (N's), for different energy groups (gid) and
      • different shifts (shift).
      • For corresponding J atoms for each list start at:
      • nblist->jjnr[JI]
      • with nblist->jindex[N] <= JI < nblist->jindex[N+1] *
      • enlist is of the form enlistUNIT1_UNIT2:
      • UNIT ATOM: there is one atom: iinr[N] or jjnr[JI]
      • UNIT SPC: there are 3 atoms: iinr[N],iinr[N]+1,iinr[N]+2, jjnr analog.
      • UNIT TIP4P: there are 4 atoms: iinr[N],...,iinr[N]+3, jjnr analog.
      • UNIT CG: there are N atoms: iinr[N],...,iinr_end[N]-1, jjnr analog. *
      • Clear?

6.1.1 How to use official nb()?

  • According the rule of no breaking basic API, which means, no modification of origin code is allowed, what can we do?
    • Notice inside nbkernal*, there is a check for valid atom.
    • Seperate the jjnr may not really matter. (may waste of 8iinr)
    • Do_nobonded() twice.
  • Also, should deeply understand the data structure of gmx_mtop_t
    • char **name; // Name of the topology
    • gmx_ffparams_t ffparams; // force field parameters.
    • int nmoltype; //How many mols in the topology?
    • gmx_moltype_t *moltype; //each moltype de

7. Render in vmd

7.1 in tcl


In [ ]:

N: Anything else?

N.1 Simulation time step consideration:

N.1.1 Sample Rate:

  • Suppose for a system with the fast frequency $\omega_{max}$
    • To achieve the full percise, the sample frequency $\omega_{sample} = 2\omega_{max}$ (Twice fast)

N.2 Pulling time steps:

  • When do a pulling simulation, the frequency of unit that has been loaded force, is going to increase.

    • A similar example is tune violin, high $\vec{T}$ ~ hige $\omega$
  • Consider a simple Harmonic Oscillator, with out external force or friction:

    • Basic force formular:
      • $\vec{F} = -kx$
    • Apply the second Newton's Law:
      • $\vec{F} = m\vec{a} =-kx $
    • Then:
      • $m\ddot{x} + kx = 0$
    • Analyze by using Laplase transform:
      • $L(m\ddot{x} + kx) = 0$

N.3 Number of backup files limitation

  • If backup file number over 100, the function of back up file will not work

N.4