Molecule Dynamic Simulation of nucleosome under Tension and Force Distribution Analyze

</center>

Chen Chen

Supervised by: Prof. Dr. Jörg Langowski
Co-operator: Dr. Frauke Gräter
Mentor: Dr. Senbo Xiao

In [19]:
#initialize.http://127.0.0.1:8889/notebooks/Force_Analyze.ipynb#
%autosave 120
%pylab inline
%qtconsole


Autosaving every 120 seconds
Populating the interactive namespace from numpy and matplotlib

In [1]:
#Other initilize
from IPython.display import HTML, Image
import os

In [2]:
#change working directory
import os
working_directory = "/var/run/media/chen/Simulation/Simulation/100mM_1000f_vs_100f"
os.chdir(working_directory)

In [2]:
#Other initialize
import pandas as pd
import numpy as np

Initial Perspective

  • Do the simulation of pulling nucleosome

In [3]:
Image(url="https://raw.github.com/maverickmw/pfdreport/master/movies/initial_perspect.gif")


Out[3]:
  • Do force analysis by using code from Dr. Frauke Gräter's group
  • Done!
  • In principle it should work, but practically not.

Outline

  • Introduction
    • Molecular Dynamics (all based on Gromacs-5.0dev)
    • Discription of nucleosome
  • Force Calculation
    • In Simulation Loops
    • Force distribution calculation
  • Data Analysis: a demo example
    • Render in VMD
    • Off-line data plot
  • Summary and Outlook
    • Possible improvement
    • Personal imagination of MD package

Introduction:

  • Molecular Dynamics (based on Gromacs-5.0dev)
  • Discription of nucleosome

Molecule Dynamics Simulation


In [6]:
#Typical Flow Chart of MD Simulation steps in Gromacs.
Image("gromacs_flow_chart.png")


Out[6]:

Description of Nucleosome


In [4]:
#The nucleosome that we familiar with
Image(url = "http://upload.wikimedia.org/wikipedia/commons/d/d9/Nucleosome_1KX5_colour_coded.png", width=300, height=300)


Out[4]:

The crystal structure of the nucleosome core particle consisting of H2A , H2B , H3 and H4 core histones, and DNA. The view is from the top through the superhelical axis.

Description of nucleosome (in Gromacs) on different scales

  • System:
    • 1 Nucleosome
  • Molecules:
    • 8 histone subunits
    • 2 DNA single strand
  • Residues:
    • 1306 residues
      • 974 for Histone (with tails)
      • 332 for DNA (147 core base pairs + 19 extra)
  • Atoms:
    • 26305 atoms

System:


In [10]:
#VMD as Conformation
Image("../pfdreport/conformation.png")


Out[10]:

Mols (In VMD: Fragment)


In [9]:
#VMD as Fragment
Image("../pfdreport/fragment.png")


Out[9]:

In [11]:
#Expected~
Image(url="http://www.accessexcellence.org/RC/VL/GG/images/nucleosome.gif")


Out[11]:
  • Initial purpose is to simulate the butterfly effect, but may not came true in next N years.

Residues


In [8]:
# VMD resid
Image("../pfdreport/resid.png")


Out[8]:

Basic assumption of Molecular Dynamics Simulation

  • Simulations are classical:
    • For a system of N interacting atoms:
      • $m_i {\partial^2 \vec{r}_i \over \partial t^2} = \vec{F}_i, i = 1...N$
    • Forces are negative derivatives of a potential function V:
      • $\vec{F}_i = - {\partial V \over \partial \vec{r}_i}$
  • Electrons are in the ground state
  • Force Fields are approximate
  • Pair-additive: (apart from long-range Coulomb forces)
    • $\vec{F}_{ij} = -\vec{F}_{ji}$
  • Long-range interactions are cut off
  • Boundary conditions are unnatural
    • "Crowded" environment

We can not simulate nucleosome in vaccum!


In [11]:
#Nucleosome Boom!
Image(url="https://raw.github.com/maverickmw/pfdreport/master/movies/nucleosome_openmm_100ps_gpu_vacuum.gif")


Out[11]:

Force Calculation

  • Force is the main reason that drives the molecular dynamic simulation.
    • It was calculated every simuatlion step.
    • In using of updating coordinate of simulation structure.
    • Using redused uint:
      • $kJ\cdot mol^{-1}\cdot nm^{-1}$
      • $1 kJ\cdot mol^{-1}\cdot nm^{-1}$
        • $= 1000 N \cdot m \div (6.022\cdot 10 ^ {-23} \cdot 10 ^ {-9} m)$
        • $= 1.66 \cdot 10^{-12}N = 1.66pN (picoNewton)$

Force Categories:

  • Bonded Force: (Very Short range, List in topology file)
    • Bonds (2 atoms)
    • Angle (3 atoms)
    • Proper Dihedrals (4 atoms)
    • improper Dihedrales (4 atoms)
    • nb14 (exclusion)
  • Nonbonded Force: (Cut off and Correctness)
    • Lennard-Jones (Short range)
    • Coulomb (Long range)

Bonded Force


In [16]:
#Bonded force
Image("bonded_force.png")


Out[16]:

Force Calculation in simulation loops:

  • While (simulation loop):

    • do_force() //All on Atom Level

    • {

      • do_ns(); //cost :$O(N)$

      • do_force_lowlevel();

        • {

          • //calc forces from nonbonded interaction
          • do_nonbonded(); //cost :$O(N^2)$ brute forcely; $O(N)$ after ns()

          • //calc bonds

          • calc_bonds(); //cost :$O(N)$
        • }

      • //anything else;

    • }

  • All forces are calculated on Atom level
    • Atom has no idea of which residue or molecular it belongs to.
    • We can not analysis force by using official force output.(All merge together)

Force Distribution Calculation

  • Nucleosome scale:
    • System
    • Molecules
    • Residues
    • Atoms
  • On atoms level, each atom has no idea which residue/molecules it belongs to.
    • An naive approach is to calculate all the interaction amoung atoms.
      • Cost: $O(N^2)$ computation time / memory /data storage
      • 90% data is 0
      • Offline clean up (painful)

Algorithm:

  • Code Dr. Frauke Gräter's group:
    • Based on Gromacs-4.5.3.
    • Using $O(N^2)$ approach
    • Modified source code, C style, hard to reuse or imporved
    • Test on residue and atom level, but molecule level is not guaranteed.
    • Not convient on molecule level, only support 2 selection groups at most.
      • for nucleosome, should do calculation for $ C{2\choose10} = 45$ conbinations
    • Not yet published.
  • Our own force calculation scheme:
    • Based on Gromacs-5.0dev
    • Rely on neighborhood search, $O(N)$ cost
    • Plug in style, no modification of source code.
    • C++ style, reuse by inheritance.
    • Focuse on Residue and molecule level.
      • Atomic level will support by selection (Not yet support)

Methods:

  • input:
    • -s *.tpr (included simulation setup, force field parameters and topology which is crucial)
    • -f .xtc or .trr (simulation result, with coordinates)
  • output:
    • force_bonded.csv: residue bonded force Vector
    • force_internal_nonbonded.csv: residue based (for each residue number) internal force, Scalar
    • force_external.csv: residue based, this is going to use for render in VMD movie
  • Special design for nucleosome:
    • On Molecule level, dsDNA view as 1 molecule (not 2)

Data Analysis: a demo example

  • Nucleosome pulled by 300 force unit (498pN) for 1300 ps to the box boundary.
  • Relax for 6000ps

Render in VMD


In [16]:
#pull 1300ps 
Image(url="https://raw.github.com/maverickmw/pfdreport/master/movies/pull_1300.gif")


Out[16]:

In [17]:
#relax 6000
Image(url="https://raw.github.com/maverickmw/pfdreport/master/movies/relax_6000.gif")


Out[17]:

Off-line data plot: (external force example)

  • Molecule to residue:

In [20]:
#residue range:
H3_1 = arange(1,136)          #0
H3_2 = arange(488, 623)       #4
H4_1 = arange(136,238)        #1
H4_2 = arange(623,725)        #5
H2A_1 = arange(238,366)       #2
H2A_2 = arange(725,853)       #6
H2B_1 = arange(366,488)       #3
H2B_2 = arange(853,975)       #7
DNA = arange(975,1307)  
DNA_1 = arange(975,1141)    #8
DNA_2 = arange(1141,1307)   #9
#print DNA_2

In [22]:
f_in = pd.read_csv("../pfdreport/nucleosome_30nm_md300_1300ps_novs_skip_10/force_external.csv")
f_bond = pd.read_csv("../pfdreport/nucleosome_30nm_md300_1300ps_novs_skip_10/force_bonded.csv")

In [25]:
cl=f_in.columns
print cl


Index([u'Time(ps)', u'NALA1', u'ARG2', u'THR3', u'LYS4', u'GLN5', u'THR6', u'ALA7', u'ARG8', u'LYS9', u'SER10', u'THR11', u'GLY12', u'GLY13', u'LYS14', u'ALA15', u'PRO16', u'ARG17', u'LYS18', u'GLN19', u'LEU20', u'ALA21', u'THR22', u'LYS23', u'ALA24', u'ALA25', u'ARG26', u'LYS27', u'SER28', u'ALA29', u'PRO30', u'ALA31', u'THR32', u'GLY33', u'GLY34', u'VAL35', u'LYS36', u'LYS37', u'PRO38', u'HIP39', u'ARG40', u'TYR41', u'ARG42', u'PRO43', u'GLY44', u'THR45', u'VAL46', u'ALA47', u'LEU48', u'ARG49', u'GLU50', u'ILE51', u'ARG52', u'ARG53', u'TYR54', u'GLN55', u'LYS56', u'SER57', u'THR58', u'GLU59', u'LEU60', u'LEU61', u'ILE62', u'ARG63', u'LYS64', u'LEU65', u'PRO66', u'PHE67', u'GLN68', u'ARG69', u'LEU70', u'VAL71', u'ARG72', u'GLU73', u'ILE74', u'ALA75', u'GLN76', u'ASP77', u'PHE78', u'LYS79', u'THR80', u'ASP81', u'LEU82', u'ARG83', u'PHE84', u'GLN85', u'SER86', u'SER87', u'ALA88', u'VAL89', u'MET90', u'ALA91', u'LEU92', u'GLN93', u'GLU94', u'ALA95', u'SER96', u'GLU97', u'ALA98', u'TYR99', u'LEU100', u'VAL101', u'ALA102', u'LEU103', u'PHE104', u'GLU105', u'ASP106', u'THR107', u'ASN108', u'LEU109', u'CYS110', u'ALA111', u'ILE112', u'HIP113', u'ALA114', u'LYS115', u'ARG116', u'VAL117', u'THR118', u'ILE119', u'MET120', u'PRO121', u'LYS122', u'ASP123', u'ILE124', u'GLN125', u'LEU126', u'ALA127', u'ARG128', u'ARG129', u'ILE130', u'ARG131', u'GLY132', u'GLU133', u'ARG134', u'CALA135', u'NSER136', u'GLY137', u'ARG138', u'GLY139', u'LYS140', u'GLY141', u'GLY142', u'LYS143', u'GLY144', u'LEU145', u'GLY146', u'LYS147', u'GLY148', u'GLY149', u'ALA150', u'LYS151', u'ARG152', u'HIP153', u'ARG154', u'LYS155', u'VAL156', u'LEU157', u'ARG158', u'ASP159', u'ASN160', u'ILE161', u'GLN162', u'GLY163', u'ILE164', u'THR165', u'LYS166', u'PRO167', u'ALA168', u'ILE169', u'ARG170', u'ARG171', u'LEU172', u'ALA173', u'ARG174', u'ARG175', u'GLY176', u'GLY177', u'VAL178', u'LYS179', u'ARG180', u'ILE181', u'SER182', u'GLY183', u'LEU184', u'ILE185', u'TYR186', u'GLU187', u'GLU188', u'THR189', u'ARG190', u'GLY191', u'VAL192', u'LEU193', u'LYS194', u'VAL195', u'PHE196', u'LEU197', u'GLU198', u'ASN199', u'VAL200', u'ILE201', u'ARG202', u'ASP203', u'ALA204', u'VAL205', u'THR206', u'TYR207', u'THR208', u'GLU209', u'HIP210', u'ALA211', u'LYS212', u'ARG213', u'LYS214', u'THR215', u'VAL216', u'THR217', u'ALA218', u'MET219', u'ASP220', u'VAL221', u'VAL222', u'TYR223', u'ALA224', u'LEU225', u'LYS226', u'ARG227', u'GLN228', u'GLY229', u'ARG230', u'THR231', u'LEU232', u'TYR233', u'GLY234', u'PHE235', u'GLY236', u'CGLY237', u'NSER238', u'GLY239', u'ARG240', u'GLY241', u'LYS242', u'GLN243', u'GLY244', u'GLY245', u'LYS246', u'THR247', u'ARG248', u'ALA249', u'LYS250', u'ALA251', u'LYS252', u'THR253', u'ARG254', u'SER255', u'SER256', u'ARG257', u'ALA258', u'GLY259', u'LEU260', u'GLN261', u'PHE262', u'PRO263', u'VAL264', u'GLY265', u'ARG266', u'VAL267', u'HIP268', u'ARG269', u'LEU270', u'LEU271', u'ARG272', u'LYS273', u'GLY274', u'ASN275', u'TYR276', u'ALA277', u'GLU278', u'ARG279', u'VAL280', u'GLY281', u'ALA282', u'GLY283', u'ALA284', u'PRO285', u'VAL286', u'TYR287', u'LEU288', u'ALA289', u'ALA290', u'VAL291', u'LEU292', u'GLU293', u'TYR294', u'LEU295', u'THR296', u'ALA297', u'GLU298', u'ILE299', u'LEU300', u'GLU301', u'LEU302', u'ALA303', u'GLY304', u'ASN305', u'ALA306', u'ALA307', u'ARG308', u'ASP309', u'ASN310', u'LYS311', u'LYS312', u'THR313', u'ARG314', u'ILE315', u'ILE316', u'PRO317', u'ARG318', u'HIP319', u'LEU320', u'GLN321', u'LEU322', u'ALA323', u'VAL324', u'ARG325', u'ASN326', u'ASP327', u'GLU328', u'GLU329', u'LEU330', u'ASN331', u'LYS332', u'LEU333', u'LEU334', u'GLY335', u'ARG336', u'VAL337', u'THR338', u'ILE339', u'ALA340', u'GLN341', u'GLY342', u'GLY343', u'VAL344', u'LEU345', u'PRO346', u'ASN347', u'ILE348', u'GLN349', u'SER350', u'VAL351', u'LEU352', u'LEU353', u'PRO354', u'LYS355', u'LYS356', u'THR357', u'GLU358', u'SER359', u'SER360', u'LYS361', u'SER362', u'LYS363', u'SER364', u'CLYS365', u'NALA366', u'LYS367', u'SER368', u'ALA369', u'PRO370', u'ALA371', u'PRO372', u'LYS373', u'LYS374', u'GLY375', u'SER376', u'LYS377', u'LYS378', u'ALA379', u'VAL380', u'THR381', u'LYS382', u'THR383', u'GLN384', u'LYS385', u'LYS386', u'ASP387', u'GLY388', u'LYS389', u'LYS390', u'ARG391', u'ARG392', u'LYS393', u'THR394', u'ARG395', u'LYS396', u'GLU397', u'SER398', u'TYR399', u'ALA400', u'ILE401', u'TYR402', u'VAL403', u'TYR404', u'LYS405', u'VAL406', u'LEU407', u'LYS408', u'GLN409', u'VAL410', u'HIP411', u'PRO412', u'ASP413', u'THR414', u'GLY415', u'ILE416', u'SER417', u'SER418', u'LYS419', u'ALA420', u'MET421', u'SER422', u'ILE423', u'MET424', u'ASN425', u'SER426', u'PHE427', u'VAL428', u'ASN429', u'ASP430', u'VAL431', u'PHE432', u'GLU433', u'ARG434', u'ILE435', u'ALA436', u'GLY437', u'GLU438', u'ALA439', u'SER440', u'ARG441', u'LEU442', u'ALA443', u'HIP444', u'TYR445', u'ASN446', u'LYS447', u'ARG448', u'SER449', u'THR450', u'ILE451', u'THR452', u'SER453', u'ARG454', u'GLU455', u'ILE456', u'GLN457', u'THR458', u'ALA459', u'VAL460', u'ARG461', u'LEU462', u'LEU463', u'LEU464', u'PRO465', u'GLY466', u'GLU467', u'LEU468', u'ALA469', u'LYS470', u'HIP471', u'ALA472', u'VAL473', u'SER474', u'GLU475', u'GLY476', u'THR477', u'LYS478', u'ALA479', u'VAL480', u'THR481', u'LYS482', u'TYR483', u'THR484', u'SER485', u'ALA486', u'CLYS487', u'NALA488', u'ARG489', u'THR490', u'LYS491', u'GLN492', u'THR493', u'ALA494', u'ARG495', u'LYS496', u'SER497', u'THR498', u'GLY499', u'GLY500', u'LYS501', u'ALA502', u'PRO503', u'ARG504', u'LYS505', u'GLN506', u'LEU507', u'ALA508', u'THR509', u'LYS510', u'ALA511', u'ALA512', u'ARG513', u'LYS514', u'SER515', u'ALA516', u'PRO517', u'ALA518', u'THR519', u'GLY520', u'GLY521', u'VAL522', u'LYS523', u'LYS524', u'PRO525', u'HIP526', u'ARG527', u'TYR528', u'ARG529', u'PRO530', u'GLY531', u'THR532', u'VAL533', u'ALA534', u'LEU535', u'ARG536', u'GLU537', u'ILE538', u'ARG539', u'ARG540', u'TYR541', u'GLN542', u'LYS543', u'SER544', u'THR545', u'GLU546', u'LEU547', u'LEU548', u'ILE549', u'ARG550', u'LYS551', u'LEU552', u'PRO553', u'PHE554', u'GLN555', u'ARG556', u'LEU557', u'VAL558', u'ARG559', u'GLU560', u'ILE561', u'ALA562', u'GLN563', u'ASP564', u'PHE565', u'LYS566', u'THR567', u'ASP568', u'LEU569', u'ARG570', u'PHE571', u'GLN572', u'SER573', u'SER574', u'ALA575', u'VAL576', u'MET577', u'ALA578', u'LEU579', u'GLN580', u'GLU581', u'ALA582', u'SER583', u'GLU584', u'ALA585', u'TYR586', u'LEU587', u'VAL588', u'ALA589', u'LEU590', u'PHE591', u'GLU592', u'ASP593', u'THR594', u'ASN595', u'LEU596', u'CYS597', u'ALA598', u'ILE599', u'HIP600', u'ALA601', u'LYS602', u'ARG603', u'VAL604', u'THR605', u'ILE606', u'MET607', u'PRO608', u'LYS609', u'ASP610', u'ILE611', u'GLN612', u'LEU613', u'ALA614', u'ARG615', u'ARG616', u'ILE617', u'ARG618', u'GLY619', u'GLU620', u'ARG621', u'CALA622', u'NSER623', u'GLY624', u'ARG625', u'GLY626', u'LYS627', u'GLY628', u'GLY629', u'LYS630', u'GLY631', u'LEU632', u'GLY633', u'LYS634', u'GLY635', u'GLY636', u'ALA637', u'LYS638', u'ARG639', u'HIP640', u'ARG641', u'LYS642', u'VAL643', u'LEU644', u'ARG645', u'ASP646', u'ASN647', u'ILE648', u'GLN649', u'GLY650', u'ILE651', u'THR652', u'LYS653', u'PRO654', u'ALA655', u'ILE656', u'ARG657', u'ARG658', u'LEU659', u'ALA660', u'ARG661', u'ARG662', u'GLY663', u'GLY664', u'VAL665', u'LYS666', u'ARG667', u'ILE668', u'SER669', u'GLY670', u'LEU671', u'ILE672', u'TYR673', u'GLU674', u'GLU675', u'THR676', u'ARG677', u'GLY678', u'VAL679', u'LEU680', u'LYS681', u'VAL682', u'PHE683', u'LEU684', u'GLU685', u'ASN686', u'VAL687', u'ILE688', u'ARG689', u'ASP690', u'ALA691', u'VAL692', u'THR693', u'TYR694', u'THR695', u'GLU696', u'HIP697', u'ALA698', u'LYS699', u'ARG700', u'LYS701', u'THR702', u'VAL703', u'THR704', u'ALA705', u'MET706', u'ASP707', u'VAL708', u'VAL709', u'TYR710', u'ALA711', u'LEU712', u'LYS713', u'ARG714', u'GLN715', u'GLY716', u'ARG717', u'THR718', u'LEU719', u'TYR720', u'GLY721', u'PHE722', u'GLY723', u'CGLY724', u'NSER725', u'GLY726', u'ARG727', u'GLY728', u'LYS729', u'GLN730', u'GLY731', u'GLY732', u'LYS733', u'THR734', u'ARG735', u'ALA736', u'LYS737', u'ALA738', u'LYS739', u'THR740', u'ARG741', u'SER742', u'SER743', u'ARG744', u'ALA745', u'GLY746', u'LEU747', u'GLN748', u'PHE749', u'PRO750', u'VAL751', u'GLY752', u'ARG753', u'VAL754', u'HIP755', u'ARG756', u'LEU757', u'LEU758', u'ARG759', u'LYS760', u'GLY761', u'ASN762', u'TYR763', u'ALA764', u'GLU765', u'ARG766', u'VAL767', u'GLY768', u'ALA769', u'GLY770', u'ALA771', u'PRO772', u'VAL773', u'TYR774', u'LEU775', u'ALA776', u'ALA777', u'VAL778', u'LEU779', u'GLU780', u'TYR781', u'LEU782', u'THR783', u'ALA784', u'GLU785', u'ILE786', u'LEU787', u'GLU788', u'LEU789', u'ALA790', u'GLY791', u'ASN792', u'ALA793', u'ALA794', u'ARG795', u'ASP796', u'ASN797', u'LYS798', u'LYS799', u'THR800', u'ARG801', u'ILE802', u'ILE803', u'PRO804', u'ARG805', u'HIP806', u'LEU807', u'GLN808', u'LEU809', u'ALA810', u'VAL811', u'ARG812', u'ASN813', u'ASP814', u'GLU815', u'GLU816', u'LEU817', u'ASN818', u'LYS819', u'LEU820', u'LEU821', u'GLY822', u'ARG823', u'VAL824', u'THR825', u'ILE826', u'ALA827', u'GLN828', u'GLY829', u'GLY830', u'VAL831', u'LEU832', u'PRO833', u'ASN834', u'ILE835', u'GLN836', u'SER837', u'VAL838', u'LEU839', u'LEU840', u'PRO841', u'LYS842', u'LYS843', u'THR844', u'GLU845', u'SER846', u'SER847', u'LYS848', u'SER849', u'LYS850', u'SER851', u'CLYS852', u'NALA853', u'LYS854', u'SER855', u'ALA856', u'PRO857', u'ALA858', u'PRO859', u'LYS860', u'LYS861', u'GLY862', u'SER863', u'LYS864', u'LYS865', u'ALA866', u'VAL867', u'THR868', u'LYS869', u'THR870', u'GLN871', u'LYS872', u'LYS873', u'ASP874', u'GLY875', u'LYS876', u'LYS877', u'ARG878', u'ARG879', u'LYS880', u'THR881', u'ARG882', u'LYS883', u'GLU884', u'SER885', u'TYR886', u'ALA887', u'ILE888', u'TYR889', u'VAL890', u'TYR891', u'LYS892', u'VAL893', u'LEU894', u'LYS895', u'GLN896', u'VAL897', u'HIP898', u'PRO899', u'ASP900', u'THR901', u'GLY902', u'ILE903', u'SER904', u'SER905', u'LYS906', u'ALA907', u'MET908', u'SER909', u'ILE910', u'MET911', u'ASN912', u'SER913', u'PHE914', u'VAL915', u'ASN916', u'ASP917', u'VAL918', u'PHE919', u'GLU920', u'ARG921', u'ILE922', u'ALA923', u'GLY924', u'GLU925', u'ALA926', u'SER927', u'ARG928', u'LEU929', u'ALA930', u'HIP931', u'TYR932', u'ASN933', u'LYS934', u'ARG935', u'SER936', u'THR937', u'ILE938', u'THR939', u'SER940', u'ARG941', u'GLU942', u'ILE943', u'GLN944', u'THR945', u'ALA946', u'VAL947', u'ARG948', u'LEU949', u'LEU950', u'LEU951', u'PRO952', u'GLY953', u'GLU954', u'LEU955', u'ALA956', u'LYS957', u'HIP958', u'ALA959', u'VAL960', u'SER961', u'GLU962', u'GLY963', u'THR964', u'LYS965', u'ALA966', u'VAL967', u'THR968', u'LYS969', u'TYR970', u'THR971', u'SER972', u'ALA973', u'CLYS974', u'DC5975', u'DT976', u'DT977', u'DA978', u'DC979', u'DA980', u'DT981', u'DG982', u'DC983', u'DA984', u'DC985', u'DA986', u'DG987', u'DG988', u'DA989', u'DT990', u'DG991', u'DT992', u'DA993', u'DA994', u'DC995', u'DC996', u'DT997', u'DG998', u'DC999', u'DA1000', u'DG1001', u'DA1002', u'DT1003', u'DA1004', u'DC1005', u'DT1006', u'DA1007', u'DC1008', u'DC1009', u'DA1010', u'DA1011', u'DA1012', u'DA1013', u'DG1014', u'DT1015', u'DG1016', u'DT1017', u'DA1018', u'DT1019', u'DT1020', u'DT1021', u'DG1022', u'DG1023', u'DA1024', u'DA1025', u'DA1026', u'DC1027', u'DT1028', u'DG1029', u'DC1030', u'DT1031', u'DC1032', u'DC1033', u'DA1034', u'DT1035', u'DC1036', u'DA1037', u'DA1038', u'DA1039', u'DA1040', u'DG1041', u'DG1042', u'DC1043', u'DA1044', u'DT1045', u'DG1046', u'DT1047', u'DT1048', u'DC1049', u'DA1050', u'DG1051', u'DC1052', u'DT1053', u'DG1054', u'DG1055', u'DA1056', u'DT1057', u'DT1058', u'DC1059', u'DC1060', u'DA1061', u'DG1062', u'DC1063', u'DT1064', u'DG1065', u'DA1066', u'DA1067', u'DC1068', u'DA1069', u'DT1070', u'DG1071', u'DC1072', u'DC1073', u'DT1074', u'DT1075', u'DT1076', u'DT1077', u'DG1078', u'DA1079', u'DT1080', u'DG1081', u'DG1082', u'DA1083', u'DG1084', u'DC1085', u'DA1086', u'DG1087', u'DT1088', u'DT1089', u'DT1090', u'DC1091', u'DC1092', u'DA1093', u'DA1094', u'DA1095', u'DT1096', u'DA1097', u'DC1098', u'DA1099', u'DC1100', u'DT1101', u'DT1102', u'DT1103', u'DT1104', u'DG1105', u'DG1106', u'DT1107', u'DA1108', u'DG1109', u'DT1110', u'DA1111', u'DT1112', u'DC1113', u'DT1114', u'DG1115', u'DC1116', u'DA1117', u'DG1118', u'DG1119', u'DT1120', u'DG1121', u'DA1122', u'DT1123', u'DT1124', u'DC1125', u'DT1126', u'DC1127', u'DC1128', u'DA1129', u'DG1130', u'DG1131', u'DG1132', u'DC1133', u'DG1134', u'DG1135', u'DC1136', u'DC1137', u'DA1138', u'DG1139', u'DT31140', u'DA51141', u'DC1142', u'DT1143', u'DG1144', u'DG1145', u'DC1146', u'DC1147', u'DG1148', u'DC1149', u'DC1150', u'DC1151', u'DT1152', u'DG1153', u'DG1154', u'DA1155', u'DG1156', u'DA1157', u'DA1158', u'DT1159', u'DC1160', u'DA1161', u'DC1162', u'DC1163', u'DT1164', u'DG1165', u'DC1166', u'DA1167', u'DG1168', u'DA1169', u'DT1170', u'DA1171', u'DC1172', u'DT1173', u'DA1174', u'DC1175', u'DC1176', u'DA1177', u'DA1178', u'DA1179', u'DA1180', u'DG1181', u'DT1182', u'DG1183', u'DT1184', u'DA1185', u'DT1186', u'DT1187', u'DT1188', u'DG1189', u'DG1190', u'DA1191', u'DA1192', u'DA1193', u'DC1194', u'DT1195', u'DG1196', u'DC1197', u'DT1198', u'DC1199', u'DC1200', u'DA1201', u'DT1202', u'DC1203', u'DA1204', u'DA1205', u'DA1206', u'DA1207', u'DG1208', u'DG1209', u'DC1210', u'DA1211', u'DT1212', u'DG1213', u'DT1214', u'DT1215', u'DC1216', u'DA1217', u'DG1218', u'DC1219', u'DT1220', u'DG1221', u'DG1222', u'DA1223', u'DA1224', u'DT1225', u'DC1226', u'DC1227', u'DA1228', u'DG1229', u'DC1230', u'DT1231', u'DG1232', u'DA1233', u'DA1234', u'DC1235', u'DA1236', u'DT1237', u'DG1238', u'DC1239', u'DC1240', u'DT1241', u'DT1242', u'DT1243', u'DT1244', u'DG1245', u'DA1246', u'DT1247', u'DG1248', u'DG1249', u'DA1250', u'DG1251', u'DC1252', u'DA1253', u'DG1254', u'DT1255', u'DT1256', u'DT1257', u'DC1258', u'DC1259', u'DA1260', u'DA1261', u'DA1262', u'DT1263', u'DA1264', u'DC1265', u'DA1266', u'DC1267', u'DT1268', u'DT1269', u'DT1270', u'DT1271', u'DG1272', u'DG1273', u'DT1274', u'DA1275', u'DG1276', u'DT1277', u'DA1278', u'DT1279', u'DC1280', u'DT1281', u'DG1282', u'DC1283', u'DA1284', u'DG1285', u'DG1286', u'DT1287', u'DT1288', u'DA1289', u'DC1290', u'DA1291', u'DT1292', u'DC1293', u'DC1294', u'DT1295', u'DG1296', u'DT1297', u'DG1298', u'DC1299', u'DA1300', u'DT1301', u'DG1302', u'DT1303', u'DA1304', u'DA1305', u'DG31306'], dtype=object)

Data plot:


In [27]:
dt_63 = f_in[cl[63]]
dt_83 = f_in[cl[83]]
time = f_in[cl[0]]

In [30]:
plot(time,dt_63)
plot(time,dt_83)
legend([cl[63],cl[83]])
show()


Bonded force

  • explain by hand!

Summary and outlook

  • Possible improvement
    • Selection based support on atom level information
    • Other Suggestion, please!
  • Future perspect

In [32]:
Image("time_scale.png")


Out[32]:

Thank you for your Patience!

</h1>

And Suggestiongs!

</h1>

  • Basic Consideration:

    • F(total) = F(bonded) + F(nonbonded)

      • For a single molecule, sum(F(bonded)) = 0; sum(F(nonbonded)) = 0

      • then: F(nonbonded) = F(total) - F(bonded)

        1 If only need to know the force that loaded on molecules (globaly), then only bonded force is needed to be calculated out.

        2 Pairwise force??

          ToDo: explain detail or redifine //As: F(total) = F(internal) + F(external)
        
          sum(F(internal)) == 0
        
          F(external) = ??(try: Should over write ns()? //or use an OO search type) ==> modified nblists afterwards
    • Force(total) = F(bonded) + F(internal) + F(external)
  • Include:

    • Bonded (cals_bonds(): run sequencially, not so difficult)

      • As the calculation is atomic based, first calculate the bonded force for each atom, then sequncially binned to resiedue.

        (To check: A-B-C F(A) + F(C) == F(B)? )

    • Nonbonded (do_nonbonded.c and in kernel. Different kernels, merged, seperatable?)

      • elec (short and long, no PME)

      • vdw (should exclude nb internal. How?) //==> use two nblist to store internal and external nblists.

  • Method:

    • If only bonded force included, the world will be happy. Simply used the build in function clac_bonds.

    • To eveluate the nonbonded force, modification of kernel is not possible.

      • The Brute force way is to write a method for all force(vdw and Coul) that include, and loop all of them. (Worst way, nb14 also affect)
      • Improvement will be using the ns() (with different cut off?)

        However, severial problems will occurs:

          1 nb14 
        
              Not included in nb_kernel, dumped into bounded. (mostly zeros)
        
          2 internal v.s external? (for DNA internal should define as Double Helix.)
        
              F = F(internal_bond) + F(internal_nonbonded) + F(external_nonbonded) ~ 0. (Can not be achieved since PME, water and ions are gone, the md external will never be the same with analyze external)

In [2]:

1 Old PF eveluate

  • What can do?

    • Pairwise force calculation, (only) support 1 slection or 2 selections (1 selection can be one molecule or some atoms, or some residues, whatever), and calculate forces acording to particle(atom) and residue map

      • For 1 selection:
        • bonded_force
        • nonbonded_force
        • nb14
    * For 2 selections:
        * Bonded_force (most time meaningless)      //to check for the parameter setting
        * Nonbonded_force 
        * nb14                            //to check for the parameter setting
  • Not convience:

    • Take a system with 4 protein molecules for example (for nucleosome has 1 dsDNA and 8 protein proteins.)
      • For 2 selections scheme, $ C{2\choose4} =6$,
      • which means, should change scripts 6 times, and run 6 times.
      • for nucleosome, should do that for $ C{2\choose9} = 36$ times, for each combination
    • of course, can do 1 selection for the whole structure, and render it into a colorful picture.
      • But it includes(at least) for nonbonded forces, internal and external.
      • Can be Conflicted. F(internal) = - F(external)

2 Reconstruct Old PF

  • Brute force test for 2 selections
  • Looping through index and sum F_coul and F_vdw

    • Use the all to all kernal
    • For small systems works fine, but not for nucleosome.
  • Can use the nbkernal All_vs_all

3 New PF (short for *Partial Force*, not *Pairwise Force*)structure

  • Gromacs Version: 5.0-dev
    • New version is (somehow) C++ style, which is possible to do OO, and easy to reuse code
  • Rely on nblist (cutoff setting),

    • Modified it according to selection, and seperated nblists !!!
  • input:

    • -s *.tpr (included simulation setup, and topology which is crucial)
    • -f .xtc or .trr (simulation result, with coordinates)
    • -n *.ndx (not support yet)
  • output:

    • *bonded.trr: bonded force for each frame/atoms(optional)
    • *internal.trr: internal nonbonded force for each frame/atoms(optional)
    • *external.trr: external nonbonded force for each frame/atoms(optional)
    • *bonded.csv: residue bonded force sequencially, should include index(Rn - Rn+1), n in [0, maxresidu_number_in_mol]
    • *internal.csv: residue based (for each residue number) internal force (for DNA include double-helix, for Protein define as each subunit)
    • *external.csv: residue based (for DNA exclude double-helix) this is going to use for render in VMD movie

3.1 Working Flows:

  • Principle:

    • Keep all the function as it was. Do not modify the Basic API
  • Steps:

    • Read in tpr file
    • Generate atom
    • for each frame in xtc file:
      • calculate force and save

3.2 Corrnetness test:

3.2.1 Sum of force should be zero.

  • For a system with 1708 Atoms, mistakes: (Float point round up problems)
    • Bonded force: ~ 0.1%
    • Internal: ~0.7%
    • External: ~0.03%
    • !! REMEMBER to clean the force after one circle
  • ToDo: output all the summed and max value,then know the ratio of real mistake

3.2.1 Others

3.3 Example:


In [5]:
os.listdir(working_directory)


Out[5]:
['#md.log.1#',
 '#md.log.2#',
 'force_bonded.csv',
 'force_bonded_100mM_1000f_4ns.csv',
 'force_external.csv',
 'force_external_100mM_1000f_4ns.csv',
 'force_internal.csv',
 'force_internal_100mM_1000f_4ns.csv',
 'md.log',
 'nucleosome_forpf.tpr',
 'nucleosome_vs_0mM_30nm_0_topull_100f_20ns.xtc',
 'nucleosome_vs_0_20ns_30nm_1000mM_reset_70f_27ns.xtc',
 'nucleosome_vs_0_30nm_1000mM_1000f_pull4ns.xtc',
 'nucleosome_vs_0_30nm_100mM_1000f_4ns.xtc',
 'nucleosome_vs_0_30nm_100mM_100f_20ns.xtc',
 'nucleosome_vs_100mM_30nm_0_topull_100f_20ns.xtc',
 'nucleosome_vs_30nm_1000mM_50f_50ns.xtc',
 'nucleosome_vs_30nm_1000mM_90f_50ns.xtc']

In [9]:
fm = pd.read_csv('force_external_)
cl=fm.columns

In [1]:
dt = fm[cl[0]]


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-881476a13128> in <module>()
----> 1 dt = fm[cl[0]]

NameError: name 'fm' is not defined

In [29]:
df = fm[cl[491]]
a = plot(dt, df)



In [31]:
max(df)
cl[491]


Out[31]:
'491-LYS'

4 Other to consider

  • nb14?
    • in principle should be included into bonded.
  • Vsite:
    • Take care, the atom index list may not corresipond, somehow should be check carefully.
  • PME?
    • should not include, check old version

5 Controls

5.1 Old PF

  • Old PF
  • Tune the setting of the PME in *.mdp file for old PF for 2 selections

5.2 New PF

  • 1 all force:
    • Reference to simulation result directly. (check if use the same kernal)

6 Technical details

6.1 Modified the nblist

  • The most difficult part!!!!!!!

    • since MAX exclusion No. is 32, can not set exclusion to external mols.
    • Only can do :
      • if (a_i.mol == a_j.mol) continue; // mol define DNA_1 + DNA_2
      • how?
  • The structure of nblist:

    • For atom i = N, N in range(0, max_index)
      • ij in range(ij[N], ij[N+1])
      • jlist[ij], do something for i and jlist[ij];
  • Between ns() and do_nonbonded(), Resize the ij, and jlist:
    • Algorithm:
      • for i in range(0, max_index): //from 1 nblist to 2 nblist
        • for ij in range(ij[i], ij[N+1]):
          • if (jlist[ij].mol == i.mol):
            • put ij in list_internal
          • else:
            • put ij in list_external
    • After that:
      • set nblist_external = list_external;
      • set nblist_internal = list_internal;
  • do_nonbonded(): save f(internal) and f_long(external) seperately
  • calc_bond(): save f(bond)
  • save each frame residue based data as csv format.

  • !!!Take care of the virtual site force

6.1.0 Data Structure of t_nblist

  • Data Structure of t_nblist:

    • int igeometry; // The type of list(atom, water, etc.)
    • int elec; // Coulomb loop type index for kernals
    • int ielecmod; //Coulomb modifier (e.g. switch/shift) what the hell
    • int ivdw; //VdW loop type index for kernels
    • int ivdwmod; //VdW modifier (e.g. switch/shift)
    • int type; //Type of interaction, listed in gmx_nblist_interaction_type
    • int nri, maxnri; //Current/max number of i particles
    • int nrj, maxnrj; //Current/max number of j particles
    • int maxlen; //maxnr of j atoms for a single i atom
    • int* iinr; //The i-elements
    • int* iinr_end; // the end atom only with enlist CG;
    • int* gid; // Index in energy arrays
    • int* shift; // Shift vector index;
    • int* jindex; //index in jjnr
    • int* jjnr; //The j-atom list
    • int* jjnr_end; //The end atom, only with enltypeCG
    • t_excl* excl; //Exclustions, only with enltypeCG

    • void* kernelptr_vf;

    • void* kernelptr_v;
    • void* kernelptr_f;
    • int simd_padding_width;

      • For atom I = nblist->iinr[N] (0 <= N < nblist->nri) there can be
      • several neighborlists (N's), for different energy groups (gid) and
      • different shifts (shift).
      • For corresponding J atoms for each list start at:
      • nblist->jjnr[JI]
      • with nblist->jindex[N] <= JI < nblist->jindex[N+1] *
      • enlist is of the form enlistUNIT1_UNIT2:
      • UNIT ATOM: there is one atom: iinr[N] or jjnr[JI]
      • UNIT SPC: there are 3 atoms: iinr[N],iinr[N]+1,iinr[N]+2, jjnr analog.
      • UNIT TIP4P: there are 4 atoms: iinr[N],...,iinr[N]+3, jjnr analog.
      • UNIT CG: there are N atoms: iinr[N],...,iinr_end[N]-1, jjnr analog. *
      • Clear?

6.1.1 How to use official nb()?

  • According the rule of no breaking basic API, which means, no modification of origin code is allowed, what can we do?
    • Notice inside nbkernal*, there is a check for valid atom.
    • Seperate the jjnr may not really matter. (may waste of 8iinr)
    • Do_nobonded() twice.
  • Also, should deeply understand the data structure of gmx_mtop_t
    • char **name; // Name of the topology
    • gmx_ffparams_t ffparams; // force field parameters.
    • int nmoltype; //How many mols in the topology?
    • gmx_moltype_t *moltype; //each moltype de

7. Render in vmd

7.1 in tcl


In [ ]:

N: Anything else?

N.1 Simulation time step consideration:

N.1.1 Sample Rate:

  • Suppose for a system with the fast frequency $\omega_{max}$
    • To achieve the full percise, the sample frequency $\omega_{sample} = 2\omega_{max}$ (Twice fast)

N.2 Pulling time steps:

  • When do a pulling simulation, the frequency of unit that has been loaded force, is going to increase.

    • A similar example is tune violin, high $\vec{T}$ ~ hige $\omega$
  • Consider a simple Harmonic Oscillator, with out external force or friction:

    • Basic force formular:
      • $\vec{F} = -kx$
    • Apply the second Newton's Law:
      • $\vec{F} = m\vec{a} =-kx $
    • Then:
      • $m\ddot{x} + kx = 0$
    • Analyze by using Laplase transform:
      • $L(m\ddot{x} + kx) = 0$

N.3 Number of backup files limitation

  • If backup file number over 100, the function of back up file will not work

N.4


In [5]:
#Image(url="http://ww3.sinaimg.cn/bmiddle/61e8a1fdjw1ebcpe8vqghg203z03zq3h.gif")

In [ ]: