Analysis of local base-steps parameters

  • This tutorial discuss the analyses that can be performed using the dnaMD Python module included in the _do_x3dna_ package. The tutorial is prepared using Jupyter Notebook and this notebook tutorial file could be downloaded from this link.
  • Download the input files that are used in the tutorial from this link.

  • Two following input files are required in this tutorial

    • L-BPS_cdna.dat (do_x3dna output from the trajectory, which contains the DNA bound with the protein)
    • L-BPS_odna.dat (do_x3dna output from the trajectory, which only contains the free DNA)

      These two file should be present inside tutorial_data of the current/present working directory.

  • The Python APIs should be only used when do_x3dna is executed with -ref option.
  • Detailed documentation is provided here.

Importing Python Modules

  • numpy: Required for the calculations involving large arrays
  • dnaMD: Python module to analyze DNA/RNA structures from the do_x3dna output files.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import dnaMD
%matplotlib inline

Initializing DNA object and storing data to it

  • DNA object is initialized by using the total number of base-pairs
  • One base-step is formed by two adjacent base-pairs. Therefore, total number of base-steps is less than one of total number of base-pairs.
  • Six base-pair parameters (shift, slide, rise, tilt, roll and twist) can be read and stored in DNA object from the input file using function set_base_step_parameters(...).

  • To speed up processing and analysis, data can be stored in a HDF5 file by including HDF5 file name as a argument during initialization. Same file can be used to store and retrieve all other parameters.


In [2]:
## Initialization
pdna = dnaMD.DNA(60)     #Initialization for 60 base-pairs DNA bound with the protein
fdna = dnaMD.DNA(60)     #Initialization for 60 base-pairs free DNA

## If HDF5 file is used to store/save data use these:
# pdna = dnaMD.DNA(60, filename='cdna.h5')     #Initialization for 60 base-pairs DNA bound with the protein
# fdna = dnaMD.DNA(60, filename='odna.h5')     #Initialization for 60 base-pairs free DNA

## Loading data from input files in respective DNA object
# Number of base-steps = Number of base-pairs - one
# Number of base-steps in a 60 base-pairs DNA = 59
# "bp=[1, 59]" will load local base-pair parameters of 1 to 59 base-steps
# " parameters = 'All' " will load all six parameters (shift, slide, rise, tilt, roll and twist)
pdna.set_base_step_parameters('tutorial_data/L-BPS_cdna.dat', bp_step=[1, 59], parameters='all', step_range=True)
fdna.set_base_step_parameters('tutorial_data/L-BPS_odna.dat', bp_step=[1, 59], parameters='all', step_range=True)


Reading file : tutorial_data/L-BPS_cdna.dat
Reading frame 1000
Finished reading.... Total number of frame read =  1001

Reading file : tutorial_data/L-BPS_odna.dat
Reading frame 1000
Finished reading.... Total number of frame read =  1001

Local base-step parameter of a base-pair directly from dictionary

  • The DNA.data is a python dictionary which contains all the data as a Python Dictionary. For a base-step, parameter as a function of time can be directly extracted.

In [3]:
# Extracting "Twist" of 22nd bp
twist_20bp = pdna.data['bps']['22']['twist']

#Twist vs Time for 22nd bp
plt.title('22nd bp')
plt.plot(pdna.time, twist_20bp)
plt.xlabel('Time (ps)')
plt.ylabel('Twist ( $^o$)')
plt.show()


Local base-step parameters as a function of time (manually)

  • A specific local base-step parameters for the given base-pairs range can be extracted from the DNA obejct using function dnaMD.DNA.get_parameters(...).
  • The extracted parameters of the given base-step can be plotted as a function of time
  • The extracted parameters (average) for the DNA segment can be plotted as a function of time

Following example shows Twist vs Time plots. These example also shows that how to extract the parameters value from the DNA object. Other properties could be extracted and plotted using similar steps.


In [4]:
# Extracting "Twist" of 20 to 30 base-steps
twist, bp_idx = pdna.get_parameters('twist',[20,30], bp_range=True)

# Twist vs Time for 22nd base-step
plt.title('22nd bp')
plt.plot(pdna.time, twist[2])      # index is 2 for 22nd base-step: (20 + 2)
plt.xlabel('Time (ps)')
plt.ylabel('Twist ( $^o$)')
plt.show()

# Average Twist vs Time for segment 20-30 base-step
avg_twist = np.mean(twist, axis=0)     # Calculation of mean using mean function of numpy
plt.title('20-30 bp segment')
plt.plot(pdna.time, avg_twist)
plt.xlabel('Time (ps)')
plt.ylabel('Twist ( $^o$)')
plt.show()

# Average Twist vs Time for segment 24-28 base-step
# index of 24th base-step is 4 (20 + 4). index of 28th base-step is 8 (20 + 8)
avg_twist = np.mean(twist[4:8], axis=0)     
plt.title('24-28 bp segment')
plt.plot(pdna.time, avg_twist)
plt.xlabel('Time (ps)')
plt.ylabel('Twist ( $^o$)')
plt.show()


Local base-step parameters as a function of time (using provided functions)

Above examples show the method to extract the values from the DNA object. However, dnaMD.DNA.time_vs_parameter(...) function could be use to get parameter values as a function of time for the given base-pairs/step or segment


In [5]:
# Slide vs Time for 22nd bp
plt.title('Slide for 22nd bp')
time, value = pdna.time_vs_parameter('slide', [22])
plt.plot(time, value)
plt.xlabel('Time (ps)')
plt.ylabel('Slide ($\AA$)')
plt.show()

# Rise vs Time for 25-40 bp segment
plt.title('Rise for 25-40 bp segment')
# Bound DNA
# Rise is the distance between two base-pairs, so for a given segment it is sum over the base-steps
time, value = pdna.time_vs_parameter('rise', [25, 40], merge=True, merge_method='sum')
plt.plot(time, value, label='bound DNA', c='k')     # balck color => bound DNA
# Free DNA
time, value = fdna.time_vs_parameter('rise', [25, 40], merge=True, merge_method='sum')
plt.plot(time, value, label='free DNA', c='r')    # red color => free DNA

plt.xlabel('Time (ps)')
plt.ylabel('Rise ( $\AA$)')
plt.legend()
plt.show()


Distribution of local base-steps parameters during MD simulations

  • As shown in above plot of Time vs Rise, comparison between bound and free DNA is very difficult. Therefore, to compare the parameters of either different DNAs or same DNAs in different environment or different segment of same DNAs, the distribution of parameters over the MD trajectory are sometime useful.

In [6]:
#### Rise distribution for 20-45 bp segment
plt.title('Rise distribution for 20-45 bp segment')

### Bound DNA ###

## calculation of parameter distribution for the segment
values, density = pdna.parameter_distribution('rise', [20, 45], bins=20, merge=True, merge_method='sum')

## plot distribution
plt.plot(values, density, label='bound DNA', c='k')     # balck color => bound DNA

### Free DNA ###

## calculation of parameter distribution for the segment
values, density = fdna.parameter_distribution('rise', [20, 45], bins=20, merge=True, merge_method='sum')

## plot distribution
plt.plot(values, density, label='free DNA', c='r')    # red color => free DNA

plt.xlabel('Rise ( $\AA$)')
plt.ylabel('Density')
plt.legend()
plt.show()


#### Twist distribution for 25-40 bp segment
plt.title('Twist distribution for 25-40 bp segment')

### Bound DNA ###

## calculation of parameter distribution for the segment
# Twist is the twist angle between two base-pairs, so for overall twist of a given segment 
# it is considered here as sum over the base-steps
values, density = pdna.parameter_distribution('twist', [25, 40], bins=20, merge=True, merge_method='sum')

## plot distribution
plt.plot(values, density, label='bound DNA', c='k')     # balck color => bound DNA

### Free DNA ###

## calculation of parameter distribution for the segment
values, density = fdna.parameter_distribution('twist', [25, 40], bins=20, merge=True, merge_method='sum')

## plot distribution
plt.plot(values, density, label='free DNA', c='r')    # red color => free DNA

plt.xlabel('Twist ( $^o$)')
plt.ylabel('Density')
plt.legend()
plt.show()


Local base-step parameters as a function of base-steps

  • What is the average values of a given parameter for either each base-step or a DNA segment?
  • To address this question, average values of a given parameter with its error could be calculated for either each base-step or a DNA segment using a function dnaMD.DNA.get_mean_error(...).
  • This average values could be also use to compare two DNA.
  • Standard error could be calculated using block averaging method as derived in this publication. To use this method, g_analyze of GROMACS package should be present in $PATH environment variable.

In [7]:
######## Average Rise values as a function of base-steps ########

plt.title('Average Rise for each base-pairs')

### Calculating Average Rise values for 5 to 56 base-steps DNA bound with protein
bp, rise, error = pdna.get_mean_error([5, 56], 'rise', err_type='block', bp_range=True)

# plot these values
plt.errorbar(bp, rise, yerr=error, ecolor='k', elinewidth=1, color='k', lw=0, marker='o', mfc='k', mew=1, ms=4, label='bound DNA' )

### Calculating Average Rise values for 5 to 56 base-steps DNA
bp, rise, error = fdna.get_mean_error([5, 56], 'rise', err_type='block', bp_range=True)

# plot these values
plt.errorbar(bp, rise, yerr=error, ecolor='r', elinewidth=1, color='r', lw=0, marker='x', mfc='r', mew=1, ms=4, label='free DNA' )

plt.ylabel('Rise ($\AA$)')
plt.xlabel('base-step number')
plt.xlim(0,61)
plt.legend()
plt.show()

######## Average Rise values as a function of DNA segments ########

plt.title('Average Rise for DNA segments')

### Calculating Average Rise values for 5 to 56 base-steps DNA bound with protein
### DNA segments are assumed to made up of 4 base-steps (merge_bp=4)
bp, rise, error = pdna.get_mean_error([5,56], 'rise', err_type='block', bp_range=True, merge_bp=4, merge_method='sum')

# plot these values
plt.errorbar(bp, rise,yerr=error, ecolor='k', elinewidth=1, color='k', lw=1, marker='o', mfc='k', mew=1, ms=4, label='bound DNA' )

### Calculating Average Rise values for 5 to 56 base-steps DNA
### DNA segments are assumed to made up of 5 base-steps (merge_bp=4)
bp, rise, error = fdna.get_mean_error([5,56], 'rise', err_type='block', bp_range=True, merge_bp=4, merge_method='sum')

# plot these values
plt.errorbar(bp, rise, yerr=error, ecolor='r', elinewidth=1, color='r', lw=1, marker='x', mfc='r', mew=1, ms=4, label='free DNA' )

plt.ylabel('Rise ( $\AA$)')
plt.xlabel('base-step number')
plt.xlim(0,61)
plt.ylim(13.0, 14.0)
plt.legend()
plt.show()


Deviation in parameters of bound DNA with respect to free DNA

As discussed in the above section, average parameters with standard error can be calculated for both bound and free DNA. Additionally, deviation in bound DNA with respect to the free DNA could be calculated using function dnaMD.localDeformationVsBPS(...) as shown in the following example.


In [8]:
#### Deviation in shift, slide, rise, tilt, roll and twist
#### Deviation = Bound DNA(parameter) - Free DNA(parameter) 

### Deviation in Shift
fdna_bp, pdna_bp, deviation, error = dnaMD.localDeformationVsBPS(fdna, [5,56], pdna, [5,56], 
                                                                'shift', err_type='block', bp_range=True, 
                                                                merge_bp=4, merge_method='sum')

# plot these values
plt.errorbar(pdna_bp, deviation, yerr=error, ecolor='k', elinewidth=1, color='k', lw=1, marker='o', mfc='k', mew=1, ms=4)

# plot line at zero
plt.plot([0,61], [0.0, 0.0], '--k')

plt.ylabel('Deviation in Shift ($\AA$)')
plt.xlabel('base-step number')
plt.xlim(0,61)
plt.show()

### Deviation in Slide
fdna_bp, pdna_bp, deviation, error = dnaMD.localDeformationVsBPS(fdna, [5,56], pdna, [5,56], 
                                                              'slide', err_type='block', bp_range=True, 
                                                              merge_bp=4, merge_method='sum')

# plot these values
plt.errorbar(pdna_bp, deviation, yerr=error, ecolor='k', elinewidth=1, color='k', lw=1, marker='o', mfc='k', mew=1, ms=4)

# plot line at zero
plt.plot([0,61], [0.0, 0.0], '--k')

plt.ylabel('Deviation in Slide ($\AA$)')
plt.xlabel('base-step number')
plt.xlim(0,61)
plt.show()

### Deviation in Rise
fdna_bp, pdna_bp, deviation, error = dnaMD.localDeformationVsBPS(fdna, [5,56], pdna, [5,56], 
                                                              'rise', err_type='block', bp_range=True, 
                                                              merge_bp=4, merge_method='sum')

# plot these values
plt.errorbar(pdna_bp, deviation, yerr=error, ecolor='k', elinewidth=1, color='k', lw=1, marker='o', mfc='k', mew=1, ms=4)

# plot line at zero
plt.plot([0,61], [0.0, 0.0], '--k')

plt.ylabel('Deviation in Rise ($\AA$)')
plt.xlabel('base-step number')
plt.xlim(0,61)
plt.show()

### Deviation in Tilt
fdna_bp, pdna_bp, deviation, error = dnaMD.localDeformationVsBPS(fdna, [5,56], pdna, [5,56], 
                                                              'tilt', err_type='block', bp_range=True, 
                                                              merge_bp=4, merge_method='sum')

# plot these values
plt.errorbar(pdna_bp, deviation, yerr=error, ecolor='k', elinewidth=1, color='k', lw=1, marker='o', mfc='k', mew=1, ms=4)

# plot line at zero
plt.plot([0,61], [0.0, 0.0], '--k')

plt.ylabel('Deviation in Tilt ( $^o$)')
plt.xlabel('base-step number')
plt.xlim(0,61)
plt.show()

### Deviation in Roll
fdna_bp, pdna_bp, deviation, error = dnaMD.localDeformationVsBPS(fdna, [5,56], pdna, [5,56], 
                                                              'roll', err_type='block', bp_range=True, 
                                                              merge_bp=4, merge_method='sum')

# plot these values
plt.errorbar(pdna_bp, deviation, yerr=error, ecolor='k', elinewidth=1, color='k', lw=1, marker='o', mfc='k', mew=1, ms=4)

# plot line at zero
plt.plot([0,61], [0.0, 0.0], '--k')

plt.ylabel('Deviation in Roll ( $^o$)')
plt.xlabel('base-pair number')
plt.xlim(0,61)
plt.show()

### Deviation in Twist
fdna_bp, pdna_bp, deviation, error = dnaMD.localDeformationVsBPS(fdna, [5,56], pdna, [5,56], 
                                                              'twist', err_type='block', bp_range=True, 
                                                              merge_bp=4, merge_method='sum')

# plot these values
plt.errorbar(pdna_bp, deviation, yerr=error, ecolor='k', elinewidth=1, color='k', lw=1, marker='o', mfc='k', mew=1, ms=4)

# plot line at zero
plt.plot([0,61], [0.0, 0.0], '--k')

plt.ylabel('Deviation in Twist ( $^o$)')
plt.xlabel('base-step number')
plt.xlim(0,61)
plt.show()



In [ ]: