In [1]:
%autosave 0
from __future__ import print_function


Autosave disabled

Annotate structures and simulations

We here show how to find base-pairs and stacking interactions in structures and simulations. The function

stackings, pairings, res = bb.annotate(pdb)

returns three lists:

  • a list of stacking interactions
  • a list of pairing interactions
  • the list of residue names following the usual convention RESNAME_RESNUMBER_CHAININDEX

stackings and pairings contain the list of interactions for the N frames in the PDB/trajectory file and it is organized in the following way: for a given frame $i=1..N$ there are $k=1..Q$ interactions between residues with index pairings[i][0][k][0] and pairings[i][0][k][1]. The type of interaction is specified at the element pairings[i][1][k].

But let's make an example by annotating a PDB file containing a sarcin-ricin motif:


In [2]:
import barnaba as bb

# annotate
pdb = "../test/data/SARCIN.pdb"
stackings, pairings, res = bb.annotate(pdb)

# list base pairings
print("BASE-PAIRS")
for p in range(len(pairings[0][0])):
    res1 = res[pairings[0][0][p][0]]
    res2 = res[pairings[0][0][p][1]]
    interaction =  pairings[0][1][p]
    print("%10s %10s %4s" % (res1,res2,interaction))

# list base-stackings
print()
print("STACKING")
for p in range(len(stackings[0][0])):
    res1 = res[stackings[0][0][p][0]]
    res2 = res[stackings[0][0][p][1]]
    interaction =  stackings[0][1][p]
    print("%10s %10s %4s" % (res1,res2,interaction))


BASE-PAIRS
     C_6_0     G_24_0  WCc
     U_7_0     C_23_0  WHc
     C_8_0     C_22_0  SHt
     A_9_0     A_21_0  HHt
    G_10_0     U_11_0  SHc
    U_11_0     A_20_0  WHt
    A_12_0     G_19_0  HSc
    U_13_0     A_18_0  WCc

STACKING
     C_6_0      U_7_0   >>
     U_7_0      C_8_0   >>
     C_8_0      A_9_0   ><
    A_12_0     U_13_0   >>
    A_12_0     A_20_0   <>
    A_18_0     G_19_0   >>
    A_20_0     A_21_0   >>
    A_21_0     C_22_0   >>
    C_22_0     C_23_0   >>
# Loading ../test/data/SARCIN.pdb 

Decypher the annotation

Base-pairing are classified according to the Leontis-Westhof classification, where

  • W = Watson-Crick edge
  • H = Hoogsteeen edge
  • S= Sugar edge
  • c/t = cis/trans
  • XXx = when two bases are close in space, but they do not fall in any of the categories. This happens frequently for low-resolution structures or from molecular simulations.

WWc pairs between complementary bases are called WCc or GUc.
Stacking are classified according to the MCannotate classification:

  • ">>" Upward
  • "<<" Downward
  • "<>" Outward
  • "><" Inward

Criteria for stacking/pairing

First, we consider only bases that are "close" in space, i.e. $R_{ij} < 1.7$ and $R_{ji} < 1.7$.
$R_{ij} = (x_{ij}/5, y_{ij}/5, z_{ij}/3)$ is the SCALED position vector with components ${x,y,z}$ (in $\mathring{A}$) of base j constructed on base i.
The criteria for base-stacking are the following:

$( |z_{ij}| \; AND \; |z_{ji}| > 2 \mathring{A} ) \; AND \; (\rho_{ij} \; OR\; \rho_{ji} < 2.5 \mathring{A}) \; AND\; (|\theta_{ij}| < 40^{\circ} ) $

where

  • $ \rho_{ij} = \sqrt{x_{ij}^2 + y_{ij}^2} $
  • $\theta_{ij}$ = angle between the vectors normal to the base plane

The criteria for base-pairing are the following:

non stacked AND $|\theta_{ij}| < 60^{\circ}$ AND (number of hydrogen bonds $> 0$)
The number of hydrogen bonds is calculated as the number of donor-acceptor pairs with distance $< 3.3 \mathring{A}$. If bases are complementary and the number of hydrogen bonds is > 1 (AU/GU) or > 2 (GC), the pair is considered WCc (or GUc).

  • cis/trans is calculated according to the value of the dihedral angle defined by $C1'_{i}-N1/N9_{i}-N1/N9_{j}-C1'_{j}$
  • edges are definded according to the value of $\psi = \arctan{(\hat{y}_{ij}/\hat{x}_{ij})}$.
    1. Watson-Crick edge: $0.16 <\psi \le 2.0 rad$
    2. Hoogsteen edge: $2.0 <\psi \le 4.0 rad $.
    3. Sugar edge: $\psi > 4.0, \psi \le 0.16$

ATT!

  • These criteria are slightly different from the one used in other popular software for annotating three-dimensional structures (e.g. X3DNA, MCAnnotate, Fr3D, etc.). From my experience, all these packages give slightly different results, especially for non-Watson-Crick base-pairs.
  • Stacking is also problematic, as it relies on arbitrary criteria.
  • In all cases, criteria for stacking and pairing were calibrated to work well for high resolution structures. These criteria might not be optimal for low-resolution structures and to describe nearly-formed interactions such the ones that are often encountered in molecular simulations.

Dot-bracket annotation

From the list of base-pairing, we can obtain the dot-bracket annotation using the function

dotbracket = bb.dot_bracket(pairings,res)

this function returns a string for each frame in the PDB/simulation. Let's see this in action:


In [3]:
dotbr, seq = bb.dot_bracket(pairings,res)
print(">",seq)
for j in range(len(dotbr)):
    print(dotbr[j])


> CUCAGUAUAGAACCG

(......().....)